[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-1:
    Fix Version/s: 4.0.0
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

> Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
>
> Key: HIVE-1
> URL: https://issues.apache.org/jira/browse/HIVE-1
> Project: Hive
> Issue Type: Bug
> Components: llap, UDF
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Attachments: HIVE-1.1.patch, HIVE-1.2.patch, HIVE-1.3.patch, HIVE-1.4.patch, HIVE-1.5.patch
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> While querying through the LLAP external client, LlapBaseInputFormat#getSplits()
> invokes the get_splits() UDTF (GenericUDTFGetSplits) under the hood.
> GenericUDTFGetSplits returns LlapInputSplit objects in which planBytes[] occupies
> around 90% of the split size.
> Depending on data size, partitions, and plan, an LlapInputSplit can grow up to 1 MB,
> with planBytes[] being common to all the splits and occupying more than 850 KB.
> It also sometimes causes OOM on HS2, depending on the HS2 heap size.
> This can be resolved by separating the common parts from the actual splits and
> reassembling them on the client side.
> We can also provide an option where the client can say it does not want to
> reassemble them and can take control of the reassembly into its own hands.
> Splits can be broken up like:
> 1) schema split
> 2) plan split
> 3) actual split 1
> 4) actual split 2
> and so on.
> This greatly reduces the memory on the server side (in my case from 5 GB for
> ~5000 splits to around 15 MB), and hence the data transfer. It also eliminates
> the OOM on the HS2 side.
> cc [~jdere] [~sankarh] [~thejas]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
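The separate-then-reassemble idea in the description can be illustrated with a minimal Java sketch. It is not the code from the attached patches, and it does not use Hive's real LlapInputSplit or GenericUDTFGetSplits classes: AssembledSplit, SplitReassemblySketch, separate() and reassemble() are hypothetical names chosen for illustration, and the byte arrays stand in for the serialized schema, plan, and per-split data. The sketch assumes the ordering proposed in the description: schema split first, plan split second, actual splits after that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a fully assembled split; NOT Hive's LlapInputSplit.
final class AssembledSplit {
    final byte[] schemaBytes;   // shared by all splits
    final byte[] planBytes;     // shared by all splits
    final byte[] fragmentBytes; // specific to this split

    AssembledSplit(byte[] schemaBytes, byte[] planBytes, byte[] fragmentBytes) {
        this.schemaBytes = schemaBytes;
        this.planBytes = planBytes;
        this.fragmentBytes = fragmentBytes;
    }
}

final class SplitReassemblySketch {

    // "Server side": send the common schema and plan once, followed by one
    // small entry per actual split, instead of embedding the plan in each split.
    static List<byte[]> separate(byte[] schema, byte[] plan, List<byte[]> fragments) {
        List<byte[]> wire = new ArrayList<>();
        wire.add(schema);        // 1) schema split
        wire.add(plan);          // 2) plan split
        wire.addAll(fragments);  // 3..n) actual splits
        return wire;
    }

    // "Client side": attach the shared schema and plan to every actual split.
    // The shared arrays are referenced, not copied, so the plan is held once.
    static List<AssembledSplit> reassemble(List<byte[]> wire) {
        if (wire.size() < 2) {
            throw new IllegalArgumentException("expected a schema split and a plan split");
        }
        byte[] schema = wire.get(0);
        byte[] plan = wire.get(1);
        List<AssembledSplit> splits = new ArrayList<>();
        for (byte[] fragment : wire.subList(2, wire.size())) {
            splits.add(new AssembledSplit(schema, plan, fragment));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Sizes are arbitrary placeholders, not measurements.
        byte[] schema = new byte[16];
        byte[] plan = new byte[850];
        List<byte[]> fragments = Arrays.asList(new byte[10], new byte[10], new byte[10]);

        List<byte[]> wire = separate(schema, plan, fragments);
        List<AssembledSplit> splits = reassemble(wire);

        System.out.println("entries sent: " + wire.size());
        System.out.println("splits reassembled: " + splits.size());
    }
}
```

A client that opts out of automatic reassembly, as the description allows, would in this sketch simply consume the list returned by separate() directly and decide for itself when to combine the shared parts with each actual split.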
[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-1:
    Affects Version/s: 3.1.2
[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-1:
    Issue Type: Improvement (was: Bug)
[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chaurasia updated HIVE-1:
    Attachment: HIVE-1.5.patch
[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chaurasia updated HIVE-1:
    Attachment: HIVE-1.4.patch
[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chaurasia updated HIVE-1:
    Attachment: HIVE-1.3.patch
[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chaurasia updated HIVE-1:
    Attachment: HIVE-1.2.patch
[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chaurasia updated HIVE-1:
    Attachment: HIVE-1.1.patch
    Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
[ https://issues.apache.org/jira/browse/HIVE-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-1:
    Labels: pull-request-available (was: )