[jira] [Updated] (HDDS-2241) Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the pipeline for a key

[ https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan updated HDDS-2241:

Description:
Currently, while looking up a key, the Ozone Manager gets the pipeline information from SCM through an RPC for every block in the key. For large files (> 1 GB), we may end up making a lot of RPC calls for this. This can be optimized in a couple of ways:
* We can implement a batch getContainerWithPipeline API in SCM, with which we can get the pipeline locations for all the blocks of a file in one round trip. To bound the number of containers passed to SCM in a single call, we can use a fixed container batch size on the OM side. _Here, Number of calls = 1 (or k, depending on the batch size)_
* Alternatively, a simpler change would be to keep a method-local map of ContainerID -> Pipeline obtained from SCM, so that we don't make repeated calls to SCM for the same containerID within a key. _Here, Number of calls = Number of unique containerIDs_

was:
Currently, while looking up a key, the Ozone Manager gets the pipeline information from SCM through an RPC for every block in the key. For large files (> 1 GB), we may end up making a lot of RPC calls for this. This can be optimized in a couple of ways:
* We can implement a batch getContainerWithPipeline API in SCM, with which we can get the pipeline locations for all the blocks of a file in one round trip. To bound the number of containers passed to SCM in a single call, we can use a fixed container batch size on the OM side. _Here, Number of calls = 1 (or k, depending on the batch size)_
* Instead, we can have a simple map (method local) of ContainerID -> Pipeline that we get from SCM so that we don't need to make repeated calls to SCM for the same containerID for a key. _Here, Number of calls = Number of unique containerIDs_

Key: HDDS-2241
URL: https://issues.apache.org/jira/browse/HDDS-2241
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Manager
Reporter: Aravindan Vijayan
Assignee: Aravindan Vijayan
Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
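The method-local ContainerID -> Pipeline map described in the second bullet can be sketched as below. This is a simplified, hypothetical sketch, not Ozone's actual KeyManagerImpl code: `ScmClient` and `Pipeline` here are stand-ins for the real OM/SCM types, and the RPC counter exists only to make the saving visible. The point is that the map, scoped to a single lookup, collapses repeated SCM calls for the same container into one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of a method-local ContainerID -> Pipeline cache.
 * All types are simplified stand-ins for the real OM/SCM ones.
 */
public class RefreshPipelineSketch {

    /** Stand-in for a pipeline (replica locations) of one container. */
    static final class Pipeline {
        final long containerId;
        Pipeline(long containerId) { this.containerId = containerId; }
    }

    /** Stand-in for the SCM client; counts RPCs so the saving is visible. */
    static final class ScmClient {
        final AtomicInteger rpcCalls = new AtomicInteger();
        Pipeline getContainerWithPipeline(long containerId) {
            rpcCalls.incrementAndGet();   // one RPC per invocation
            return new Pipeline(containerId);
        }
    }

    /**
     * Resolves a pipeline for every block of a key, but asks SCM only once
     * per unique containerID thanks to the method-local map.
     */
    static List<Pipeline> refreshPipelines(ScmClient scm, List<Long> blockContainerIds) {
        Map<Long, Pipeline> cache = new HashMap<>();  // ContainerID -> Pipeline
        List<Pipeline> result = new ArrayList<>(blockContainerIds.size());
        for (long id : blockContainerIds) {
            // computeIfAbsent performs the RPC only on a cache miss.
            result.add(cache.computeIfAbsent(id, scm::getContainerWithPipeline));
        }
        return result;
    }

    public static void main(String[] args) {
        ScmClient scm = new ScmClient();
        // 6 blocks spread over only 2 containers.
        List<Long> blocks = List.of(1L, 1L, 2L, 1L, 2L, 2L);
        List<Pipeline> pipelines = refreshPipelines(scm, blocks);
        // Number of calls = number of unique containerIDs (2), not blocks (6).
        System.out.println(pipelines.size() + " blocks resolved with "
            + scm.rpcCalls.get() + " SCM calls");
    }
}
```

Since the map lives only for the duration of one key lookup, there is no staleness concern beyond what the existing per-block RPCs already have; the batch-API option in the first bullet would reduce the call count further, at the cost of a new SCM protocol method.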