[jira] [Updated] (HDDS-2241) Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the pipeline for a key

2019-10-03 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2241:

Description: 
Currently, while looking up a key, the Ozone Manager gets the pipeline 
information from SCM through an RPC for every block in the key. For large files 
(> 1 GB), we may end up making a large number of RPC calls. This can be 
optimized in a couple of ways:

* We can implement a batch getContainerWithPipeline API in SCM, with which we 
can get the pipeline locations for all the blocks of a file in one call. To 
bound the number of containers passed to SCM in a single call, we can use a 
fixed container batch size on the OM side. _Here, Number of calls = 1 (or k, 
depending on the batch size)_
* Alternatively, a simpler change would be to keep a method-local map of 
ContainerID -> Pipeline obtained from SCM, so that we don't make repeated 
calls to SCM for the same containerID within a key. _Here, Number of calls = 
Number of unique containerIDs_
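The second option can be sketched roughly as below. This is an illustrative 
sketch, not actual Ozone code: getContainerWithPipeline here is a stand-in for 
the real OM -> SCM RPC, and a plain String stands in for the Pipeline type; 
only the caching pattern is the point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PipelineCacheSketch {
    // Counts how many times the (stand-in) SCM RPC was invoked.
    static int rpcCalls = 0;

    // Hypothetical stand-in for the SCM getContainerWithPipeline RPC.
    static String getContainerWithPipeline(long containerId) {
        rpcCalls++;  // each invocation models one OM -> SCM round trip
        return "pipeline-for-" + containerId;
    }

    // Resolve pipelines for all blocks of a key; the method-local map ensures
    // that repeated blocks in the same container trigger only one RPC.
    static List<String> refreshPipelines(List<Long> blockContainerIds) {
        Map<Long, String> cache = new HashMap<>();  // discarded after the lookup
        List<String> pipelines = new ArrayList<>();
        for (long id : blockContainerIds) {
            pipelines.add(cache.computeIfAbsent(id,
                PipelineCacheSketch::getContainerWithPipeline));
        }
        return pipelines;
    }

    public static void main(String[] args) {
        // e.g. 4 blocks of a large file spread over only 2 unique containers
        List<String> result = refreshPipelines(List.of(1L, 1L, 2L, 2L));
        System.out.println(result.size() + " blocks, " + rpcCalls + " RPCs");
        // prints "4 blocks, 2 RPCs"
    }
}
```

As described above, the number of RPCs drops from one per block to one per 
unique containerID, with no change to the SCM-side API.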

  was:
Currently, while looking up a key, the Ozone Manager gets the pipeline 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making a lot of RPC calls for this. This can be optimized 
in a couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. _Here, Number of calls = 1 (or k depending 
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline 
that we get from SCM so that we don't need to make repeated calls to SCM for 
the same containerID for a key. _Here, Number of calls = Number of unique 
containerIDs_


> Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the 
> pipeline for a key
> ---
>
> Key: HDDS-2241
> URL: https://issues.apache.org/jira/browse/HDDS-2241
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>
> Currently, while looking up a key, the Ozone Manager gets the pipeline 
> information from SCM through an RPC for every block in the key. For large 
> files > 1GB, we may end up making a lot of RPC calls for this. This can be 
> optimized in a couple of ways
> * We can implement a batch getContainerWithPipeline API in SCM using which we 
> can get the pipeline info locations for all the blocks for a file. To keep 
> the number of containers passed in to SCM in a single call, we can have a 
> fixed container batch size on the OM side. _Here, Number of calls = 1 (or k 
> depending on batch size)_
> * Instead, a simpler change would be to have a map (method local) of 
> ContainerID -> Pipeline that we get from SCM so that we don't need to make 
> repeated calls to SCM for the same containerID for a key. _Here, Number of 
> calls = Number of unique containerIDs_



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2241) Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the pipeline for a key

2019-10-03 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2241:

Description: 
Currently, while looking up a key, the Ozone Manager gets the pipeline 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making a lot of RPC calls for this. This can be optimized 
in a couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. _Here, Number of calls = 1 (or k depending 
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline 
that we get from SCM so that we don't need to make repeated calls to SCM for 
the same containerID for a key. _Here, Number of calls = Number of unique 
containerIDs_

  was:
Currently, while looking up a key, the Ozone Manager gets the pipeline 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. _Here, Number of calls = 1 (or k depending 
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline 
that we get from SCM so that we don't need to make repeated calls to SCM for 
the same containerID for a key. _Here, Number of calls = Number of unique 
containerIDs_








[jira] [Updated] (HDDS-2241) Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the pipeline for a key

2019-10-03 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2241:

Description: 
Currently, while looking up a key, the Ozone Manager gets the pipeline 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. _Here, Number of calls = 1 (or k depending 
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline 
that we get from SCM so that we don't need to make repeated calls to SCM for 
the same containerID for a key. _Here, Number of calls = Number of unique 
containerIDs_

  was:
Currently, while looking up a key, the Ozone Manager gets the pipeline location 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. _Here, Number of calls = 1 (or k depending 
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline 
that we get from SCM so that we don't need to make repeated calls to SCM for 
the same containerID for a key. _Here, Number of calls = Number of unique 
containerIDs_








[jira] [Updated] (HDDS-2241) Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the pipeline for a key

2019-10-03 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2241:

Description: 
Currently, while looking up a key, the Ozone Manager gets the pipeline location 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. _Here, Number of calls = 1 (or k depending 
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline 
that we get from SCM so that we don't need to make repeated calls to SCM for 
the same containerID for a key. _Here, Number of calls = Number of unique 
containerIDs_

  was:
Currently, while looking up a key, the Ozone Manager gets the pipeline location 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. _Here, Number of calls = 1 (or k depending 
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline 
that we get from SCM so that we don't need to make calls to SCM again for the 
same pipeline for that key. _Here, Number of calls = Number of unique 
containerIDs_








[jira] [Updated] (HDDS-2241) Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the pipeline for a key

2019-10-03 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2241:

Description: 
Currently, while looking up a key, the Ozone Manager gets the pipeline location 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. _Here, Number of calls = 1 (or k depending 
on batch size)_
* Instead, we can have a simple map (method local) for ContainerID -> Pipeline 
that we get from SCM so that we don't need to make calls to SCM again for the 
same pipeline for that key. _Here, Number of calls = Number of unique 
containerIDs_

  was:
Currently, while looking up a key, the Ozone Manager gets the pipeline location 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. 
* Instead, we can have a simple map (inside the method) for ContainerID -> 
Pipeline that we got from SCM so that we don't need to make calls to SCM again 
for the same pipeline. Here number of calls = number of unique containerIDs.








[jira] [Updated] (HDDS-2241) Optimize the refresh pipeline logic used by KeyManagerImpl to obtain the pipeline for a key

2019-10-03 Thread Aravindan Vijayan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aravindan Vijayan updated HDDS-2241:

Description: 
Currently, while looking up a key, the Ozone Manager gets the pipeline location 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file. To keep the 
number of containers passed in to SCM in a single call, we can have a fixed 
container batch size on the OM side. 
* Instead, we can have a simple map (inside the method) for ContainerID -> 
Pipeline that we got from SCM so that we don't need to make calls to SCM again 
for the same pipeline. Here number of calls = number of unique containerIDs.

  was:
Currently, while looking up a key, the Ozone Manager gets the pipeline location 
information from SCM through an RPC for every block in the key. For large files 
> 1GB, we may end up making ~4 RPC calls for this. This can be optimized in a 
couple of ways

* We can implement a batch getContainerWithPipeline API in SCM using which we 
can get the pipeline info locations for all the blocks for a file.
* Instead, we can have a method local cache for ContainerID -> Pipeline that we 
got from SCM so that we don't need to make calls to SCM again for the same 
pipeline.




