Re: [RFC] Support for creation of Large Tensors in MXNet

2019-05-18 Thread Sheng Zha
Thanks for clarifying. This seems like a duplicate of [1] (though there wasn't 
any feedback there). I think everyone already agrees on the goal. 

> Currently, we assume the max size of each dimension.

I agree with Tao that int64_t would be necessary given that it's common to 
flatten and reshape ndarrays.

To help avoid repeating discussion and to make this discussion more productive, 
here is some of the relevant context that I'm aware of:
- The first part of the proposed change was merged in #11742, which caused 
#14496, i.e. performance degradation in transpose and imdecode. The full scope 
is still unclear.
- A compilation flag was added in #14570 so that people can explicitly opt in 
for the support without impacting others using the default setting.

Given the context, since the goal is to support large tensors by default without 
performance impact, I hope more investigation could accompany this proposal 
that covers:
- The problem: list the parts (e.g. operators) whose performance is impacted by 
changing the index type, and the amount of slow-down.
- The solution for addressing the slow-down.
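
As a rough illustration of the per-operator report asked for above, here is a minimal Python sketch. The names `relative_slowdown`, `measure` and `report`, and the two placeholder workloads, are hypothetical: they only stand in for one operator built once with 32-bit and once with 64-bit indices, and none of this is MXNet API.

```python
import timeit


def relative_slowdown(baseline_s, candidate_s):
    """Fractional slow-down of candidate vs. baseline (0.25 means 25% slower)."""
    return (candidate_s - baseline_s) / baseline_s


def measure(fn, repeat=5, number=10):
    """Best-of-`repeat` wall time, in seconds, for `number` calls of fn."""
    return min(timeit.repeat(fn, repeat=repeat, number=number))


def report(ops):
    """ops maps an operator name to (baseline_fn, candidate_fn).

    Returns a list of (name, baseline_seconds, candidate_seconds, slowdown).
    """
    rows = []
    for name, (base_fn, cand_fn) in ops.items():
        base, cand = measure(base_fn), measure(cand_fn)
        rows.append((name, base, cand, relative_slowdown(base, cand)))
    return rows


# Placeholder workloads (assumption): replace with the same operator call
# against a default build and a large-tensor build.
data = list(range(10_000))
ops = {"placeholder_op": (lambda: sum(data), lambda: sum(x for x in data))}

for name, base, cand, slow in report(ops):
    print(f"{name}: {base:.4f}s -> {cand:.4f}s ({slow:+.1%})")
```

Taking the minimum over several repeats is one common way to reduce timing noise; the actual measurement method for the proposal is still open.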

Thanks.

-sz

[1] 
https://lists.apache.org/thread.html/52b784cf85f89a22355e195fc88b01992fb1993a6f08499a46fa1ff8@%3Cdev.mxnet.apache.org%3E

Re: [RFC] Support for creation of Large Tensors in MXNet

2019-05-18 Thread Srivastava, Rohit Kumar
Hi Tao,
The existing MXNet implementation doesn't support large tensors. MXNet NDArray 
creation for tensors of sizes larger than 2^32 is only supported by enabling a 
build flag for now. The purpose of this thread is to have the community provide 
feedback on the design cwiki for *Large Tensor Support* in MXNet. The intention 
is to make large tensor support a default feature in MXNet (in the future) without 
any performance impact, so consumers do not have to build it from source. 

-Rohit

RE: [RFC] Support for creation of Large Tensors in MXNet

2019-05-18 Thread Lv, Tao A
Hi Rohit,

The existing MKL-DNN and its integration in MXNet should already support *large 
tensors*, meaning the total number of elements (Prod(shape)) can exceed 
INT_MAX. Feel free to let me know if you find any issue when using MKL-DNN 
operators with large tensors.

For large dimension sizes (shape[x]), MKL-DNN is going to add support in its 1.0 
release, which will be released in the middle of the year. But I'm not sure if 
MXNet has a plan to support that.

Thanks,
-tao

-Original Message-
From: Srivastava, Rohit Kumar [mailto:srivastava@buckeyemail.osu.edu] 
Sent: Sunday, May 19, 2019 7:23 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: [RFC] Support for creation of Large Tensors in MXNet

Hi Tao,
There are already a couple of operators implemented in MXNet that 
currently support tensors with size over ~4.5 billion. In the meantime, core 
MXNet can move ahead with providing initial support for such large tensors so 
MXNet customers can start using it.

Good to hear MKLDNN will provide support for such cases. Do you have a timeline 
as to when this feature will be released?

-Rohit




Re: [RFC] Support for creation of Large Tensors in MXNet

2019-05-18 Thread Srivastava, Rohit Kumar
Hi Tao,
There are already a couple of operators implemented in MXNet that 
currently support tensors with size over ~4.5 billion. In the meantime, core 
MXNet can move ahead with providing initial support for such large tensors so 
MXNet customers can start using it.

Good to hear MKLDNN will provide support for such cases. Do you have a timeline 
as to when this feature will be released?

-Rohit

On 4/29/19, 7:18 PM, "Lv, Tao A"  wrote:

Thank you Lin! I would expect the current MKL-DNN implementation already 
supports the scenario you mentioned here. This can be verified by this issue: 
https://github.com/apache/incubator-mxnet/issues/13451

But as I said before, since we support flatten and reshape operators, 
it's possible for users to convert a tensor with a large element count into a 
tensor with a large dimension size. That could cause issues there.
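
The flatten case can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not MXNet or MKL-DNN code; the helper names `fits_int32` and `flatten` and the example shape are assumptions made for this sketch.

```python
from math import prod

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit dimension can hold


def fits_int32(shape):
    """Return True if every dimension size in `shape` fits in int32."""
    return all(0 <= dim <= INT32_MAX for dim in shape)


def flatten(shape):
    """Shape of the 1-D tensor obtained by flattening `shape`."""
    return (prod(shape),)


shape = (50_000, 50_000)  # hypothetical 2-D tensor, 2.5 billion elements
flat = flatten(shape)     # (2_500_000_000,)

print(fits_int32(shape))  # True: each dimension is small
print(fits_int32(flat))   # False: the single dimension exceeds INT32_MAX
```

So a tensor that is only large in total element count becomes large in dimension size after a single flatten, which is why the two cases are hard to treat separately.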

To cover more cases, MKL-DNN is going to support INT64 dimension size in 
its coming 1.0 major release.

-tao

-Original Message-
From: Lin Yuan [mailto:apefor...@gmail.com] 
Sent: Tuesday, April 30, 2019 12:56 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: [RFC] Support for creation of Large Tensors in MXNet

Tao,

- what's the max size of dimensionality? Which data type is used to define 
dimensionality (ndims)?
We assume the max size of dimensionality is relatively small. Hence the `int` 
data type is used to define ndim.

- what's the max size of each dimension? Which data type is used to define 
dimension size (shape[x])?
Currently, we assume the max size of each dimension is not going to exceed 
2^31 in real applications. Hence the data type is `int32_t`.

- what's the max size of total elements? Which data type is used to define 
element size (Prod(shape))?
We assume the total number of elements in a tensor can be larger than 2^32 
in some applications such as the Deep Graph Library. We use the data type 
`int64_t` to represent the total element size. Currently, due to a performance 
regression in some operators (such as transpose), we use a compiler flag to set 
this data type to `int32_t` by default. Once we have ways to mitigate the 
performance regression, we will set the default data type to `int64_t`, which is 
part of the effort in this project that Rohit proposed.
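
To illustrate why the total element count needs a 64-bit type, here is a small Python sketch that uses `ctypes` to mimic storing Prod(shape) in a signed 32-bit versus a signed 64-bit integer. The function names and the example shape are assumptions for illustration; this is not how MXNet computes sizes internally.

```python
import ctypes
from math import prod


def size_as_int32(shape):
    """Element count after truncation to a signed 32-bit integer."""
    # ctypes integer types do no overflow checking, so the value wraps
    # the same way a two's-complement int32 would.
    return ctypes.c_int32(prod(shape)).value


def size_as_int64(shape):
    """Element count stored in a signed 64-bit integer."""
    return ctypes.c_int64(prod(shape)).value


shape = (50_000, 50_000)  # hypothetical shape; each dimension fits in int32

print(size_as_int64(shape))  # 2500000000
print(size_as_int32(shape))  # -1794967296, i.e. 2500000000 - 2**32
```

With a 32-bit size the count silently wraps to a negative number, while the 64-bit size holds it exactly.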

What is the plan in MKLDNN to support large tensors? We may want to 
coordinate the progress since many operators are using the MKLDNN implementation 
on CPU now.

Many Thanks,

Lin

On Sun, Apr 28, 2019 at 7:52 PM Lv, Tao A  wrote:

> Thank you for bringing this topic to dev, Rohit.
>
> Regarding large tensors, can you articulate:
> - what's the max size of dimensionality? Which data type is used to 
> define dimensionality (ndims)?
> - what's the max size of each dimension? Which data type is used to 
> define dimension size (shape[x])?
> - what's the max size of total elements? Which data type is used to 
> define element size (Prod(shape))?
>
> For me, any of these three can be *large*.
>
> -Original Message-
> From: Srivastava, Rohit Kumar 
> [mailto:srivastava@buckeyemail.osu.edu]
> Sent: Saturday, April 27, 2019 7:33 AM
> To: dev@mxnet.incubator.apache.org
> Subject: [RFC] Support for creation of Large Tensors in MXNet
>
> Dear Community,
>
> Currently MXNet supports creation of Tensors containing up to 2^32 
> elements. However, there are cases where tensors of size over 5 billion 
> are required.
>
> We plan to support creation of large tensors on MXNet. A design 
> proposal is ready for review:
> https://cwiki.apache.org/confluence/display/MXNET/Large+Tensor+Support
>
> We will appreciate any help and feedback from the community.
>
> Thank you!
>
> Rohit
>