Re: Announce: Hive-MR3 with Celeborn,

2023-11-02 Thread Sungwoo Park
Celeborn and Uniffle can also be seen as a move to separate local storage
from compute nodes.

1. In the old days, Hadoop was based on the idea of collocating compute and
storage.
2. Later, a new paradigm of separating compute and storage emerged and became
popular.
3. Now people want to not just separate compute and storage, but also
separate local storage from compute nodes.

In the future, all shuffle/spill files might be stored in a dedicated
system like Celeborn or Uniffle. In developing Hive-MR3, we
completely removed spill files for unordered edges thanks to the efficient
buffering in Celeborn.

Thanks,

--- Sungwoo

On Thu, Nov 2, 2023 at 7:31 PM Keyong Zhou wrote:

> I think both Celeborn and Uniffle are good alternatives as a general
> shuffle service.


Re: Announce: Hive-MR3 with Celeborn,

2023-11-02 Thread Keyong Zhou
I think both Celeborn and Uniffle are good alternatives as a general shuffle
service. I recommend that you try them :). If you have any questions about
Celeborn, we're very glad to discuss them on Celeborn's mailing lists [1][2]
or Slack [3].

[1] u...@celeborn.apache.org
[2] d...@celeborn.apache.org
[3] https://join.slack.com/t/apachecelebor-kw08030/shared_invite/zt-1ju3hd5j8-4Z5keMdzpcVMspe4UJzF4Q

Thanks,
Keyong Zhou

On 2023/10/31 14:24:38 "Battula, Brahma Reddy" wrote:
> Have you done comparison between uniffle and celeborn..?


Re: Announce: Hive-MR3 with Celeborn,

2023-11-01 Thread Sungwoo Park
On Thu, Nov 2, 2023 at 1:43 PM Sungwoo Park wrote:

> BTW, if you are using Hive-on-MapReduce or Hive-on-Tez, consider switching
> to Hive-on-Tez. You will see a huge increase (x3 to x10) in throughput.

Ooops, I meant switching to Hive-on-MR3 :-)




Re: Announce: Hive-MR3 with Celeborn,

2023-11-01 Thread Sungwoo Park
>
> Have you done comparison between uniffle and celeborn..?
>

We did not compare the performance of Uniffle and Celeborn (because
Hive-MR3-Celeborn has been released but Hive-MR3-Uniffle is not complete
yet). Much of the code in Hive-MR3-Celeborn is currently reused in
Hive-MR3-Uniffle, so we think there are many architectural similarities
between the two systems.

We implemented our Celeborn extension first because a user of Hive-MR3
wanted to use Celeborn, which was already running in production. If any
industrial user of Hive-MR3 wants to use Uniffle in production, please let
us know.

BTW, if you are using Hive-on-MapReduce or Hive-on-Tez, consider switching
to Hive-on-Tez. You will see a huge increase (x3 to x10) in throughput.

Regards,

--- Sungwoo


Re: Announce: Hive-MR3 with Celeborn,

2023-10-31 Thread Battula, Brahma Reddy
Thanks for bringing this up. Good to see that it supports Spark and Flink.

Have you done a comparison between Uniffle and Celeborn?


On 30/10/23, 8:01 AM, "Keyong Zhou" wrote:

> Great to hear this! It's encouraging that Celeborn helps MR3.

Re: Announce: Hive-MR3 with Celeborn,

2023-10-29 Thread Keyong Zhou
Great to hear this! It's encouraging that Celeborn helps MR3.

Celeborn is a general-purpose remote shuffle service that stores and serves
shuffle data (and other intermediate data in the future) to help compute engines
make better use of disaggregated architectures, as well as become more efficient
and stable for jobs with huge shuffle sizes.
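To make the push-then-fetch idea behind a remote shuffle service concrete, here is a minimal Python sketch of the general pattern, assuming a single in-memory store. `ToyShuffleService`, `push`, `fetch`, `partition_of`, `map_task`, and `reduce_task` are hypothetical names invented for illustration; they are not Celeborn's API, and a real service runs as separate processes on dedicated nodes and handles persistence and fault tolerance itself.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]


class ToyShuffleService:
    """Hypothetical in-memory stand-in for a remote shuffle service.

    Mappers push records tagged with a partition id; the service groups them
    per (shuffle_id, partition_id), so a reducer fetches its whole partition
    from one place instead of contacting every mapper.
    """

    def __init__(self) -> None:
        self._partitions: Dict[Tuple[int, int], List[Record]] = defaultdict(list)

    def push(self, shuffle_id: int, partition_id: int, record: Record) -> None:
        # Mapper side: hand the record to the service instead of a local file.
        self._partitions[(shuffle_id, partition_id)].append(record)

    def fetch(self, shuffle_id: int, partition_id: int) -> List[Record]:
        # Reducer side: read the already-grouped partition (empty if none).
        return list(self._partitions.get((shuffle_id, partition_id), []))


def partition_of(key: str, num_partitions: int) -> int:
    # Deterministic hash partitioning (Python's built-in hash() of str is
    # salted per process, so crc32 is used instead).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def map_task(service: ToyShuffleService, shuffle_id: int,
             records: Iterable[Record], num_partitions: int) -> None:
    # Each mapper pushes its output directly to the service.
    for key, value in records:
        service.push(shuffle_id, partition_of(key, num_partitions), (key, value))


def reduce_task(service: ToyShuffleService, shuffle_id: int,
                partition_id: int) -> Dict[str, int]:
    # Each reducer reads only its own partition and sums values per key.
    totals: Dict[str, int] = defaultdict(int)
    for key, value in service.fetch(shuffle_id, partition_id):
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    service = ToyShuffleService()
    num_partitions = 2
    map_task(service, 0, [("a", 1), ("b", 2)], num_partitions)
    map_task(service, 0, [("a", 3), ("c", 4)], num_partitions)
    merged: Dict[str, int] = {}
    for p in range(num_partitions):
        merged.update(reduce_task(service, 0, p))
    print(sorted(merged.items()))  # [('a', 4), ('b', 2), ('c', 4)]
```

Because each key is routed to exactly one partition, the per-partition results combine without conflicts, and in this sketch no mapper ever writes a local shuffle file.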

Currently Celeborn supports Hive on MR, and I think integrating with MR3
provides a good example for supporting Hive on Tez.

Thanks,
Keyong Zhou

On 2023/10/24 12:08:54 Sungwoo Park wrote:
> Before the impending release of MR3 1.8, we would like to announce the
> release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
> 0.3.1).


Re: Announce: Hive-MR3 with Celeborn,

2023-10-24 Thread lisoda
Thanks. I will try.



---- Replied Message ----
From: Sungwoo Park
Date: 10/24/2023 20:08
To: user@hive.apache.org
Subject: Announce: Hive-MR3 with Celeborn,

Hi Hive users,


Before the impending release of MR3 1.8, we would like to announce the release 
of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn 0.3.1).

Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and Apache
Uniffle [3] (which was discussed on this Hive mailing list a while ago).
Celeborn officially supports Spark and Flink, and we have implemented an
MR3 extension for Celeborn.

In addition to all the benefits of using a remote shuffle service,
Hive-MR3-Celeborn supports direct processing of mapper output on the reducer
side, which means that reducers do not store mapper output on local disks (for
unordered edges). In this way, Hive-MR3-Celeborn can eliminate over 95% of
local disk writes, as measured on the 10TB TPC-DS benchmark. This can be
particularly useful when running Hive-MR3 on public clouds where fast local
disk storage is expensive or unavailable.
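One way to see why direct processing is tied to unordered edges: a consumer of an unordered edge can hand each fetched chunk to the operator as soon as it arrives, whereas a consumer of an ordered edge generally has to retain every mapper's sorted output until it can merge them. The Python sketch below illustrates that contrast under simplifying assumptions (records are `(key, value)` tuples and all chunks fit in memory); `consume_unordered` and `consume_ordered` are hypothetical helpers, not MR3 or Celeborn code.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]


def consume_unordered(chunks: Iterable[List[Record]],
                      process: Callable[[Record], None]) -> None:
    # Unordered edge: each fetched chunk goes straight to the operator.
    # Nothing is kept after a chunk is processed, so nothing needs to be spilled.
    for chunk in chunks:
        for record in chunk:
            process(record)


def consume_ordered(chunks: Iterable[List[Record]]) -> Iterator[Record]:
    # Ordered edge: output must be globally sorted by key across all mapper
    # outputs, so every sorted chunk is retained until the k-way merge runs.
    # A real engine may spill retained chunks to local disk when memory runs
    # out; this sketch keeps everything in memory.
    retained = [sorted(chunk) for chunk in chunks]
    return heapq.merge(*retained)


if __name__ == "__main__":
    chunks = [[("b", 1), ("a", 2)], [("c", 3), ("a", 4)]]
    seen: List[Record] = []
    consume_unordered(chunks, seen.append)
    print(seen)                          # [('b', 1), ('a', 2), ('c', 3), ('a', 4)]
    print(list(consume_ordered(chunks)))  # [('a', 2), ('a', 4), ('b', 1), ('c', 3)]
```

The unordered consumer sees records in arrival order and can discard them immediately; the ordered consumer must buffer all inputs before producing its first merged record.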

We have documented the usage of Hive-MR3-Celeborn in [4]. You can download
Hive-MR3-Celeborn from [5].

FYI, MR3 is an execution engine providing native support for Hadoop,
Kubernetes, and standalone mode [6]. Hive-MR3, its main application, provides
the performance of LLAP yet is very easy to install and operate. If you are
using Hive-Tez to run ETL jobs, switching to Hive-MR3 will give you much
higher throughput thanks to its advanced resource sharing model.

We have recently opened a Slack channel. If you are interested, please join the
Slack channel and ask any questions about MR3:

https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg

Thank you,

--- Sungwoo

[1] https://celeborn.apache.org/
[2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
[3] https://uniffle.apache.org/
[4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
[5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
[6] https://mr3docs.datamonad.com/