Re: [DISCUSS] Incubating Proposal for Celeborn (was: [DISCUSS] Incubating Proposal for Datark)

2022-10-10 Thread Yu Li
Hi All,

Thanks for the feedback, and great to know you like the new name :-)

@Madhawa
Thanks for the interest. As we already have enough mentors, maybe next time
(smile). You're warmly welcome to watch and join the project.

Best Regards,
Yu


On Thu, 6 Oct 2022 at 07:39, Madhawa Gunasekara  wrote:

> Hi,
>
> The name of this project is interesting. Soon we will find some other
> characters also in future podlings. I would like to become a mentor for
> this podling.
>
> Thanks,
> Madhawa
>
>
> On Tue, Oct 4, 2022 at 6:06 PM Roman Shaposhnik 
> wrote:
>
> > At the risk of drawing the ire of the top-post police -- I have to
> > top-post ;-)
> >
> > Kudos on a really cool name!
> >
> > Thanks,
> > Roman.
> >
> > On Fri, Sep 30, 2022 at 6:10 PM keyong zhou 
> > wrote:
> > >
> > > Hi Yu,
> > >
> > >  Yeah the name is from 《The Silmarillion》 and Celeborn is a seedling of
> > the
> > > tree Galathilion, which in turn had been made in image of Telperion[1].
> > >
> > >  We're glad that you like it :)
> > >
> > > Thanks.
> > >
> > > Best Regards,
> > > Keyong
> > >
> > > [1] https://tolkiengateway.net/wiki/Celeborn_(White_Tree)
> > >
> > > On Fri, Sep 30, 2022 at 18:13, Yu Xiao wrote:
> > >
> > > > Hi,
> > > >
> > > >  Very happy to see the name ,
> > > >
> > > >  BTW, It refers to Telperion(silver tree)in Two Trees of Valinor?
> > > >
> > > > 《The Silmarillion》is my favorite stories. And now the TV of 《The
> Rings
> > > > of Power》in theaters.
> > > >
> > > >   The name is really great,
> > > >
> > > >  I think it's cool to introduce this fantasy tree to the Apache
> world.
> > > >
> > > >   > Celeborn is the name of the White Tree in J. R. R. Tolkien's
> > fantasy
> > > >  stories. "Celeb" is the sindarin for "silver"
> > > >
> > > > Best wishes!
> > > >
> > > > Yu Xiao
> > > > Apache ShenYu
> > > >
> > > > On Fri, Sep 30, 2022 at 17:39, Yu Li wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > After more careful searching and legal consultation, we found the
> > name
> > > > > "Datark" may have some trademark conflicts and (after some
> community
> > > > > discussion) decided to rename the project to "Celeborn", and have
> > updated
> > > > > the proposal document accordingly [1].
> > > > >
> > > > > Celeborn is the name of the White Tree in J. R. R. Tolkien's
> fantasy
> > > > > stories. "Celeb" is the sindarin for "silver", while "orn" is that
> > for
> > > > > "tree", and we think it's cool to introduce this fantasy tree to
> the
> > > > Apache
> > > > > world. Hopefully people also like this new name.
> > > > >
> > > > > And we'd like to continue the discussion and look forward to more
> > > > feedback
> > > > > (and thanks for the existing ones).
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Best Regards,
> > > > > Yu
> > > > >
> > > > > [1]
> > > >
> https://cwiki.apache.org/confluence/display/INCUBATOR/CelebornProposal
> > > > >
> > > > > On Tue, 27 Sept 2022 at 17:07, Kaijie Chen  wrote:
> > > > >
> > > > > > Hi Yu,
> > > > > >
> > > > > > Thanks for your explanation. Yes it makes sense.
> > > > > > It's nice we can see the migration process in the community.
> > > > > > And it will probably help create better docs for the project.
> > > > > >
> > > > > > Best,
> > > > > > Kaijie
> > > > > >
> > > > > > On 2022/09/27 06:05:10 Yu Li wrote:
> > > > > > > Hi BLAST and Kaijie,
> > > > > > >
> > > > > > > Yes, we've done quite some work on merging the two rss projects
> > (or
> > > > more
> > > > > > > accurately, try to extract the high-level architecture and
> > > > interfaces to
> > > > > > > form a more general framework) but still not fully completed.
> > > > However, on
> > > > > > > second thought, we feel incubating the project first and
> > involving
> > > > more
> > > > > > > community forces to discuss and complete the work together
> might
> > be a
> > > > > > > better idea. I believe more information on this topic will be
> > shared
> > > > in
> > > > > > the
> > > > > > > community, and further discussion and the incubation process
> > won't
> > > > block
> > > > > > > each other. Wdyt?
> > > > > > >
> > > > > > > @Gon
> > > > > > > Apache Nemo is an interesting project and we will further
> > > > investigate and
> > > > > > > search for the cooperation between the two projects (smile).
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Yu
> > > > > > >
> > > > > > >
> > > > > > > On Tue, 27 Sept 2022 at 10:45, Kaijie Chen 
> > wrote:
> > > > > > >
> > > > > > > > Hi Yu,
> > > > > > > >
> > > > > > > > You mentioned you will merge RemoteShuffleService and
> > > > > > flink-remote-shuffle
> > > > > > > > previously in this thread:
> > > > > > > >
> > > > > > > >
> > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > > > > > >
> > > > > > > > > Our proposal is still not fully prepared because the merge
> of
> > > > the two
> > > > > > > > projects
> > > > > > > > > is still in progress
> > > > > > > >
> > > > > > > > Have you done the merge? Or is there a plan to merge them
> > later?

Re: [DISCUSS] Incubating Proposal for Celeborn (was: [DISCUSS] Incubating Proposal for Datark)

2022-10-05 Thread Madhawa Gunasekara
Hi,

The name of this project is interesting. Soon we will find some other
characters also in future podlings. I would like to become a mentor for
this podling.

Thanks,
Madhawa


On Tue, Oct 4, 2022 at 6:06 PM Roman Shaposhnik 
wrote:

> At the risk of drawing the ire of the top-post police -- I have to
> top-post ;-)
>
> Kudos on a really cool name!
>
> Thanks,
> Roman.
>
> On Fri, Sep 30, 2022 at 6:10 PM keyong zhou 
> wrote:
> >
> > Hi Yu,
> >
> >  Yeah the name is from 《The Silmarillion》 and Celeborn is a seedling of
> the
> > tree Galathilion, which in turn had been made in image of Telperion[1].
> >
> >  We're glad that you like it :)
> >
> > Thanks.
> >
> > Best Regards,
> > Keyong
> >
> > [1] https://tolkiengateway.net/wiki/Celeborn_(White_Tree)
> >
> > On Fri, Sep 30, 2022 at 18:13, Yu Xiao wrote:
> >
> > > Hi,
> > >
> > >  Very happy to see the name ,
> > >
> > >  BTW, It refers to Telperion(silver tree)in Two Trees of Valinor?
> > >
> > > 《The Silmarillion》is my favorite stories. And now the TV of 《The Rings
> > > of Power》in theaters.
> > >
> > >   The name is really great,
> > >
> > >  I think it's cool to introduce this fantasy tree to the Apache world.
> > >
> > >   > Celeborn is the name of the White Tree in J. R. R. Tolkien's
> fantasy
> > >  stories. "Celeb" is the sindarin for "silver"
> > >
> > > Best wishes!
> > >
> > > Yu Xiao
> > > Apache ShenYu
> > >
> > > On Fri, Sep 30, 2022 at 17:39, Yu Li wrote:
> > > >
> > > > Hi All,
> > > >
> > > > After more careful searching and legal consultation, we found the
> name
> > > > "Datark" may have some trademark conflicts and (after some community
> > > > discussion) decided to rename the project to "Celeborn", and have
> updated
> > > > the proposal document accordingly [1].
> > > >
> > > > Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy
> > > > stories. "Celeb" is the sindarin for "silver", while "orn" is that
> for
> > > > "tree", and we think it's cool to introduce this fantasy tree to the
> > > Apache
> > > > world. Hopefully people also like this new name.
> > > >
> > > > And we'd like to continue the discussion and look forward to more
> > > feedback
> > > > (and thanks for the existing ones).
> > > >
> > > > Thanks.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > [1]
> > > https://cwiki.apache.org/confluence/display/INCUBATOR/CelebornProposal
> > > >
> > > > On Tue, 27 Sept 2022 at 17:07, Kaijie Chen  wrote:
> > > >
> > > > > Hi Yu,
> > > > >
> > > > > Thanks for your explanation. Yes it makes sense.
> > > > > It's nice we can see the migration process in the community.
> > > > > And it will probably help create better docs for the project.
> > > > >
> > > > > Best,
> > > > > Kaijie
> > > > >
> > > > > On 2022/09/27 06:05:10 Yu Li wrote:
> > > > > > Hi BLAST and Kaijie,
> > > > > >
> > > > > > Yes, we've done quite some work on merging the two rss projects
> (or
> > > more
> > > > > > accurately, try to extract the high-level architecture and
> > > interfaces to
> > > > > > form a more general framework) but still not fully completed.
> > > However, on
> > > > > > second thought, we feel incubating the project first and
> involving
> > > more
> > > > > > community forces to discuss and complete the work together might
> be a
> > > > > > better idea. I believe more information on this topic will be
> shared
> > > in
> > > > > the
> > > > > > community, and further discussion and the incubation process
> won't
> > > block
> > > > > > each other. Wdyt?
> > > > > >
> > > > > > @Gon
> > > > > > Apache Nemo is an interesting project and we will further
> > > investigate and
> > > > > > search for the cooperation between the two projects (smile).
> > > > > >
> > > > > > Best Regards,
> > > > > > Yu
> > > > > >
> > > > > >
> > > > > > On Tue, 27 Sept 2022 at 10:45, Kaijie Chen 
> wrote:
> > > > > >
> > > > > > > Hi Yu,
> > > > > > >
> > > > > > > You mentioned you will merge RemoteShuffleService and
> > > > > flink-remote-shuffle
> > > > > > > previously in this thread:
> > > > > > >
> > > > > > >
> https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > > > > >
> > > > > > > > Our proposal is still not fully prepared because the merge of
> > > the two
> > > > > > > projects
> > > > > > > > is still in progress
> > > > > > >
> > > > > > > Have you done the merge? Or is there a plan to merge them
> later?
> > > > > > > I just checked the github repositories, seems there is no
> mention
> > > of
> > > > > the
> > > > > > > merge
> > > > > > > and both project are still being actively developed.
> > > > > > >
> > > > > > > Best,
> > > > > > > Kaijie
> > > > > > >
> > > > > > >
> > > > > > > > -
> > > > > > > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > > > > > > For additional commands, e-mail: general-h...@incubator.apache.org
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> 

Re: [DISCUSS] Incubating Proposal for Celeborn (was: [DISCUSS] Incubating Proposal for Datark)

2022-10-04 Thread Roman Shaposhnik
At the risk of drawing the ire of the top-post police -- I have to top-post ;-)

Kudos on a really cool name!

Thanks,
Roman.

On Fri, Sep 30, 2022 at 6:10 PM keyong zhou  wrote:
>
> Hi Yu,
>
>  Yeah the name is from 《The Silmarillion》 and Celeborn is a seedling of the
> tree Galathilion, which in turn had been made in image of Telperion[1].
>
>  We're glad that you like it :)
>
> Thanks.
>
> Best Regards,
> Keyong
>
> [1] https://tolkiengateway.net/wiki/Celeborn_(White_Tree)
>
> On Fri, Sep 30, 2022 at 18:13, Yu Xiao wrote:
>
> > Hi,
> >
> >  Very happy to see the name ,
> >
> >  BTW, It refers to Telperion(silver tree)in Two Trees of Valinor?
> >
> > 《The Silmarillion》is my favorite stories. And now the TV of 《The Rings
> > of Power》in theaters.
> >
> >   The name is really great,
> >
> >  I think it's cool to introduce this fantasy tree to the Apache world.
> >
> >   > Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy
> >  stories. "Celeb" is the sindarin for "silver"
> >
> > Best wishes!
> >
> > Yu Xiao
> > Apache ShenYu
> >
> > On Fri, Sep 30, 2022 at 17:39, Yu Li wrote:
> > >
> > > Hi All,
> > >
> > > After more careful searching and legal consultation, we found the name
> > > "Datark" may have some trademark conflicts and (after some community
> > > discussion) decided to rename the project to "Celeborn", and have updated
> > > the proposal document accordingly [1].
> > >
> > > Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy
> > > stories. "Celeb" is the sindarin for "silver", while "orn" is that for
> > > "tree", and we think it's cool to introduce this fantasy tree to the
> > Apache
> > > world. Hopefully people also like this new name.
> > >
> > > And we'd like to continue the discussion and look forward to more
> > feedback
> > > (and thanks for the existing ones).
> > >
> > > Thanks.
> > >
> > > Best Regards,
> > > Yu
> > >
> > > [1]
> > https://cwiki.apache.org/confluence/display/INCUBATOR/CelebornProposal
> > >
> > > On Tue, 27 Sept 2022 at 17:07, Kaijie Chen  wrote:
> > >
> > > > Hi Yu,
> > > >
> > > > Thanks for your explanation. Yes it makes sense.
> > > > It's nice we can see the migration process in the community.
> > > > And it will probably help create better docs for the project.
> > > >
> > > > Best,
> > > > Kaijie
> > > >
> > > > On 2022/09/27 06:05:10 Yu Li wrote:
> > > > > Hi BLAST and Kaijie,
> > > > >
> > > > > Yes, we've done quite some work on merging the two rss projects (or
> > more
> > > > > accurately, try to extract the high-level architecture and
> > interfaces to
> > > > > form a more general framework) but still not fully completed.
> > However, on
> > > > > second thought, we feel incubating the project first and involving
> > more
> > > > > community forces to discuss and complete the work together might be a
> > > > > better idea. I believe more information on this topic will be shared
> > in
> > > > the
> > > > > community, and further discussion and the incubation process won't
> > block
> > > > > each other. Wdyt?
> > > > >
> > > > > @Gon
> > > > > Apache Nemo is an interesting project and we will further
> > investigate and
> > > > > search for the cooperation between the two projects (smile).
> > > > >
> > > > > Best Regards,
> > > > > Yu
> > > > >
> > > > >
> > > > > On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:
> > > > >
> > > > > > Hi Yu,
> > > > > >
> > > > > > You mentioned you will merge RemoteShuffleService and
> > > > flink-remote-shuffle
> > > > > > previously in this thread:
> > > > > >
> > > > > > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > > > >
> > > > > > > Our proposal is still not fully prepared because the merge of
> > the two
> > > > > > projects
> > > > > > > is still in progress
> > > > > >
> > > > > > Have you done the merge? Or is there a plan to merge them later?
> > > > > > I just checked the github repositories, seems there is no mention
> > of
> > > > the
> > > > > > merge
> > > > > > and both project are still being actively developed.
> > > > > >
> > > > > > Best,
> > > > > > Kaijie
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > > -
> > > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > > For additional commands, e-mail: general-h...@incubator.apache.org
> > > >
> > > >
> >
> >
> >


Re: [DISCUSS] Incubating Proposal for Celeborn (was: [DISCUSS] Incubating Proposal for Datark)

2022-09-30 Thread keyong zhou
Hi Yu,

Yeah, the name is from 《The Silmarillion》. Celeborn is a seedling of the
tree Galathilion, which in turn was made in the image of Telperion [1].

 We're glad that you like it :)

Thanks.

Best Regards,
Keyong

[1] https://tolkiengateway.net/wiki/Celeborn_(White_Tree)

On Fri, Sep 30, 2022 at 18:13, Yu Xiao wrote:

> Hi,
>
>  Very happy to see the name ,
>
>  BTW, It refers to Telperion(silver tree)in Two Trees of Valinor?
>
> 《The Silmarillion》is my favorite stories. And now the TV of 《The Rings
> of Power》in theaters.
>
>   The name is really great,
>
>  I think it's cool to introduce this fantasy tree to the Apache world.
>
>   > Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy
>  stories. "Celeb" is the sindarin for "silver"
>
> Best wishes!
>
> Yu Xiao
> Apache ShenYu
>
> On Fri, Sep 30, 2022 at 17:39, Yu Li wrote:
> >
> > Hi All,
> >
> > After more careful searching and legal consultation, we found the name
> > "Datark" may have some trademark conflicts and (after some community
> > discussion) decided to rename the project to "Celeborn", and have updated
> > the proposal document accordingly [1].
> >
> > Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy
> > stories. "Celeb" is the sindarin for "silver", while "orn" is that for
> > "tree", and we think it's cool to introduce this fantasy tree to the
> Apache
> > world. Hopefully people also like this new name.
> >
> > And we'd like to continue the discussion and look forward to more
> feedback
> > (and thanks for the existing ones).
> >
> > Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1]
> https://cwiki.apache.org/confluence/display/INCUBATOR/CelebornProposal
> >
> > On Tue, 27 Sept 2022 at 17:07, Kaijie Chen  wrote:
> >
> > > Hi Yu,
> > >
> > > Thanks for your explanation. Yes it makes sense.
> > > It's nice we can see the migration process in the community.
> > > And it will probably help create better docs for the project.
> > >
> > > Best,
> > > Kaijie
> > >
> > > On 2022/09/27 06:05:10 Yu Li wrote:
> > > > Hi BLAST and Kaijie,
> > > >
> > > > Yes, we've done quite some work on merging the two rss projects (or
> more
> > > > accurately, try to extract the high-level architecture and
> interfaces to
> > > > form a more general framework) but still not fully completed.
> However, on
> > > > second thought, we feel incubating the project first and involving
> more
> > > > community forces to discuss and complete the work together might be a
> > > > better idea. I believe more information on this topic will be shared
> in
> > > the
> > > > community, and further discussion and the incubation process won't
> block
> > > > each other. Wdyt?
> > > >
> > > > @Gon
> > > > Apache Nemo is an interesting project and we will further
> investigate and
> > > > search for the cooperation between the two projects (smile).
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:
> > > >
> > > > > Hi Yu,
> > > > >
> > > > > You mentioned you will merge RemoteShuffleService and
> > > flink-remote-shuffle
> > > > > previously in this thread:
> > > > >
> > > > > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > > >
> > > > > > Our proposal is still not fully prepared because the merge of
> the two
> > > > > projects
> > > > > > is still in progress
> > > > >
> > > > > Have you done the merge? Or is there a plan to merge them later?
> > > > > I just checked the github repositories, seems there is no mention
> of
> > > the
> > > > > merge
> > > > > and both project are still being actively developed.
> > > > >
> > > > > Best,
> > > > > Kaijie
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> > >
>
>
>


Re: [DISCUSS] Incubating Proposal for Celeborn (was: [DISCUSS] Incubating Proposal for Datark)

2022-09-30 Thread Yu Xiao
Hi,

Very happy to see the name.

BTW, does it refer to Telperion (the silver tree) of the Two Trees of Valinor?

《The Silmarillion》 is one of my favorite stories, and the TV series 《The Rings
of Power》 is now showing.

  The name is really great,

 I think it's cool to introduce this fantasy tree to the Apache world.

  > Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy
 stories. "Celeb" is the sindarin for "silver"

Best wishes!

Yu Xiao
Apache ShenYu

On Fri, Sep 30, 2022 at 17:39, Yu Li wrote:
>
> Hi All,
>
> After more careful searching and legal consultation, we found the name
> "Datark" may have some trademark conflicts and (after some community
> discussion) decided to rename the project to "Celeborn", and have updated
> the proposal document accordingly [1].
>
> Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy
> stories. "Celeb" is the sindarin for "silver", while "orn" is that for
> "tree", and we think it's cool to introduce this fantasy tree to the Apache
> world. Hopefully people also like this new name.
>
> And we'd like to continue the discussion and look forward to more feedback
> (and thanks for the existing ones).
>
> Thanks.
>
> Best Regards,
> Yu
>
> [1] https://cwiki.apache.org/confluence/display/INCUBATOR/CelebornProposal
>
> On Tue, 27 Sept 2022 at 17:07, Kaijie Chen  wrote:
>
> > Hi Yu,
> >
> > Thanks for your explanation. Yes it makes sense.
> > It's nice we can see the migration process in the community.
> > And it will probably help create better docs for the project.
> >
> > Best,
> > Kaijie
> >
> > On 2022/09/27 06:05:10 Yu Li wrote:
> > > Hi BLAST and Kaijie,
> > >
> > > Yes, we've done quite some work on merging the two rss projects (or more
> > > accurately, try to extract the high-level architecture and interfaces to
> > > form a more general framework) but still not fully completed. However, on
> > > second thought, we feel incubating the project first and involving more
> > > community forces to discuss and complete the work together might be a
> > > better idea. I believe more information on this topic will be shared in
> > the
> > > community, and further discussion and the incubation process won't block
> > > each other. Wdyt?
> > >
> > > @Gon
> > > Apache Nemo is an interesting project and we will further investigate and
> > > search for the cooperation between the two projects (smile).
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:
> > >
> > > > Hi Yu,
> > > >
> > > > You mentioned you will merge RemoteShuffleService and
> > flink-remote-shuffle
> > > > previously in this thread:
> > > >
> > > > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > >
> > > > > Our proposal is still not fully prepared because the merge of the two
> > > > projects
> > > > > is still in progress
> > > >
> > > > Have you done the merge? Or is there a plan to merge them later?
> > > > I just checked the github repositories, seems there is no mention of
> > the
> > > > merge
> > > > and both project are still being actively developed.
> > > >
> > > > Best,
> > > > Kaijie
> > > >
> > > > -
> > > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > > For additional commands, e-mail: general-h...@incubator.apache.org
> > > >
> > > >
> > >
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >




[DISCUSS] Incubating Proposal for Celeborn (was: [DISCUSS] Incubating Proposal for Datark)

2022-09-30 Thread Yu Li
Hi All,

After more careful searching and legal consultation, we found the name
"Datark" may have some trademark conflicts and (after some community
discussion) decided to rename the project to "Celeborn", and have updated
the proposal document accordingly [1].

Celeborn is the name of the White Tree in J. R. R. Tolkien's fantasy
stories. "Celeb" is the Sindarin for "silver", while "orn" is that for
"tree", and we think it's cool to introduce this fantasy tree to the Apache
world. Hopefully people also like this new name.

And we'd like to continue the discussion and look forward to more feedback
(and thanks for the existing ones).

Thanks.

Best Regards,
Yu

[1] https://cwiki.apache.org/confluence/display/INCUBATOR/CelebornProposal

On Tue, 27 Sept 2022 at 17:07, Kaijie Chen  wrote:

> Hi Yu,
>
> Thanks for your explanation. Yes it makes sense.
> It's nice we can see the migration process in the community.
> And it will probably help create better docs for the project.
>
> Best,
> Kaijie
>
> On 2022/09/27 06:05:10 Yu Li wrote:
> > Hi BLAST and Kaijie,
> >
> > Yes, we've done quite some work on merging the two rss projects (or more
> > accurately, try to extract the high-level architecture and interfaces to
> > form a more general framework) but still not fully completed. However, on
> > second thought, we feel incubating the project first and involving more
> > community forces to discuss and complete the work together might be a
> > better idea. I believe more information on this topic will be shared in
> the
> > community, and further discussion and the incubation process won't block
> > each other. Wdyt?
> >
> > @Gon
> > Apache Nemo is an interesting project and we will further investigate and
> > search for the cooperation between the two projects (smile).
> >
> > Best Regards,
> > Yu
> >
> >
> > On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:
> >
> > > Hi Yu,
> > >
> > > You mentioned you will merge RemoteShuffleService and
> flink-remote-shuffle
> > > previously in this thread:
> > >
> > > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > >
> > > > Our proposal is still not fully prepared because the merge of the two
> > > projects
> > > > is still in progress
> > >
> > > Have you done the merge? Or is there a plan to merge them later?
> > > I just checked the github repositories, seems there is no mention of
> the
> > > merge
> > > and both project are still being actively developed.
> > >
> > > Best,
> > > Kaijie
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> > >
> >
>
>
>


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-27 Thread Kaijie Chen
Hi Yu,

Thanks for your explanation. Yes it makes sense.
It's nice we can see the migration process in the community.
And it will probably help create better docs for the project.

Best,
Kaijie

On 2022/09/27 06:05:10 Yu Li wrote:
> Hi BLAST and Kaijie,
> 
> Yes, we've done quite a lot of work on merging the two RSS projects (or, more
> accurately, on extracting the high-level architecture and interfaces to
> form a more general framework), but it is still not fully complete. However, on
> second thought, we feel that incubating the project first and involving more
> of the community to discuss and complete the work together might be a
> better idea. I believe more information on this topic will be shared in the
> community, and further discussion and the incubation process won't block
> each other. Wdyt?
> 
> @Gon
> Apache Nemo is an interesting project, and we will investigate further and
> look for opportunities for cooperation between the two projects (smile).
> 
> Best Regards,
> Yu
> 
> 
> On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:
> 
> > Hi Yu,
> >
> > You mentioned you will merge RemoteShuffleService and flink-remote-shuffle
> > previously in this thread:
> >
> > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> >
> > > Our proposal is still not fully prepared because the merge of the two
> > > projects is still in progress
> >
> > Have you done the merge? Or is there a plan to merge them later?
> > I just checked the GitHub repositories; there seems to be no mention of the
> > merge, and both projects are still being actively developed.
> >
> > Best,
> > Kaijie
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
> 




Re: [DISCUSS] Incubating Proposal for Datark

2022-09-27 Thread tison
Hi Gabriel,

Thanks for your explanation!

Best,
tison.


On Tue, Sep 27, 2022 at 16:25, Gabriel Lee wrote:

> Hi Tison,
>
> Thanks for your curiosity.
>
> To be clear, as I know, RSS[1] was born in 2019 and has grown rapidly from
> then. RSS was open-sourced in Dec, 2021 [2]. But before this, RSS has
> already become a commercialized product on Alibaba Cloud EMR and attracts
> lots of attention from cloud customers. You can refer to [3] for more
> details if you want.
>
> Best,
> Gabriel
>
> [1] https://www.alibabacloud.com/help/en/e-mapreduce/latest/rss-new
> [2] https://github.com/alibaba/RemoteShuffleService
> [3] https://www.51cto.com/article/699346.html
>
> On Tue, 27 Sept 2022 at 15:45, tison  wrote:
>
> > Hi Gabriel,
> >
> > > Datark became a popular remote shuffle service in the past few years
> and
> > > I'm very glad to see it will be one of us soon.
> >
> > Out of curiosity, I can see the history of this project starts from Dec.
> > 2021. How can it be a popular solution for years?
> >
> > Best,
> > tison.
> >
> >
> > On Tue, Sep 27, 2022 at 14:05, Yu Li wrote:
> >
> > > Hi BLAST and Kaijie,
> > >
> > > Yes, we've done quite some work on merging the two rss projects (or
> more
> > > accurately, try to extract the high-level architecture and interfaces
> to
> > > form a more general framework) but still not fully completed. However,
> on
> > > second thought, we feel incubating the project first and involving more
> > > community forces to discuss and complete the work together might be a
> > > better idea. I believe more information on this topic will be shared in
> > the
> > > community, and further discussion and the incubation process won't
> block
> > > each other. Wdyt?
> > >
> > > @Gon
> > > Apache Nemo is an interesting project and we will further investigate
> and
> > > search for the cooperation between the two projects (smile).
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:
> > >
> > > > Hi Yu,
> > > >
> > > > You mentioned you will merge RemoteShuffleService and
> > > flink-remote-shuffle
> > > > previously in this thread:
> > > >
> > > > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > >
> > > > > Our proposal is still not fully prepared because the merge of the
> two
> > > > projects
> > > > > is still in progress
> > > >
> > > > Have you done the merge? Or is there a plan to merge them later?
> > > > I just checked the github repositories, seems there is no mention of
> > the
> > > > merge
> > > > and both project are still being actively developed.
> > > >
> > > > Best,
> > > > Kaijie
> > > >
> > > > -
> > > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > > For additional commands, e-mail: general-h...@incubator.apache.org
> > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-27 Thread Gabriel Lee
Hi Tison,

Thanks for your curiosity.

To be clear, as far as I know, RSS [1] was born in 2019 and has grown rapidly
since then. RSS was open-sourced in Dec. 2021 [2], but before that it had
already become a commercial product on Alibaba Cloud EMR and attracted
a lot of attention from cloud customers. You can refer to [3] for more
details if you want.

Best,
Gabriel

[1] https://www.alibabacloud.com/help/en/e-mapreduce/latest/rss-new
[2] https://github.com/alibaba/RemoteShuffleService
[3] https://www.51cto.com/article/699346.html

On Tue, 27 Sept 2022 at 15:45, tison  wrote:

> Hi Gabriel,
>
> > Datark became a popular remote shuffle service in the past few years and
> > I'm very glad to see it will be one of us soon.
>
> Out of curiosity, I can see the history of this project starts from Dec.
> 2021. How can it be a popular solution for years?
>
> Best,
> tison.
>
>
> On Tue, Sep 27, 2022 at 14:05, Yu Li wrote:
>
> > Hi BLAST and Kaijie,
> >
> > Yes, we've done quite a lot of work on merging the two RSS projects (or,
> > more accurately, on extracting the high-level architecture and interfaces
> > to form a more general framework), but it is still not fully complete.
> > However, on second thought, we feel that incubating the project first and
> > involving more of the community to discuss and complete the work together
> > might be a better idea. I believe more information on this topic will be
> > shared with the community, and further discussion and the incubation
> > process won't block each other. What do you think?
> >
> > @Gon
> > Apache Nemo is an interesting project, and we will investigate further and
> > look for opportunities for cooperation between the two projects (smile).
> >
> > Best Regards,
> > Yu
> >
> >
> > On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:
> >
> > > Hi Yu,
> > >
> > > You mentioned previously, in this thread, that you would merge
> > > RemoteShuffleService and flink-remote-shuffle:
> > >
> > > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > >
> > > > Our proposal is still not fully prepared because the merge of the two
> > > > projects is still in progress
> > >
> > > Have you done the merge? Or is there a plan to merge them later?
> > > I just checked the GitHub repositories; there seems to be no mention of
> > > the merge, and both projects are still being actively developed.
> > >
> > > Best,
> > > Kaijie
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> > >
> >
>


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-27 Thread tison
Hi Gabriel,

> Datark has become a popular remote shuffle service over the past few years,
> and I'm very glad to see that it will be one of us soon.

Out of curiosity: I can see that the history of this project starts in Dec.
2021. How can it have been a popular solution for years?

Best,
tison.


Yu Li  wrote on Tue, Sep 27, 2022 at 14:05:

> Hi BLAST and Kaijie,
>
> Yes, we've done quite a lot of work on merging the two RSS projects (or,
> more accurately, on extracting the high-level architecture and interfaces
> to form a more general framework), but it is still not fully complete.
> However, on second thought, we feel that incubating the project first and
> involving more of the community to discuss and complete the work together
> might be a better idea. I believe more information on this topic will be
> shared with the community, and further discussion and the incubation
> process won't block each other. What do you think?
>
> @Gon
> Apache Nemo is an interesting project, and we will investigate further and
> look for opportunities for cooperation between the two projects (smile).
>
> Best Regards,
> Yu
>
>
> On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:
>
> > Hi Yu,
> >
> > You mentioned previously, in this thread, that you would merge
> > RemoteShuffleService and flink-remote-shuffle:
> >
> > https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> >
> > > Our proposal is still not fully prepared because the merge of the two
> > > projects is still in progress
> >
> > Have you done the merge? Or is there a plan to merge them later?
> > I just checked the GitHub repositories; there seems to be no mention of
> > the merge, and both projects are still being actively developed.
> >
> > Best,
> > Kaijie
> >
>


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-27 Thread Yu Li
Hi BLAST and Kaijie,

Yes, we've done quite a lot of work on merging the two RSS projects (or, more
accurately, on extracting the high-level architecture and interfaces to form
a more general framework), but it is still not fully complete. However, on
second thought, we feel that incubating the project first and involving more
of the community to discuss and complete the work together might be a better
idea. I believe more information on this topic will be shared with the
community, and further discussion and the incubation process won't block
each other. What do you think?

@Gon
Apache Nemo is an interesting project, and we will investigate further and
look for opportunities for cooperation between the two projects (smile).

Best Regards,
Yu


On Tue, 27 Sept 2022 at 10:45, Kaijie Chen  wrote:

> Hi Yu,
>
> You mentioned previously, in this thread, that you would merge
> RemoteShuffleService and flink-remote-shuffle:
>
> https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
>
> > Our proposal is still not fully prepared because the merge of the two
> > projects is still in progress
>
> Have you done the merge? Or is there a plan to merge them later?
> I just checked the GitHub repositories; there seems to be no mention of the
> merge, and both projects are still being actively developed.
>
> Best,
> Kaijie
>


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-26 Thread Kaijie Chen
Hi Yu,

You mentioned previously, in this thread, that you would merge
RemoteShuffleService and flink-remote-shuffle:

https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz

> Our proposal is still not fully prepared because the merge of the two projects
> is still in progress

Have you done the merge? Or is there a plan to merge them later?
I just checked the GitHub repositories; there seems to be no mention of the
merge, and both projects are still being actively developed.

Best,
Kaijie


RE: [DISCUSS] Incubating Proposal for Datark

2022-09-26 Thread zhongqiang.czq
+1
Good

On 2022/09/22 03:45:10 Yu Li wrote:
> Hi All,
> 
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
> 
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
> 
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-agnostic APIs for easy integration with various engines.
> 5. Extended fault tolerance and data replication to increase reliability.
> 
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibili, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
> 
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
> 
> I will help this project as the champion and many thanks to our four other
> mentors:
> 
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
> 
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
> 
> Look forward to your feedback. Thanks.
> 
> Best Regards,
> Yu
> 
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> 


RE: [DISCUSS] Incubating Proposal for Datark

2022-09-25 Thread 李孟
+1 (non-binding)
On Sep 26, 2022, at 10:47 AM +0800, general@incubator.apache.org wrote:
>
> +1 (non-binding)


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-25 Thread hongbin ma
Good luck to Datark

On Mon, Sep 26, 2022 at 11:04 AM yOA zzha  wrote:

> Good luck to this project.
>
> Jiayi Liu  wrote on Mon, Sep 26, 2022 at 10:55:
>
> > Good luck to Datark.
> >
> > Kaijie Chen  wrote on Mon, Sep 26, 2022 at 10:47:
> >
> > > +1 (non-binding)
> > >
> > > Good luck.
> > >
> > > Kaijie
> > >
> >
>


-- 
Regards,
Hongbin Ma


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-25 Thread yOA zzha
Good luck to this project.

Jiayi Liu  wrote on Mon, Sep 26, 2022 at 10:55:

> Good luck to Datark.
>
> Kaijie Chen  wrote on Mon, Sep 26, 2022 at 10:47:
>
> > +1 (non-binding)
> >
> > Good luck.
> >
> > Kaijie
> >
>


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-25 Thread Jiayi Liu
Good luck to Datark.

Kaijie Chen  wrote on Mon, Sep 26, 2022 at 10:47:

> +1 (non-binding)
>
> Good luck.
>
> Kaijie
>


RE: [DISCUSS] Incubating Proposal for Datark

2022-09-25 Thread Kaijie Chen
+1 (non-binding)

Good luck.

Kaijie


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-25 Thread Byung-Gon Chun
Thanks for the interesting proposal!
Another related Apache project is Nemo: https://nemo.apache.org/.

-Gon


On Mon, Sep 26, 2022 at 1:01 AM li gang  wrote:

> Glad to see this proposal. It's an interesting project; good luck.
>
> Yu Li  wrote on Thu, Sep 22, 2022 at 11:45:
>
> > Hi All,
> >
> > I would like to propose Datark [1] as a new apache incubator project, and
> > you can find the proposal [2] of Datark for more details.
> >
> > Datark is an intermediate (shuffle and spilled) data service for big data
> > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> > performance, stability, and flexibility. It aims at enabling computing
> > engines to fully embrace the disaggregated architecture. In a lot of
> cases,
> > intermediate data depends on large local disks, and is often a major
> cause
> > of inefficiency, instability, and inflexibility in the lifecycle of a
> > distributed job. Datark solves the problems through the following core
> > designs:
> >
> > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > access into sequential access.
> > 2. FileSystem-like API to support writing spilled data.
> > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > cache and massive storage space.
> > 4. Engine-agnostic APIs for easy integration with various engines.
> > 5. Extended fault tolerance and data replication to increase reliability.
> >
> > Datark is currently adopted in the production environment at both Alibaba
> > and many other companies, serving petabytes of data per day. Beyond that,
> > it has more open source users including Shopee, NetEase, Bilibili, BOSS,
> > and Synnex. Most of these users have made contributions to the project,
> > forming an active community with dozens of developers.
> >
> > The proposed initial committers are interested in joining ASF to
> reinforce
> > extensive collaboration and build a more vibrant community. We believe
> the
> > Datark project will provide tremendous value for the community if it is
> > introduced into the Apache incubator.
> >
> > I will help this project as the champion and many thanks to our four
> other
> > mentors:
> >
> > * Becket Qin (j...@apache.org)
> > * Duo Zhang (zhang...@apache.org)
> > * Lidong Dai (lidong...@apache.org)
> > * Willem Jiang (ningji...@apache.org)
> >
> > FWIW, although with different solutions, the issues Datark aims to
> resolve
> > have some overlap with Apache Uniffle (incubating) [3]. Actually we
> noticed
> > this during the discussion phase of Uniffle incubation (when we were also
> > preparing for the incubation) and had some open and friendly discussion
> to
> > see whether there could be a joint force [4], and finally decided to
> > develop independently for the time being [5].
> >
> > Look forward to your feedback. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/alibaba/RemoteShuffleService
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > [3] https://uniffle.apache.org/
> > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> >
>
>
> --
> Best Regards
>
> DolphinScheduler PMC
> Gang Li 李岗
>
> lgcar...@apache.org
>


-- 
Byung-Gon Chun


RE: [DISCUSS] Incubating Proposal for Datark

2022-09-25 Thread zhongqiang.czq
+1 (non-binding)
Good luck!

On 2022/09/22 03:45:10 Yu Li wrote:
> Hi All,
> 
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
> 
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
> 
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-agnostic APIs for easy integration with various engines.
> 5. Extended fault tolerance and data replication to increase reliability.
> 
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibili, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
> 
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
> 
> I will help this project as the champion and many thanks to our four other
> mentors:
> 
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
> 
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
> 
> Look forward to your feedback. Thanks.
> 
> Best Regards,
> Yu
> 
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> 


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-25 Thread li gang
Glad to see this proposal. It's an interesting project; good luck.

Yu Li  wrote on Thu, Sep 22, 2022 at 11:45:

> Hi All,
>
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
>
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
>
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-agnostic APIs for easy integration with various engines.
> 5. Extended fault tolerance and data replication to increase reliability.
>
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibili, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
>
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
>
> I will help this project as the champion and many thanks to our four other
> mentors:
>
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
>
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
>
> Look forward to your feedback. Thanks.
>
> Best Regards,
> Yu
>
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
>


--
Best Regards

DolphinScheduler PMC
Gang Li 李岗

lgcar...@apache.org


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-24 Thread Gabriel Lee
This is my +1.
Datark has become a popular remote shuffle service over the past few years,
and I'm very glad to see that it will be one of us soon.

Have fun and enjoy it!

Best,
Gabriel

On Sun, 25 Sept 2022 at 08:45, MINX Feng  wrote:

> It is an interesting project. Good luck to Datark; may this project live
> long and prosper.
>
> Best wishes!
> Ethan
>
> > On Sep 22, 2022, at 11:45, Yu Li  wrote:
> >
> > Hi All,
> >
> > I would like to propose Datark [1] as a new apache incubator project, and
> > you can find the proposal [2] of Datark for more details.
> >
> > Datark is an intermediate (shuffle and spilled) data service for big data
> > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> > performance, stability, and flexibility. It aims at enabling computing
> > engines to fully embrace the disaggregated architecture. In a lot of
> cases,
> > intermediate data depends on large local disks, and is often a major
> cause
> > of inefficiency, instability, and inflexibility in the lifecycle of a
> > distributed job. Datark solves the problems through the following core
> > designs:
> >
> > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > access into sequential access.
> > 2. FileSystem-like API to support writing spilled data.
> > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > cache and massive storage space.
> > 4. Engine-agnostic APIs for easy integration with various engines.
> > 5. Extended fault tolerance and data replication to increase reliability.
> >
> > Datark is currently adopted in the production environment at both Alibaba
> > and many other companies, serving petabytes of data per day. Beyond that,
> > it has more open source users including Shopee, NetEase, Bilibili, BOSS,
> > and Synnex. Most of these users have made contributions to the project,
> > forming an active community with dozens of developers.
> >
> > The proposed initial committers are interested in joining ASF to
> reinforce
> > extensive collaboration and build a more vibrant community. We believe
> the
> > Datark project will provide tremendous value for the community if it is
> > introduced into the Apache incubator.
> >
> > I will help this project as the champion and many thanks to our four
> other
> > mentors:
> >
> > * Becket Qin (j...@apache.org)
> > * Duo Zhang (zhang...@apache.org)
> > * Lidong Dai (lidong...@apache.org)
> > * Willem Jiang (ningji...@apache.org)
> >
> > FWIW, although with different solutions, the issues Datark aims to
> resolve
> > have some overlap with Apache Uniffle (incubating) [3]. Actually we
> noticed
> > this during the discussion phase of Uniffle incubation (when we were also
> > preparing for the incubation) and had some open and friendly discussion
> to
> > see whether there could be a joint force [4], and finally decided to
> > develop independently for the time being [5].
> >
> > Look forward to your feedback. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/alibaba/RemoteShuffleService
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > [3] https://uniffle.apache.org/
> > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
>
>


Recall: [DISCUSS] Incubating Proposal for Datark

2022-09-24 Thread Feng Ethan
Feng Ethan would like to recall the message "[DISCUSS] Incubating Proposal for Datark".

Re: [DISCUSS] Incubating Proposal for Datark

2022-09-24 Thread MINX Feng
+1

> On Sep 25, 2022, at 08:44, MINX Feng  wrote:
> 
> It is an interesting project. Good luck to Datark; may this project live
> long and prosper.
> 
> Best wishes!
> Ethan
> 
>> On Sep 22, 2022, at 11:45, Yu Li  wrote:
>> 
>> Hi All,
>> 
>> I would like to propose Datark [1] as a new apache incubator project, and
>> you can find the proposal [2] of Datark for more details.
>> 
>> Datark is an intermediate (shuffle and spilled) data service for big data
>> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
>> performance, stability, and flexibility. It aims at enabling computing
>> engines to fully embrace the disaggregated architecture. In a lot of cases,
>> intermediate data depends on large local disks, and is often a major cause
>> of inefficiency, instability, and inflexibility in the lifecycle of a
>> distributed job. Datark solves the problems through the following core
>> designs:
>> 
>> 1. Push-based shuffle plus partition data aggregation to turn random IO
>> access into sequential access.
>> 2. FileSystem-like API to support writing spilled data.
>> 3. Hierarchical storage from memory to DFS/object store to enable fast
>> cache and massive storage space.
>> 4. Engine-agnostic APIs for easy integration with various engines.
>> 5. Extended fault tolerance and data replication to increase reliability.
>> 
>> Datark is currently adopted in the production environment at both Alibaba
>> and many other companies, serving petabytes of data per day. Beyond that,
>> it has more open source users including Shopee, NetEase, Bilibili, BOSS,
>> and Synnex. Most of these users have made contributions to the project,
>> forming an active community with dozens of developers.
>> 
>> The proposed initial committers are interested in joining ASF to reinforce
>> extensive collaboration and build a more vibrant community. We believe the
>> Datark project will provide tremendous value for the community if it is
>> introduced into the Apache incubator.
>> 
>> I will help this project as the champion and many thanks to our four other
>> mentors:
>> 
>> * Becket Qin (j...@apache.org)
>> * Duo Zhang (zhang...@apache.org)
>> * Lidong Dai (lidong...@apache.org)
>> * Willem Jiang (ningji...@apache.org)
>> 
>> FWIW, although with different solutions, the issues Datark aims to resolve
>> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
>> this during the discussion phase of Uniffle incubation (when we were also
>> preparing for the incubation) and had some open and friendly discussion to
>> see whether there could be a joint force [4], and finally decided to
>> develop independently for the time being [5].
>> 
>> Look forward to your feedback. Thanks.
>> 
>> Best Regards,
>> Yu
>> 
>> [1] https://github.com/alibaba/RemoteShuffleService
>> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
>> [3] https://uniffle.apache.org/
>> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
>> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> 
> 


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-24 Thread MINX Feng
It is an interesting project. Good luck to Datark; may this project live long
and prosper.

Best wishes!
Ethan

> On Sep 22, 2022, at 11:45, Yu Li  wrote:
> 
> Hi All,
> 
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
> 
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
> 
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-agnostic APIs for easy integration with various engines.
> 5. Extended fault tolerance and data replication to increase reliability.
> 
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibili, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
> 
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
> 
> I will help this project as the champion and many thanks to our four other
> mentors:
> 
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
> 
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
> 
> Look forward to your feedback. Thanks.
> 
> Best Regards,
> Yu
> 
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw





Re: [DISCUSS] Incubating Proposal for Datark

2022-09-24 Thread david zollo
Hi guys,
Remote Shuffle Service plays a very important role in the modern big data
stack. As a mentor of this project, I'm very glad to see this project can
join the Apache Incubator.


Best Regards

---
Apache DolphinScheduler PMC Chair & Apache SeaTunnel PPMC
David
Linkedin: https://www.linkedin.com/in/davidzollo
Twitter: @WorkflowEasy 
---


On Sat, Sep 24, 2022 at 1:10 PM Benedict Jin  wrote:

> Hi,
>
> +1, it's a wonderful project, best of luck!
>
> Best Regards,
> Benedict Jin
>
> On 2022/09/23 15:11:51 Kelu Tao wrote:
> > Cool. Good Luck ~
> >
> > On 2022/09/22 03:45:10 Yu Li wrote:
> > > Hi All,
> > >
> > > I would like to propose Datark [1] as a new apache incubator project,
> and
> > > you can find the proposal [2] of Datark for more details.
> > >
> > > Datark is an intermediate (shuffle and spilled) data service for big
> data
> > > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to
> boost
> > > performance, stability, and flexibility. It aims at enabling computing
> > > engines to fully embrace the disaggregated architecture. In a lot of
> cases,
> > > intermediate data depends on large local disks, and is often a major
> cause
> > > of inefficiency, instability, and inflexibility in the lifecycle of a
> > > distributed job. Datark solves the problems through the following core
> > > designs:
> > >
> > > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > > access into sequential access.
> > > 2. FileSystem-like API to support writing spilled data.
> > > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > > cache and massive storage space.
> > > 4. Engine-irrelevant APIs for easy integrating to various engines.
> > > 5. Extended fault tolerance and data replication to increase
> reliability
> > >
> > > Datark is currently adopted in the production environment at both
> Alibaba
> > > and many other companies, serving petabytes of data per day. Beyond
> that,
> > > it has more open source users including Shopee, NetEase, Bilibily,
> BOSS,
> > > and Synnex. Most of these users have made contributions to the project,
> > > forming an active community with dozens of developers.
> > >
> > > The proposed initial committers are interested in joining ASF to
> reinforce
> > > extensive collaboration and build a more vibrant community. We believe
> the
> > > Datark project will provide tremendous value for the community if it is
> > > introduced into the Apache incubator.
> > >
> > > I will help this project as the champion and many thanks to our four
> other
> > > mentors:
> > >
> > > * Becket Qin (j...@apache.org)
> > > * Duo Zhang (zhang...@apache.org)
> > > * Lidong Dai (lidong...@apache.org)
> > > * Willem Jiang (ningji...@apache.org)
> > >
> > > FWIW, although with different solutions, the issues Datark aims to
> resolve
> > > have some overlap with Apache Uniffle (incubating) [3]. Actually we
> noticed
> > > this during the discussion phase of Uniffle incubation (when we were
> also
> > > preparing for the incubation) and had some open and friendly
> discussion to
> > > see whether there could be a joint force [4], and finally decided to
> > > develop independently for the time being [5].
> > >
> > > Look forward to your feedback. Thanks.
> > >
> > > Best Regards,
> > > Yu
> > >
> > > [1] https://github.com/alibaba/RemoteShuffleService
> > > [2]
> https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > > [3] https://uniffle.apache.org/
> > > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> > >
> >
> >
> >
>
>
>


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Benedict Jin
Hi,

+1, it's a wonderful project, best of luck!

Best Regards,
Benedict Jin

On 2022/09/23 15:11:51 Kelu Tao wrote:
> Cool. Good Luck ~
> 
> On 2022/09/22 03:45:10 Yu Li wrote:
> > Hi All,
> > 
> > I would like to propose Datark [1] as a new apache incubator project, and
> > you can find the proposal [2] of Datark for more details.
> > 
> > Datark is an intermediate (shuffle and spilled) data service for big data
> > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> > performance, stability, and flexibility. It aims at enabling computing
> > engines to fully embrace the disaggregated architecture. In a lot of cases,
> > intermediate data depends on large local disks, and is often a major cause
> > of inefficiency, instability, and inflexibility in the lifecycle of a
> > distributed job. Datark solves the problems through the following core
> > designs:
> > 
> > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > access into sequential access.
> > 2. FileSystem-like API to support writing spilled data.
> > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > cache and massive storage space.
> > 4. Engine-irrelevant APIs for easy integrating to various engines.
> > 5. Extended fault tolerance and data replication to increase reliability
> > 
> > Datark is currently adopted in the production environment at both Alibaba
> > and many other companies, serving petabytes of data per day. Beyond that,
> > it has more open source users including Shopee, NetEase, Bilibily, BOSS,
> > and Synnex. Most of these users have made contributions to the project,
> > forming an active community with dozens of developers.
> > 
> > The proposed initial committers are interested in joining ASF to reinforce
> > extensive collaboration and build a more vibrant community. We believe the
> > Datark project will provide tremendous value for the community if it is
> > introduced into the Apache incubator.
> > 
> > I will help this project as the champion and many thanks to our four other
> > mentors:
> > 
> > * Becket Qin (j...@apache.org)
> > * Duo Zhang (zhang...@apache.org)
> > * Lidong Dai (lidong...@apache.org)
> > * Willem Jiang (ningji...@apache.org)
> > 
> > FWIW, although with different solutions, the issues Datark aims to resolve
> > have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> > this during the discussion phase of Uniffle incubation (when we were also
> > preparing for the incubation) and had some open and friendly discussion to
> > see whether there could be a joint force [4], and finally decided to
> > develop independently for the time being [5].
> > 
> > Look forward to your feedback. Thanks.
> > 
> > Best Regards,
> > Yu
> > 
> > [1] https://github.com/alibaba/RemoteShuffleService
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > [3] https://uniffle.apache.org/
> > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> > 
> 
> 
> 




Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Julian Hyde
HW-Chao,

Your message is unreadable. Can you please resend without the HTML markup?

Julian

> On Sep 23, 2022, at 7:12 AM, HW-Chao Wang <576749...@qq.com.INVALID> wrote:
> 
> This is an interesting project, +1 on the proposal.
>
> On 2022/09/23 13:06:00 Yu Li wrote:
> > Thanks all for the positive feedback!
> >
> > @Willem
> > The proposal is updated and both the project rename plan and github ids
> > of core developers have been added. Please check it and let us know if
> > any further suggestions. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > On Fri, 23 Sept 2022 at 20:41, Xiaoqiao He wrote:
> > > This is an interesting project, +1 on the proposal and good luck to
> > > Datark!
> > >
> > > Best Regards,
> > > - He Xiaoqiao
> > >
> > > On Fri, Sep 23, 2022 at 7:55 PM Willem Jiang wrote:
> > > > Hi Yu,
> > > >
> > > > Thanks for the explanation. Please add a rename plan to the project
> > > > proposal.
> > > > I'd be happy to be the mentor of this project.
> > > >
> > > > BTW,  Could you update the Core Developers information with their
> > > > github id,  it could be easy for us to track the contributions.
> > > >
> > > > Willem Jiang
> > > >
> > > > Twitter: willemjiang
> > > > Weibo: 姜宁willem
> > > >
> > > > On Fri, Sep 23, 2022 at 5:41 PM Yu Li wrote:
> > > > > Hi Willem,
> > > > >
> > > > > Referring to the recent incubation process of streampark [1] and
> > > > > uniffle [2], it seems they didn't rename their original project
> > > > > names before entering apache incubator, thus we didn't plan to
> > > > > change the original github project name but would redirect it to
> > > > > the new project after entering incubation. OTOH, if such a rename
> > > > > is necessary before incubation, we will need some internal
> > > > > approval to process. Thanks.
> > > > >
> > > > > Best Regards,
> > > > > Yu
> > > > >
> > > > > [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> > > > > [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
> > > > >
> > > > > On Fri, 23 Sept 2022 at 09:07, Willem Jiang wrote:
> > > > > > I just checked the source repo, it is still using the name of
> > > > > > RemoteShuffleService.
> > > > > > Is there any plan for when we will change the project name?
> > > > > >
> > > > > > On Thu, Sep 22, 2022 at 11:45 AM Yu Li wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I would like to propose Datark [1] as a new apache incubator
> > > > > > > project, and you can find the proposal [2] of Datark for more
> > > > > > > details.
> > > > > > >
> > > > > > > Datark is an intermediate (shuffle and spilled) data service
> > > > > > > for big data compute engines (Apache Spark, Apache Flink,
> > > > > > > Apache Hive, etc.) to boost performance, stability, and
> > > > > > > flexibility. It aims at enabling computing engines to fully
> > > > > > > embrace the disaggregated architecture. In a lot of cases,
> > > > > > > intermediate data depends on large local disks, and is often
> > > > > > > a major cause of inefficiency, instability, and inflexibility
> > > > > > > in the lifecycle of a distributed job. Datark solves the
> > > > > > > problems through the following core designs:
> > > > > > >
> > > > > > > 1. Push-based shuffle plus partition data aggregation to turn
> > > > > > > random IO access into sequential access.
> > > > > > > 2. FileSystem-like API to support writing spilled data.
> > > > > > > 3. Hierarchical storage from memory to DFS/object store to
> > > > > > > enable fast cache and massive storage space.
> > > > > > > 4. Engine-irrelevant APIs for easy integrating to various
> > > > > > > engines.
> > > > > > > 5. Extended fault tolerance and data replication to increase
> > > > > > > reliability
> > > > > > >
> > > > > > > Datark is currently adopted in the production environment at
> > > > > > > both Alibaba and many other companies, serving petabytes of
> > > > > > > data per day. Beyond that, it has more open source users
> > > > > > > including Shopee, NetEase, Bilibily, BOSS, and Synnex. Most of
> > > > > > > these users have made contributions to the project, forming an
> > > > > > > active community with dozens of developers.
> > > > > > >
> > > > > > > The proposed initial committers are interested in joining ASF
> > > > > > > to reinforce extensive collaboration and build a more vibrant
> > > > > > > community. We believe the Datark project will provide
> > > > > > > tremendous value for the community if it is introduced into
> > > > > > > the Apache incubator.
> > > > > > >
> > > > > > > I will help this project as the champion and many thanks to
> > > > > > > our four other mentors:
> > > > > > >
> > > > > > > * Becket Qin (j...@apache.org)
> > > > > > > * Duo Zhang (zhang...@apache.org)
> > > > > > > * Lidong Dai (lidong...@apache.org)
> > > > > > > * Willem Jiang (ningji...@apache.org)
> > > > > > >
> > > > > > > FWIW, although with different solutions, the issues Datark
> > > > > > > aims to resolve have some overlap with Apache Uniffle
> > > > > > > (incubating) [3]. Actually we noticed this during the
> > > > > > > discussion phase of Uniffle incubation (when we were also
> > > > > > > preparing for the incubation) and had some open and friendly
> > > > > > > discussion to see whether there could be a joint force [4], and

Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Kelu Tao
Cool. Good Luck ~

On 2022/09/22 03:45:10 Yu Li wrote:
> Hi All,
> 
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
> 
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
> 
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-irrelevant APIs for easy integrating to various engines.
> 5. Extended fault tolerance and data replication to increase reliability
> 
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibily, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
> 
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
> 
> I will help this project as the champion and many thanks to our four other
> mentors:
> 
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
> 
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
> 
> Look forward to your feedback. Thanks.
> 
> Best Regards,
> Yu
> 
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> 




Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Cheng Pan
+1, glad to see the community of Datark has been growing.

Thanks,
Cheng Pan

On Fri, Sep 23, 2022 at 9:55 PM Calvin Kirs  wrote:
>
> Hi,
>   Good to see this proposal, it's an interesting project.
>
> On Fri, Sep 23, 2022 at 9:29 PM 王勝傑  wrote:
> >
> > Good luck from XIAOMI
> >
> > On 2022/09/22 03:45:10 Yu Li wrote:
> > > Hi All,
> > >
> > > I would like to propose Datark [1] as a new apache incubator project, and
> > > you can find the proposal [2] of Datark for more details.
> > >
> > > Datark is an intermediate (shuffle and spilled) data service for big data
> > > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> > > performance, stability, and flexibility. It aims at enabling computing
> > > engines to fully embrace the disaggregated architecture. In a lot of 
> > > cases,
> > > intermediate data depends on large local disks, and is often a major cause
> > > of inefficiency, instability, and inflexibility in the lifecycle of a
> > > distributed job. Datark solves the problems through the following core
> > > designs:
> > >
> > > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > > access into sequential access.
> > > 2. FileSystem-like API to support writing spilled data.
> > > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > > cache and massive storage space.
> > > 4. Engine-irrelevant APIs for easy integrating to various engines.
> > > 5. Extended fault tolerance and data replication to increase reliability
> > >
> > > Datark is currently adopted in the production environment at both Alibaba
> > > and many other companies, serving petabytes of data per day. Beyond that,
> > > it has more open source users including Shopee, NetEase, Bilibily, BOSS,
> > > and Synnex. Most of these users have made contributions to the project,
> > > forming an active community with dozens of developers.
> > >
> > > The proposed initial committers are interested in joining ASF to reinforce
> > > extensive collaboration and build a more vibrant community. We believe the
> > > Datark project will provide tremendous value for the community if it is
> > > introduced into the Apache incubator.
> > >
> > > I will help this project as the champion and many thanks to our four other
> > > mentors:
> > >
> > > * Becket Qin (j...@apache.org)
> > > * Duo Zhang (zhang...@apache.org)
> > > * Lidong Dai (lidong...@apache.org)
> > > * Willem Jiang (ningji...@apache.org)
> > >
> > > FWIW, although with different solutions, the issues Datark aims to resolve
> > > have some overlap with Apache Uniffle (incubating) [3]. Actually we 
> > > noticed
> > > this during the discussion phase of Uniffle incubation (when we were also
> > > preparing for the incubation) and had some open and friendly discussion to
> > > see whether there could be a joint force [4], and finally decided to
> > > develop independently for the time being [5].
> > >
> > > Look forward to your feedback. Thanks.
> > >
> > > Best Regards,
> > > Yu
> > >
> > > [1] https://github.com/alibaba/RemoteShuffleService
> > > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > > [3] https://uniffle.apache.org/
> > > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> > >
> >
> >
>
>
> --
> Best wishes!
> CalvinKirs
>
>




Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Calvin Kirs
Hi,
  Good to see this proposal, it's an interesting project.

On Fri, Sep 23, 2022 at 9:29 PM 王勝傑  wrote:
>
> Good luck from XIAOMI
>
> On 2022/09/22 03:45:10 Yu Li wrote:
> > Hi All,
> >
> > I would like to propose Datark [1] as a new apache incubator project, and
> > you can find the proposal [2] of Datark for more details.
> >
> > Datark is an intermediate (shuffle and spilled) data service for big data
> > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> > performance, stability, and flexibility. It aims at enabling computing
> > engines to fully embrace the disaggregated architecture. In a lot of cases,
> > intermediate data depends on large local disks, and is often a major cause
> > of inefficiency, instability, and inflexibility in the lifecycle of a
> > distributed job. Datark solves the problems through the following core
> > designs:
> >
> > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > access into sequential access.
> > 2. FileSystem-like API to support writing spilled data.
> > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > cache and massive storage space.
> > 4. Engine-irrelevant APIs for easy integrating to various engines.
> > 5. Extended fault tolerance and data replication to increase reliability
> >
> > Datark is currently adopted in the production environment at both Alibaba
> > and many other companies, serving petabytes of data per day. Beyond that,
> > it has more open source users including Shopee, NetEase, Bilibily, BOSS,
> > and Synnex. Most of these users have made contributions to the project,
> > forming an active community with dozens of developers.
> >
> > The proposed initial committers are interested in joining ASF to reinforce
> > extensive collaboration and build a more vibrant community. We believe the
> > Datark project will provide tremendous value for the community if it is
> > introduced into the Apache incubator.
> >
> > I will help this project as the champion and many thanks to our four other
> > mentors:
> >
> > * Becket Qin (j...@apache.org)
> > * Duo Zhang (zhang...@apache.org)
> > * Lidong Dai (lidong...@apache.org)
> > * Willem Jiang (ningji...@apache.org)
> >
> > FWIW, although with different solutions, the issues Datark aims to resolve
> > have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> > this during the discussion phase of Uniffle incubation (when we were also
> > preparing for the incubation) and had some open and friendly discussion to
> > see whether there could be a joint force [4], and finally decided to
> > develop independently for the time being [5].
> >
> > Look forward to your feedback. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/alibaba/RemoteShuffleService
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > [3] https://uniffle.apache.org/
> > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> >
>
>


-- 
Best wishes!
CalvinKirs




RE: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread 王勝傑
Good luck from XIAOMI

On 2022/09/22 03:45:10 Yu Li wrote:
> Hi All,
> 
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
> 
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
> 
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-irrelevant APIs for easy integrating to various engines.
> 5. Extended fault tolerance and data replication to increase reliability
> 
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibily, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
> 
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
> 
> I will help this project as the champion and many thanks to our four other
> mentors:
> 
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
> 
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
> 
> Look forward to your feedback. Thanks.
> 
> Best Regards,
> Yu
> 
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> 




Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Duo Zhang
Thanks Yu Li for putting this up. As the mentor of this project, I
will try my best to help the community.

On Fri, Sep 23, 2022 at 21:06, Yu Li wrote:
>
> Thanks all for the positive feedback!
>
> @Willem
> The proposal is updated and both the project rename plan and github ids of
> core developers have been added. Please check it and let us know if any
> further suggestions. Thanks.
>
> Best Regards,
> Yu
>
>
> On Fri, 23 Sept 2022 at 20:41, Xiaoqiao He  wrote:
>
> > This is an interesting project, +1 on the proposal and good luck to Datark!
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> > On Fri, Sep 23, 2022 at 7:55 PM Willem Jiang 
> > wrote:
> >
> > > Hi Yu,
> > >
> > > Thanks for the explanation. Please add a rename plan to the project
> > > proposal.
> > > I'd be happy to be the mentor of this project.
> > >
> > > BTW,  Could you update the Core Developers information with their
> > > github id,  it could be easy for us to track the contributions.
> > >
> > >
> > > Willem Jiang
> > >
> > > Twitter: willemjiang
> > > Weibo: 姜宁willem
> > >
> > > On Fri, Sep 23, 2022 at 5:41 PM Yu Li  wrote:
> > > >
> > > > Hi Willem,
> > > >
> > > > Referring to the recent incubation process of streampark [1] and
> > uniffle
> > > > [2], it seems they didn't rename their original project names before
> > > > entering apache incubator, thus we didn't plan to change the original
> > > > github project name but would redirect it to the new project after
> > > entering
> > > > incubation. OTOH, if such a rename is necessary before incubation, we
> > > will
> > > > need some internal approval to process. Thanks.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> > > > [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
> > > >
> > > >
> > > > On Fri, 23 Sept 2022 at 09:07, Willem Jiang 
> > > wrote:
> > > >
> > > > > I just checked the source repo, it is still using the name of
> > > > > RemoteShuffleService.
> > > > > Is there any plan for when we will change the project name?
> > > > >
> > > > > On Thu, Sep 22, 2022 at 11:45 AM Yu Li  wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to propose Datark [1] as a new apache incubator
> > > project, and
> > > > > > you can find the proposal [2] of Datark for more details.
> > > > > >
> > > > > > Datark is an intermediate (shuffle and spilled) data service for
> > big
> > > data
> > > > > > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to
> > > boost
> > > > > > performance, stability, and flexibility. It aims at enabling
> > > computing
> > > > > > engines to fully embrace the disaggregated architecture. In a lot
> > of
> > > > > cases,
> > > > > > intermediate data depends on large local disks, and is often a
> > major
> > > > > cause
> > > > > > of inefficiency, instability, and inflexibility in the lifecycle
> > of a
> > > > > > distributed job. Datark solves the problems through the following
> > > core
> > > > > > designs:
> > > > > >
> > > > > > 1. Push-based shuffle plus partition data aggregation to turn
> > random
> > > IO
> > > > > > access into sequential access.
> > > > > > 2. FileSystem-like API to support writing spilled data.
> > > > > > 3. Hierarchical storage from memory to DFS/object store to enable
> > > fast
> > > > > > cache and massive storage space.
> > > > > > 4. Engine-irrelevant APIs for easy integrating to various engines.
> > > > > > 5. Extended fault tolerance and data replication to increase
> > > reliability
> > > > > >
> > > > > > Datark is currently adopted in the production environment at both
> > > Alibaba
> > > > > > and many other companies, serving petabytes of data per day. Beyond
> > > that,
> > > > > > it has more open source users including Shopee, NetEase, Bilibily,
> > > BOSS,
> > > > > > and Synnex. Most of these users have made contributions to the
> > > project,
> > > > > > forming an active community with dozens of developers.
> > > > > >
> > > > > > The proposed initial committers are interested in joining ASF to
> > > > > reinforce
> > > > > > extensive collaboration and build a more vibrant community. We
> > > believe
> > > > > the
> > > > > > Datark project will provide tremendous value for the community if
> > it
> > > is
> > > > > > introduced into the Apache incubator.
> > > > > >
> > > > > > I will help this project as the champion and many thanks to our
> > four
> > > > > other
> > > > > > mentors:
> > > > > >
> > > > > > * Becket Qin (j...@apache.org)
> > > > > > * Duo Zhang (zhang...@apache.org)
> > > > > > * Lidong Dai (lidong...@apache.org)
> > > > > > * Willem Jiang (ningji...@apache.org)
> > > > > >
> > > > > > FWIW, although with different solutions, the issues Datark aims to
> > > > > resolve
> > > > > > have some overlap with Apache Uniffle (incubating) [3]. Actually we
> > > > > noticed
> > > > > > this during the discussion phase of Uniffle incubation (when we
> > were
> > > also
> > > 

Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Yu Li
Thanks all for the positive feedback!

@Willem
The proposal has been updated: both the project rename plan and the GitHub
IDs of the core developers have been added. Please check it and let us know
if you have any further suggestions. Thanks.

Best Regards,
Yu


On Fri, 23 Sept 2022 at 20:41, Xiaoqiao He  wrote:

> This is an interesting project, +1 on the proposal and good luck to Datark!
>
> Best Regards,
> - He Xiaoqiao
>
> On Fri, Sep 23, 2022 at 7:55 PM Willem Jiang wrote:
>
> > Hi Yu,
> >
> > Thanks for the explanation. Please add a rename plan to the project
> > proposal.
> > I'd be happy to be the mentor of this project.
> >
> > BTW,  Could you update the Core Developers information with their
> > github id,  it could be easy for us to track the contributions.
> >
> >
> > Willem Jiang
> >
> > Twitter: willemjiang
> > Weibo: 姜宁willem
> >
> > On Fri, Sep 23, 2022 at 5:41 PM Yu Li  wrote:
> > >
> > > Hi Willem,
> > >
> > > Referring to the recent incubation process of streampark [1] and uniffle
> > > [2], it seems they didn't rename their original project names before
> > > entering apache incubator, thus we didn't plan to change the original
> > > github project name but would redirect it to the new project after entering
> > > incubation. OTOH, if such a rename is necessary before incubation, we will
> > > need some internal approval to process. Thanks.
> > >
> > > Best Regards,
> > > Yu
> > >
> > > [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> > > [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
> > >
> > >
> > > On Fri, 23 Sept 2022 at 09:07, Willem Jiang wrote:
> > >
> > > > I just checked the source repo, it is still using the name of
> > > > RemoteShuffleService.
> > > > Is there any plan for when we will change the project name?

Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Xiaoqiao He
This is an interesting project, +1 on the proposal and good luck to Datark!

Best Regards,
- He Xiaoqiao

On Fri, Sep 23, 2022 at 7:55 PM Willem Jiang  wrote:

> Hi Yu,
>
> Thanks for the explanation. Please add a rename plan to the project
> proposal.
> I'd be happy to be the mentor of this project.
>
> BTW,  Could you update the Core Developers information with their
> github id,  it could be easy for us to track the contributions.
>
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Fri, Sep 23, 2022 at 5:41 PM Yu Li  wrote:
> >
> > Hi Willem,
> >
> > Referring to the recent incubation process of streampark [1] and uniffle
> > [2], it seems they didn't rename their original project names before
> > entering apache incubator, thus we didn't plan to change the original
> > github project name but would redirect it to the new project after entering
> > incubation. OTOH, if such a rename is necessary before incubation, we will
> > need some internal approval to process. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> > [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
> >
> >
> > On Fri, 23 Sept 2022 at 09:07, Willem Jiang wrote:
> >
> > > I just checked the source repo, it is still using the name of
> > > RemoteShuffleService.
> > > Is there any plan for when we will change the project name?

Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Willem Jiang
Hi Yu,

Thanks for the explanation. Please add a rename plan to the project proposal.
I'd be happy to be a mentor of this project.

BTW, could you update the Core Developers information with their GitHub
IDs? That would make it easier for us to track the contributions.


Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Sep 23, 2022 at 5:41 PM Yu Li  wrote:
>
> Hi Willem,
>
> Referring to the recent incubation process of streampark [1] and uniffle
> [2], it seems they didn't rename their original project names before
> entering apache incubator, thus we didn't plan to change the original
> github project name but would redirect it to the new project after entering
> incubation. OTOH, if such a rename is necessary before incubation, we will
> need some internal approval to process. Thanks.
>
> Best Regards,
> Yu
>
> [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
>
>
> On Fri, 23 Sept 2022 at 09:07, Willem Jiang  wrote:
>
> > I just checked the source repo, it is still using the name of
> > RemoteShuffleService.
> > Is there any plan for when we will change the project name?

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Becket Qin
It's my pleasure to be a mentor of Datark. I'm looking forward to the
feedback on the incubation proposal.

Cheers,

Jiangjie (Becket) Qin

On Thu, Sep 22, 2022 at 11:45 AM Yu Li  wrote:

> Hi All,
>
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
>
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
>
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-irrelevant APIs for easy integrating to various engines.
> 5. Extended fault tolerance and data replication to increase reliability
>
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibily, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
>
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
>
> I will help this project as the champion and many thanks to our four other
> mentors:
>
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
>
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
>
> Look forward to your feedback. Thanks.
>
> Best Regards,
> Yu
>
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
>


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Yu Xiao
Very happy to see this proposal, good luck~

Best wishes!

Yu Xiao
Apache ShenYu

On Fri, Sep 23, 2022 at 17:41, Yu Li wrote:
>
> Hi Willem,
>
> Referring to the recent incubation process of streampark [1] and uniffle
> [2], it seems they didn't rename their original project names before
> entering apache incubator, thus we didn't plan to change the original
> github project name but would redirect it to the new project after entering
> incubation. OTOH, if such a rename is necessary before incubation, we will
> need some internal approval to process. Thanks.
>
> Best Regards,
> Yu
>
> [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
>
>
> On Fri, 23 Sept 2022 at 09:07, Willem Jiang  wrote:
>
> > I just checked the source repo, it is still using the name of
> > RemoteShuffleService.
> > Is there any plan for when we will change the project name?




Re: [DISCUSS] Incubating Proposal for Datark

2022-09-23 Thread Yu Li
Hi Willem,

Looking at the recent incubation processes of StreamPark [1] and Uniffle
[2], it seems they didn't rename their original projects before entering
the Apache Incubator, so we didn't plan to change the original GitHub
project name but would redirect it to the new project after entering
incubation. OTOH, if such a rename is necessary before incubation, we will
need some internal approval to proceed. Thanks.

Best Regards,
Yu

[1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
[2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h


On Fri, 23 Sept 2022 at 09:07, Willem Jiang  wrote:

> I just checked the source repo, it is still using the name of
> RemoteShuffleService.
> Is there any plan for when we will change the project name?


Re: [DISCUSS] Incubating Proposal for Datark

2022-09-22 Thread Willem Jiang
I just checked the source repo; it still uses the name
RemoteShuffleService.
Is there a plan for when the project name will be changed?

On Thu, Sep 22, 2022 at 11:45 AM Yu Li  wrote:
>
> Hi All,
>
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
>
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
>
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-irrelevant APIs for easy integrating to various engines.
> 5. Extended fault tolerance and data replication to increase reliability
>
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibily, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
>
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
>
> I will help this project as the champion and many thanks to our four other
> mentors:
>
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
>
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
>
> Look forward to your feedback. Thanks.
>
> Best Regards,
> Yu
>
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw




[DISCUSS] Incubating Proposal for Datark

2022-09-21 Thread Yu Li
Hi All,

I would like to propose Datark [1] as a new Apache Incubator project; you
can find more details in the Datark proposal [2].

Datark is an intermediate (shuffle and spilled) data service for big data
compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
performance, stability, and flexibility. It aims at enabling computing
engines to fully embrace the disaggregated architecture. In a lot of cases,
intermediate data depends on large local disks, and is often a major cause
of inefficiency, instability, and inflexibility in the lifecycle of a
distributed job. Datark solves the problems through the following core
designs:

1. Push-based shuffle plus partition data aggregation to turn random IO
access into sequential access.
2. FileSystem-like API to support writing spilled data.
3. Hierarchical storage from memory to DFS/object store to enable fast
cache and massive storage space.
4. Engine-agnostic APIs for easy integration with various engines.
5. Extended fault tolerance and data replication to increase reliability.
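
To illustrate design 1, here is a minimal Python sketch of how push-based
shuffle with per-partition aggregation replaces many scattered reads with one
sequential read per partition. It is not taken from the Datark code base: the
names PartitionAggregator, push_shuffle, and pull_based_segment_reads are
hypothetical, and in-memory byte arrays stand in for files on a shuffle worker.

```python
from collections import defaultdict


def pull_based_segment_reads(num_mappers: int, num_partitions: int) -> int:
    """Pull-based shuffle: every reducer fetches one small segment from every
    mapper's output, so the number of scattered reads is M * R."""
    return num_mappers * num_partitions


class PartitionAggregator:
    """Hypothetical shuffle worker that appends pushed blocks per partition."""

    def __init__(self) -> None:
        # partition id -> aggregated bytes (stand-in for one file per partition)
        self._files = defaultdict(bytearray)

    def push(self, partition_id: int, block: bytes) -> None:
        # Appending keeps all data of one partition contiguous.
        self._files[partition_id] += block

    def read_partition(self, partition_id: int) -> bytes:
        # One sequential read per reducer instead of one read per mapper.
        return bytes(self._files[partition_id])


def push_shuffle(mapper_outputs):
    """Each mapper pushes its (partition_id, block) pairs to the aggregator."""
    aggregator = PartitionAggregator()
    for blocks in mapper_outputs:
        for partition_id, block in blocks:
            aggregator.push(partition_id, block)
    return aggregator


if __name__ == "__main__":
    outputs = [[(0, b"a0"), (1, b"a1")], [(0, b"b0"), (1, b"b1")]]
    aggregator = push_shuffle(outputs)
    print(aggregator.read_partition(0))
    print(pull_based_segment_reads(len(outputs), 2))
```

With M mappers and R partitions, the pull-based path needs M * R segment
fetches, while the aggregated layout needs only R sequential reads; replication
and fault tolerance are left out of this sketch.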

Datark is currently used in production at Alibaba and many other
companies, serving petabytes of data per day. Beyond that, its open source
users include Shopee, NetEase, Bilibili, BOSS, and Synnex. Most of these
users have contributed to the project, forming an active community with
dozens of developers.

The proposed initial committers are interested in joining the ASF to
strengthen collaboration and build a more vibrant community. We believe the
Datark project will provide tremendous value for the community if it is
accepted into the Apache Incubator.

I will help this project as its champion, and many thanks to our four other
mentors:

* Becket Qin (j...@apache.org)
* Duo Zhang (zhang...@apache.org)
* Lidong Dai (lidong...@apache.org)
* Willem Jiang (ningji...@apache.org)

FWIW, although the solutions differ, the issues Datark aims to resolve
overlap somewhat with those of Apache Uniffle (incubating) [3]. We actually
noticed this during the discussion phase of the Uniffle incubation (when we
were also preparing for incubation) and had an open and friendly discussion
about whether we could join forces [4], and finally decided to develop
independently for the time being [5].

Looking forward to your feedback. Thanks.

Best Regards,
Yu

[1] https://github.com/alibaba/RemoteShuffleService
[2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
[3] https://uniffle.apache.org/
[4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
[5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw