Re: [DISCUSS] Incubating Proposal for Datark
HW-Chao,

Your message is unreadable. Can you please resend without the HTML markup?

Julian

> On Sep 23, 2022, at 7:12 AM, HW-Chao Wang <576749...@qq.com.INVALID> wrote:
>
> This is an interesting project, +1 on the proposal.
>
> On 2022/09/23 13:06:00 Yu Li wrote:
> > Thanks all for the positive feedback!
> >
> > @Willem The proposal is updated and both the project rename plan and
> > github ids of core developers have been added. Please check it and let us
> > know if any further suggestions. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > On Fri, 23 Sept 2022 at 20:41, Xiaoqiao He wrote:
> > > This is an interesting project, +1 on the proposal and good luck to
> > > Datark!
> > >
> > > Best Regards,
> > > - He Xiaoqiao
> > >
> > > (snip)
Re: [VOTE] Release Apache NuttX (Incubating) 11.0.0
+1 binding

Checked sigs and sums: Matched
LICENSE and NOTICE: Looks fine
DISCLAIMER: In place
Built from src: Built a sim:nsh board with the ci docker image
ghcr.io/apache/incubator-nuttx/apache-nuttx-ci-linux:latest
Run basic cmds: Tried cd, hello, echo, etc, and finally poweroff, all fine

On Fri, Sep 23, 2022 at 22:09, Nathan Hartman wrote:
>
> On Thu, Sep 22, 2022 at 10:57 PM Alin Jerpelea wrote:
> >
> > Hello all,
> >
> > This is a call for a vote to release Apache NuttX (Incubating) version
> > 11.0.0.
>
> (snip)
>
> > Apache NuttX community vote and result thread:
> > Result: https://lists.apache.org/thread/c0lv34jvwbdh6wfnl7xsprxmggllpmcw
> > Vote: https://lists.apache.org/thread/rv9pf8gtbcq4zjnydv2h2lsymszzb33f
> >
> > SCM Information:
> > Release tag: nuttx-11.0.0-RC2
> > Hash for the release incubating-nuttx
> > tag: d32555f3e0492b8f4caeb407db55de23322724ef
> > Hash for the release incubating-nuttx-apps
> > tag: 8b43f9f9ca30f44c1cccae9a9078d5d45b776d35
> >
> >
> > [1] https://dist.apache.org/repos/dist/dev/incubator/nuttx/11.0.0-RC2/
> > [2] https://raw.githubusercontent.com/apache/incubator-nuttx/nuttx-11.0.0-RC2/ReleaseNotes
> > [3] https://www.apache.org/dev/release.html#approving-a-release
> > [4] https://cwiki.apache.org/confluence/display/NUTTX/Validating+a+staged+Release
>
> Hi all,
>
> Pleased to submit my vote:
>
> Summary:
> +1 to release (binding)
>
> Per Alan's request for size information [1]:
>
> * NuttX-11.0.0-RC2, b-g474e-dpow1:nsh configuration:
>
> $ arm-none-eabi-size nuttx
>    text    data     bss     dec     hex filename
>  107623     672    2012  110307   1aee3 nuttx
>
> * This is a big improvement over NuttX-10.3.0-RC4, same config:
>
> $ arm-none-eabi-size nuttx
>    text    data     bss     dec     hex filename
>  117843     636    2256  120735   1d79f nuttx
>
> Text is reduced by 10220!! Data increases slightly by 36, but bss
> decreases by 244. Great work everyone!
>
> Please note that for future releases, I would like to build in 'ostest'
> for additional verification. Today I built and tested such a
> configuration on this board for this release. Soon I will open a PR to
> upstream this new config as b-g474e-dpow1:ostest.
>
> Compared to b-g474e-dpow1:nsh, it adds the following configs:
>
> +CONFIG_BUILTIN=y
> +CONFIG_NSH_BUILTIN_APPS=y
> +CONFIG_TESTING_OSTEST=y
>
> and, because ostest currently fails when priority inheritance is
> enabled, it removes the following configs:
>
> -CONFIG_PRIORITY_INHERITANCE=y
> -CONFIG_PTHREAD_MUTEX_DEFAULT_PRIO_INHERIT=y
>
> Here is the size information for that build:
>
> * NuttX-11.0.0-RC2, b-g474e-dpow1:ostest configuration (not yet
> upstreamed):
>
> $ arm-none-eabi-size nuttx
>    text    data     bss     dec     hex filename
>  179819     668    4812  185299   2d3d3 nuttx
>
> Additional notes about this build:
>
> 1. I noticed that to be able to run 'ostest' from NSH the configuration
>    needs CONFIG_BUILTIN and CONFIG_NSH_BUILTIN_APPS. Do we want to make
>    CONFIG_TESTING_OSTEST depend on those?
>
> 2. I noticed that the 'openocd' incantation in the README.txt of
>    b-g474e-dpow1 is incorrect, at least on my system, even though I
>    wrote that file! I will open a PR soon to fix that. The incantation
>    that worked for me now is:
>
>    $ sudo openocd -f interface/stlink.cfg -f target/stm32g4x.cfg -c
> init -c "reset halt" -c "flash write_image erase nuttx.bin 0x0800"
>
> Neither of the above issues is a showstopper.
>
> Development system: Linux
> (Debian 4.19.0-21-rt-amd64 x86_64)
>
> Verified:
> * Signatures
> * SHA-512 sums
> * Incubating in artifact names
> * LICENSE, NOTICE, DISCLAIMER, and README.md present in both tarballs
> * DISCLAIMER-WIP has been renamed to DISCLAIMER as licenses have been
> migrated to Apache 2.0 or documented in LICENSE; hopefully we will
> graduate soon and be able to remove DISCLAIMER altogether :-)
> * Build, FLASH program, and boot b-g474e-dpow1:nsh to the NSH prompt
> successfully.
> * Build, FLASH, boot b-g474e-dpow1:ostest (not yet upstreamed as
> discussed above) and ran 'ostest' successfully.
>
> Dependencies:
> * gcc-arm-none-eabi-7-2017-q4-major
> * kconfig-conf from NuttX tools repository
>
> Other dependencies from Debian packages:
> * binutils-dev 2.31.1-16
> * bison 2:3.3.2.dfsg-1
> * flex 2.6.4-6.2
> * gperf 3.1-1
> * libelf-dev 0.176-1.1
> * libgmp-dev 2:6.1.2+dfsg-4
> * libisl-dev 0.20-2
> * libmpc-dev 1.1.0-1
> * libmpfr-dev 4.0.2-1
> * libncurses5-dev 6.1+20181013-2+deb10u2
> * libusb-1.0-0-dev 2:1.0.22-2
> * libusb-dev 2:0.1.12-32
> * openocd 0.10.0-5
> * texinfo 6.5.0.dfsg.1-4+b1
>
> A very big **THANK YOU** to our RM and to everyone in the Apache NuttX
> and Incubator community for making this release (candidate) possible!
>
> References:
>
> [1] Alan Carvalho de Assis's message to the dev@nuttx.a.o thread "Re:
> [VOTE] Apache NuttX 10.0.0 (incubating) RC0 release" on 26 Nov 2020,
> archived:
> https://lists.apache.org/thread/nxvwxol948psr2z7fc6cwtdv9ofoz9yj
>
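
The size comparison quoted in this vote can be re-checked mechanically. The following is only a sketch, assuming the two-line Berkeley-style output that `arm-none-eabi-size` prints by default; the helper names `parse_size_report` and `size_delta` are made up for illustration and are not part of any NuttX tooling.

```python
"""Re-check the size deltas quoted in the vote (illustrative sketch)."""

# Reports copied from the message above (nsh configuration).
NEW_REPORT = """
   text    data     bss     dec     hex filename
 107623     672    2012  110307   1aee3 nuttx
"""
OLD_REPORT = """
   text    data     bss     dec     hex filename
 117843     636    2256  120735   1d79f nuttx
"""


def parse_size_report(report):
    """Return the numeric columns of a header line plus one value line."""
    header, values = [line.split() for line in report.strip().splitlines()[-2:]]
    fields = dict(zip(header, values))
    sizes = {name: int(fields[name]) for name in ("text", "data", "bss", "dec")}
    # The 'hex' column is the same total as 'dec', written in base 16.
    assert int(fields["hex"], 16) == sizes["dec"]
    return sizes


def size_delta(old, new):
    """Per-section difference, new minus old (negative means smaller)."""
    return {name: new[name] - old[name] for name in old}


old_sizes = parse_size_report(OLD_REPORT)
new_sizes = parse_size_report(NEW_REPORT)
delta = size_delta(old_sizes, new_sizes)
print(delta)
```

The result matches the figures stated in the message: text shrinks by 10220 bytes, data grows by 36 and bss shrinks by 244.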
Re: [DISCUSS] Incubating Proposal for Datark
Hi,

+1, it's a wonderful project, best of luck!

Best Regards,
Benedict Jin

On 2022/09/23 15:11:51 Kelu Tao wrote:
> Cool. Good Luck ~
>
> On 2022/09/22 03:45:10 Yu Li wrote:
> > Hi All,
> >
> > I would like to propose Datark [1] as a new apache incubator project, and
> > you can find the proposal [2] of Datark for more details.
> >
> > Datark is an intermediate (shuffle and spilled) data service for big data
> > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> > performance, stability, and flexibility. It aims at enabling computing
> > engines to fully embrace the disaggregated architecture. In a lot of cases,
> > intermediate data depends on large local disks, and is often a major cause
> > of inefficiency, instability, and inflexibility in the lifecycle of a
> > distributed job. Datark solves the problems through the following core
> > designs:
> >
> > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > access into sequential access.
> > 2. FileSystem-like API to support writing spilled data.
> > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > cache and massive storage space.
> > 4. Engine-irrelevant APIs for easy integrating to various engines.
> > 5. Extended fault tolerance and data replication to increase reliability
> >
> > Datark is currently adopted in the production environment at both Alibaba
> > and many other companies, serving petabytes of data per day. Beyond that,
> > it has more open source users including Shopee, NetEase, Bilibily, BOSS,
> > and Synnex. Most of these users have made contributions to the project,
> > forming an active community with dozens of developers.
> >
> > The proposed initial committers are interested in joining ASF to reinforce
> > extensive collaboration and build a more vibrant community. We believe the
> > Datark project will provide tremendous value for the community if it is
> > introduced into the Apache incubator.
> >
> > I will help this project as the champion and many thanks to our four other
> > mentors:
> >
> > * Becket Qin (j...@apache.org)
> > * Duo Zhang (zhang...@apache.org)
> > * Lidong Dai (lidong...@apache.org)
> > * Willem Jiang (ningji...@apache.org)
> >
> > FWIW, although with different solutions, the issues Datark aims to resolve
> > have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> > this during the discussion phase of Uniffle incubation (when we were also
> > preparing for the incubation) and had some open and friendly discussion to
> > see whether there could be a joint force [4], and finally decided to
> > develop independently for the time being [5].
> >
> > Look forward to your feedback. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/alibaba/RemoteShuffleService
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > [3] https://uniffle.apache.org/
> > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> >

To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
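
The first core design in the quoted proposal (push-based shuffle plus partition data aggregation) can be pictured with a toy model. This is a sketch of the general technique only, not Datark's implementation or API: the class `ToyShuffleService`, its methods, and the CRC-based partitioner are all hypothetical names chosen for the example. Mappers push each record to the service, the service appends it to the segment of its target partition, and a reducer then reads one aggregated segment in a single sequential pass instead of fetching many small pieces from every mapper's local disk.

```python
import zlib
from collections import defaultdict


class ToyShuffleService:
    """Hypothetical in-memory stand-in for a remote shuffle service."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # partition id -> records appended in arrival order (one segment each)
        self._segments = defaultdict(list)

    def partition_of(self, key):
        # Deterministic partitioner; Python's built-in hash() of str is salted.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def push(self, key, value):
        """Mapper side: push a record; the service aggregates by partition."""
        self._segments[self.partition_of(key)].append((key, value))

    def read_partition(self, partition_id):
        """Reducer side: one sequential read of the aggregated segment."""
        return list(self._segments[partition_id])


def run_word_count(mapper_inputs, num_partitions=2):
    """Run a tiny word count through the toy service."""
    service = ToyShuffleService(num_partitions)
    for words in mapper_inputs:          # each inner list is one mapper's input
        for word in words:
            service.push(word, 1)
    totals = {}
    for partition_id in range(num_partitions):   # one reducer per partition
        for key, value in service.read_partition(partition_id):
            totals[key] = totals.get(key, 0) + value
    return totals


counts = run_word_count([["a", "b", "a"], ["b", "c"]])
print(sorted(counts.items()))
```

A real service would persist segments, replicate them, and handle failures, which the proposal lists as separate design points; none of that is modelled here.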
Re: [DISCUSS] Incubating Proposal for Datark
Hi Willem,

Referring to the recent incubation process of streampark [1] and uniffle
[2], it seems they didn't rename their original project names before
entering apache incubator, thus we didn't plan to change the original
github project name but would redirect it to the new project after entering
incubation. OTOH, if such a rename is necessary before incubation, we will
need some internal approval to process. Thanks.

Best Regards,
Yu

[1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
[2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h

On Fri, 23 Sept 2022 at 09:07, Willem Jiang wrote:

> I just checked the source repo, it is still using the name of
> RemoteShuffleService.
> Is there any plan for when we will change the project name?
>
> On Thu, Sep 22, 2022 at 11:45 AM Yu Li wrote:
> >
> > Hi All,
> >
> > I would like to propose Datark [1] as a new apache incubator project, and
> > you can find the proposal [2] of Datark for more details.
> >
> > (snip)
Re: [DISCUSS] Incubating Proposal for Datark
Very happy to see this proposal, good luck~

Best wishes!

Yu Xiao
Apache ShenYu

On Fri, Sep 23, 2022 at 17:41, Yu Li wrote:
>
> Hi Willem,
>
> Referring to the recent incubation process of streampark [1] and uniffle
> [2], it seems they didn't rename their original project names before
> entering apache incubator, thus we didn't plan to change the original
> github project name but would redirect it to the new project after entering
> incubation. OTOH, if such a rename is necessary before incubation, we will
> need some internal approval to process. Thanks.
>
> Best Regards,
> Yu
>
> [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
>
> (snip)
Re: [DISCUSS] Incubating Proposal for Datark
It's my pleasure to be a mentor of Datark. I'm looking forward to the
feedback on the incubation proposal.

Cheers,

Jiangjie (Becket) Qin

On Thu, Sep 22, 2022 at 11:45 AM Yu Li wrote:

> Hi All,
>
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
>
> (snip)
Re: [DISCUSS] Incubating Proposal for Datark
Hi Yu,

Thanks for the explanation. Please add a rename plan to the project
proposal. I'd be happy to be the mentor of this project.

BTW, could you update the Core Developers information with their GitHub
IDs? That would make it easier for us to track the contributions.

Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Sep 23, 2022 at 5:41 PM Yu Li wrote:
>
> Hi Willem,
>
> Referring to the recent incubation process of streampark [1] and uniffle
> [2], it seems they didn't rename their original project names before
> entering apache incubator, thus we didn't plan to change the original
> github project name but would redirect it to the new project after entering
> incubation. OTOH, if such a rename is necessary before incubation, we will
> need some internal approval to process. Thanks.
>
> Best Regards,
> Yu
>
> [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
>
> (snip)
RE: [DISCUSS] Incubating Proposal for Datark
Good luck from XIAOMI

On 2022/09/22 03:45:10 Yu Li wrote:
> Hi All,
>
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
>
> (snip)
Re: [Discuss] Baremaps Proposal
Thank you all for the positive feedbacks. Following Calvin's comment, I contacted more contributors and added Léonard Besseau to the list of committers. He would be interested to contribute to the project in his free time. > On 20 Sep 2022, at 13:53, Bertil Chapuis wrote: > > Hello Calvin, > >> On 20 Sep 2022, at 05:44, Calvin Kirs wrote: >> >> Hi, >> >> It seems like an interesting project, if you need mentor, you can count me >> in. > > With pleasure, I added you to the list of mentors. > >> I have one question: I quick check to see [1], >> >> It seems that you are the only one of the initial committers who has a >> commit in 2022. > > Yes, this year, a colleague and I spent a lot of time on the basemap [1]. As > we progressed toward creating a full-featured map style, we encountered > limitations with the JSON configuration files and experimented with other > approaches (YAML, etc.). Eventually, we moved to javascript, which allows us > to split the configuration into several files and to manage complex styling > directives more easily (comments, variables, functions, etc.). The basemap > has now been merged into the main repository with a single commit [2]. My > goal is now to stabilise this feature and to focus more on building the > community. > > [1] https://github.com/baremaps/openstreetmap-vecto > [2] > https://github.com/baremaps/baremaps/commit/17be0caba7378943adea984d6aebfbb94e71aabd > >> The top four contributors in [1] Their total LOC looks good. If you >> think they have a good understanding of the project, I suggest adding >> them as well. > > This is a good idea. I will contact them and check if they are interested in > joining the project as commiters. 
>
> Best regards,
>
> Bertil
>
>> [1]https://github.com/baremaps/baremaps/graphs/contributors?from=2022-01-02=2022-09-20=c
>>
>> On Fri, Sep 16, 2022 at 22:56, Nathan Hartman wrote:
>>>
>>> On Fri, Sep 16, 2022 at 9:25 AM Bertil Chapuis wrote:
>>>> Hello Everyone,
>>>>
>>>> Here is a proposal for Baremaps, a toolkit and a set of infrastructure components for creating, publishing, and operating online maps. The proposal is published on the wiki [1] and a demonstration of our base map is available online [2].
>>>>
>>>> The key challenge for Baremaps will be to create a sustainable community. We appreciate anyone who would be willing to support us with bug reports, code, mentoring or feedback.
>>>>
>>>> [1] - https://cwiki.apache.org/confluence/display/INCUBATOR/Baremaps+Proposal
>>>> [2] - https://demo.baremaps.com/
>>>>
>>>> Best regards,
>>>>
>>>> Bertil Chapuis
>>>
>>> This is very cool!
>>>
>>> Thank you for your very candid proposal, and demonstration site. Your assessment of licensing concerns in particular and interest in joining the ASF specifically because of the high bar for bringing in 3rd party code is, I think, a good fit here.
>>>
>>> Although I am inundated with other work at the moment and will probably not be able to contribute to this project, I do think that a move to ASF should benefit Baremaps and the ASF in the long run, and I would like to offer my words of encouragement.
>>>
>>> Cheers,
>>> Nathan
>>
>> --
>> Best wishes!
>> CalvinKirs
[HELP WANTED] Creating the next board report
Hi,

As noted here [2], the board report should not be the sole responsibility of the chair to put together. Here's roughly what is needed for each board report.

After the board meeting:
- Update podlings.xml to mark podlings that did not report, ones whose report was missing but reported, and podlings at the end of their monthly reporting period
- Assign podling shepherds using assign_shepherds.py
- Run the clutch script (./clutch2.sh) and generate the report template (clutch2report.py)
- Update the wiki report page to list the correct podlings (may need to remove graduated and retired podlings and add new podlings)
- Add any new PPMCs to the wiki so they can edit their report
- Send out an email containing the list of podlings to report and the dates

During the month:
- Send out 2 or 3 reminders using report_reminders.py

Before the report is due:
- Write a summary of what has happened on the mailing list, with anything of interest
- List all releases made during the month / check that incubator votes match what’s in the release area
- List who has been added to or removed from the IPMC during the month
- Check that PPMC members have been added correctly
- Remove the non-reporting podlings' sections from the report and note them as not reporting
- Note any podlings that didn’t get mentor sign-off
- Reflow the report and fix the report formatting, including removing text prompts for mentor activity and branding questions
- Lock the wiki page so it can’t be edited
- Send an email to this list for review
- Submit the report to the board

If anyone would like to do any of these tasks, just ask. If you need more detail on how to do them, just ask. The report run book report_reminders.py and these pages [1][2] can also help.

If no one volunteers then I’ll do them. I’ll also check the report’s format and details if we do get volunteers.

One thing that might be useful to look into is whether we can automate sending out the reminders each month, rather than having it as a manual task.
To help manage who does what, I’ve created a page here [3]; just put your name next to something if you want to do it. If you need help with anything, just ask.

1. https://cwiki.apache.org/confluence/display/INCUBATOR
2. https://incubator.apache.org/guides/chair.html#board_report
3. https://cwiki.apache.org/confluence/display/INCUBATOR/October+2022+Incubator+Report+Tasks

--
Best wishes!
CalvinKirs
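On the idea of automating the monthly reminders: below is a minimal sketch of the scheduling half only, which a daily scheduled job could call before invoking whatever actually sends the mail. The offsets (14, 7 and 2 days before the due date), the function names and the example due date are all assumptions made for illustration; none of this reflects how report_reminders.py works.

```python
from datetime import date, timedelta

# Hypothetical schedule: days before the due date on which to remind podlings.
# The task list only says "2 or 3 reminders"; these offsets are an assumption.
DEFAULT_OFFSETS = (14, 7, 2)


def reminder_dates(due, offsets=DEFAULT_OFFSETS):
    """Return the reminder dates for a report due on `due`, earliest first."""
    return sorted(due - timedelta(days=n) for n in offsets)


def reminder_due_today(today, due, offsets=DEFAULT_OFFSETS):
    """True if a daily scheduled job should send a reminder on `today`."""
    return today in reminder_dates(due, offsets)


if __name__ == "__main__":
    due = date(2022, 10, 5)  # example due date, not taken from the thread
    for day in reminder_dates(due):
        print(day.isoformat())
```

Keeping the date logic separate from the sending step means the schedule can be checked without sending any mail.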
Re: [DISCUSS] Incubating Proposal for Datark
Hi, Good to see this proposal, it's an interesting project. On Fri, Sep 23, 2022 at 9:29 PM 王勝傑 wrote: > > Good luck from XIAOMI > > On 2022/09/22 03:45:10 Yu Li wrote: > > Hi All, > > > > I would like to propose Datark [1] as a new apache incubator project, and > > you can find the proposal [2] of Datark for more details. > > > > Datark is an intermediate (shuffle and spilled) data service for big data > > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost > > performance, stability, and flexibility. It aims at enabling computing > > engines to fully embrace the disaggregated architecture. In a lot of cases, > > intermediate data depends on large local disks, and is often a major cause > > of inefficiency, instability, and inflexibility in the lifecycle of a > > distributed job. Datark solves the problems through the following core > > designs: > > > > 1. Push-based shuffle plus partition data aggregation to turn random IO > > access into sequential access. > > 2. FileSystem-like API to support writing spilled data. > > 3. Hierarchical storage from memory to DFS/object store to enable fast > > cache and massive storage space. > > 4. Engine-irrelevant APIs for easy integrating to various engines. > > 5. Extended fault tolerance and data replication to increase reliability > > > > Datark is currently adopted in the production environment at both Alibaba > > and many other companies, serving petabytes of data per day. Beyond that, > > it has more open source users including Shopee, NetEase, Bilibily, BOSS, > > and Synnex. Most of these users have made contributions to the project, > > forming an active community with dozens of developers. > > > > The proposed initial committers are interested in joining ASF to reinforce > > extensive collaboration and build a more vibrant community. We believe the > > Datark project will provide tremendous value for the community if it is > > introduced into the Apache incubator. 
> > > > I will help this project as the champion and many thanks to our four other > > mentors: > > > > * Becket Qin (j...@apache.org) > > * Duo Zhang (zhang...@apache.org) > > * Lidong Dai (lidong...@apache.org) > > * Willem Jiang (ningji...@apache.org) > > > > FWIW, although with different solutions, the issues Datark aims to resolve > > have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed > > this during the discussion phase of Uniffle incubation (when we were also > > preparing for the incubation) and had some open and friendly discussion to > > see whether there could be a joint force [4], and finally decided to > > develop independently for the time being [5]. > > > > Look forward to your feedback. Thanks. > > > > Best Regards, > > Yu > > > > [1] https://github.com/alibaba/RemoteShuffleService > > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal > > [3] https://uniffle.apache.org/ > > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz > > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw > > > > - > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org > -- Best wishes! CalvinKirs - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
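The first core design in the proposal (push-based shuffle plus partition data aggregation, turning random IO into sequential access) can be illustrated with a toy model. This is only a sketch of the general technique under simplifying assumptions (everything in memory, one buffer per partition); the class and function names are hypothetical and are not Datark's API.

```python
import zlib
from collections import defaultdict


def partition_of(key, num_partitions):
    """Deterministic partitioner (crc32 instead of the per-process salted hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyShuffleService:
    """Stand-in for a remote shuffle service: one append-only buffer per partition."""

    def __init__(self):
        self.partitions = defaultdict(list)
        self.reads = 0

    def push(self, partition_id, records):
        # Aggregation: data pushed by every mapper lands in the same buffer.
        self.partitions[partition_id].extend(records)

    def read_partition(self, partition_id):
        # A reducer fetches its whole partition in a single sequential read.
        self.reads += 1
        return list(self.partitions[partition_id])


def push_shuffle(mapper_outputs, num_partitions):
    """Each mapper groups its records by partition and pushes them to the service."""
    service = ToyShuffleService()
    for records in mapper_outputs:
        grouped = defaultdict(list)
        for key, value in records:
            grouped[partition_of(key, num_partitions)].append((key, value))
        for partition_id, batch in grouped.items():
            service.push(partition_id, batch)
    return service


def pull_based_read_count(num_mappers, num_partitions):
    """Without aggregation, each reducer fetches one small segment from every mapper."""
    return num_mappers * num_partitions


if __name__ == "__main__":
    outputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
    service = push_shuffle(outputs, num_partitions=2)
    for partition_id in range(2):
        service.read_partition(partition_id)
    print("aggregated reads:", service.reads)
    print("pull-based reads:", pull_based_read_count(len(outputs), 2))
```

The point of the comparison is the read count: with aggregation it grows with the number of partitions, while in the pull-based model it grows with mappers times partitions.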
Re: [Discuss] Baremaps Proposal
On Fri, Sep 23, 2022 at 9:35 PM Bertil Chapuis wrote:
>
> Thank you all for the positive feedback.
>
> Following Calvin's comment, I contacted more contributors and added Léonard Besseau to the list of committers. He would be interested in contributing to the project in his free time.

Thank you :) +1 from me, and I look forward to other people's feedback on the incubation proposal.

--
Best wishes!
CalvinKirs
Re: [VOTE] Release Apache NuttX (Incubating) 11.0.0
On Thu, Sep 22, 2022 at 10:57 PM Alin Jerpelea wrote:
>
> Hello all,
>
> This is a call for a vote to release Apache NuttX (Incubating) version 11.0.0.

(snip)

> Apache NuttX community vote and result thread:
> Result: https://lists.apache.org/thread/c0lv34jvwbdh6wfnl7xsprxmggllpmcw
> Vote: https://lists.apache.org/thread/rv9pf8gtbcq4zjnydv2h2lsymszzb33f
>
> SCM Information:
> Release tag: nuttx-11.0.0-RC2
> Hash for the release incubating-nuttx tag: d32555f3e0492b8f4caeb407db55de23322724ef
> Hash for the release incubating-nuttx-apps tag: 8b43f9f9ca30f44c1cccae9a9078d5d45b776d35
>
> [1] https://dist.apache.org/repos/dist/dev/incubator/nuttx/11.0.0-RC2/
> [2] https://raw.githubusercontent.com/apache/incubator-nuttx/nuttx-11.0.0-RC2/ReleaseNotes
> [3] https://www.apache.org/dev/release.html#approving-a-release
> [4] https://cwiki.apache.org/confluence/display/NUTTX/Validating+a+staged+Release

Hi all,

Pleased to submit my vote:

Summary: +1 to release (binding)

Per Alan's request for size information [1]:

* NuttX-11.0.0-RC2, b-g474e-dpow1:nsh configuration:

  $ arm-none-eabi-size nuttx
     text    data     bss     dec     hex filename
   107623     672    2012  110307   1aee3 nuttx

* This is a big improvement over NuttX-10.3.0-RC4, same config:

  $ arm-none-eabi-size nuttx
     text    data     bss     dec     hex filename
   117843     636    2256  120735   1d79f nuttx

Text is reduced by 10220!! Data increases slightly by 36, but bss decreases by 244. Great work everyone!

Please note that for future releases, I would like to build in 'ostest' for additional verification. Today I built and tested such a configuration on this board for this release. Soon I will open a PR to upstream this new config as b-g474e-dpow1:ostest.
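As a quick cross-check of the size figures above (data 672 and bss 2012 for 11.0.0-RC2; data 636 and bss 2256 for 10.3.0-RC4), the sketch below confirms that text + data + bss equals the dec column and the hex column in each row, and recomputes the per-section differences. The function and constant names are made up for this illustration.

```python
# (text, data, bss, dec, hex) as reported by arm-none-eabi-size in the message above.
NSH_11_0_0_RC2 = (107623, 672, 2012, 110307, "1aee3")
NSH_10_3_0_RC4 = (117843, 636, 2256, 120735, "1d79f")


def row_is_consistent(row):
    """True if text + data + bss matches both the dec and the hex totals."""
    text, data, bss, dec, hex_total = row
    total = text + data + bss
    return total == dec and total == int(hex_total, 16)


def section_deltas(old, new):
    """Per-section change from `old` to `new`; negative means the section shrank."""
    names = ("text", "data", "bss")
    return {name: new[i] - old[i] for i, name in enumerate(names)}


if __name__ == "__main__":
    print(row_is_consistent(NSH_11_0_0_RC2), row_is_consistent(NSH_10_3_0_RC4))
    print(section_deltas(NSH_10_3_0_RC4, NSH_11_0_0_RC2))
```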
Compared to b-g474e-dpow1:nsh, it adds the following configs:

  +CONFIG_BUILTIN=y
  +CONFIG_NSH_BUILTIN_APPS=y
  +CONFIG_TESTING_OSTEST=y

and, because ostest currently fails when priority inheritance is enabled, it removes the following configs:

  -CONFIG_PRIORITY_INHERITANCE=y
  -CONFIG_PTHREAD_MUTEX_DEFAULT_PRIO_INHERIT=y

Here is the size information for that build:

* NuttX-11.0.0-RC2, b-g474e-dpow1:ostest configuration (not yet upstreamed):

  $ arm-none-eabi-size nuttx
     text    data     bss     dec     hex filename
   179819     668    4812  185299   2d3d3 nuttx

Additional notes about this build:

1. I noticed that to be able to run 'ostest' from NSH the configuration needs CONFIG_BUILTIN and CONFIG_NSH_BUILTIN_APPS. Do we want to make CONFIG_TESTING_OSTEST depend on those?

2. I noticed that the 'openocd' incantation in the README.txt of b-g474e-dpow1 is incorrect, at least on my system, even though I wrote that file! I will open a PR soon to fix that. The incantation that worked for me now is:

  $ sudo openocd -f interface/stlink.cfg -f target/stm32g4x.cfg -c init -c "reset halt" -c "flash write_image erase nuttx.bin 0x0800"

Neither of the above issues is a showstopper.

Development system: Linux (Debian 4.19.0-21-rt-amd64 x86_64)

Verified:

* Signatures
* SHA-512 sums
* Incubating in artifact names
* LICENSE, NOTICE, DISCLAIMER, and README.md present in both tarballs
* DISCLAIMER-WIP has been renamed to DISCLAIMER as licenses have been migrated to Apache 2.0 or documented in LICENSE; hopefully we will graduate soon and be able to remove DISCLAIMER altogether :-)
* Build, FLASH program, and boot b-g474e-dpow1:nsh to the NSH prompt successfully.
* Build, FLASH, boot b-g474e-dpow1:ostest (not yet upstreamed as discussed above) and ran 'ostest' successfully.
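For anyone repeating the SHA-512 check from the list above, here is a minimal sketch using only the Python standard library. It assumes the expected digest has already been copied out of the published checksum file as a hex string; the function names are hypothetical, and the exact layout of the checksum files in the staging area is not covered here.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare against a published digest, ignoring case and embedded whitespace."""
    normalized = "".join(expected_hex.split()).lower()
    return sha512_of_file(path) == normalized
```

Normalizing case and whitespace is a convenience, since some tools print digests in upper case or in space-separated groups. This covers the checksum step only and is not a substitute for verifying the signatures.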
Dependencies:

* gcc-arm-none-eabi-7-2017-q4-major
* kconfig-conf from NuttX tools repository

Other dependencies from Debian packages:

* binutils-dev 2.31.1-16
* bison 2:3.3.2.dfsg-1
* flex 2.6.4-6.2
* gperf 3.1-1
* libelf-dev 0.176-1.1
* libgmp-dev 2:6.1.2+dfsg-4
* libisl-dev 0.20-2
* libmpc-dev 1.1.0-1
* libmpfr-dev 4.0.2-1
* libncurses5-dev 6.1+20181013-2+deb10u2
* libusb-1.0-0-dev 2:1.0.22-2
* libusb-dev 2:0.1.12-32
* openocd 0.10.0-5
* texinfo 6.5.0.dfsg.1-4+b1

A very big **THANK YOU** to our RM and to everyone in the Apache NuttX and Incubator community for making this release (candidate) possible!

References:

[1] Alan Carvalho de Assis's message to the dev@nuttx.a.o thread "Re: [VOTE] Apache NuttX 10.0.0 (incubating) RC0 release" on 26 Nov 2020, archived: https://lists.apache.org/thread/nxvwxol948psr2z7fc6cwtdv9ofoz9yj

Cheers,
Nathan
Re: [DISCUSS] Incubating Proposal for Datark
+1, glad to see the community of Datark has been growing.

Thanks,
Cheng Pan

On Fri, Sep 23, 2022 at 9:55 PM Calvin Kirs wrote:
>
> Hi,
> Good to see this proposal, it's an interesting project.
>
> On Fri, Sep 23, 2022 at 9:29 PM 王勝傑 wrote:
> >
> > Good luck from XIAOMI
Re: [DISCUSS] Incubating Proposal for Datark
This is an interesting project, +1 on the proposal and good luck to Datark!

Best Regards,
- He Xiaoqiao

On Fri, Sep 23, 2022 at 7:55 PM Willem Jiang wrote:

> Hi Yu,
>
> Thanks for the explanation. Please add a rename plan to the project proposal.
> I'd be happy to be the mentor of this project.
>
> BTW, Could you update the Core Developers information with their github id, it could be easy for us to track the contributions.
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Fri, Sep 23, 2022 at 5:41 PM Yu Li wrote:
> >
> > Hi Willem,
> >
> > Referring to the recent incubation process of streampark [1] and uniffle [2], it seems they didn't rename their original project names before entering apache incubator, thus we didn't plan to change the original github project name but would redirect it to the new project after entering incubation. OTOH, if such a rename is necessary before incubation, we will need some internal approval to process. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> > [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
> >
> > On Fri, 23 Sept 2022 at 09:07, Willem Jiang wrote:
> >
> > > I just checked the source repo, it is still using the name of RemoteShuffleService.
> > > Is there any plan for when we will change the project name?
Re: [DISCUSS] Incubating Proposal for Datark
Thanks all for the positive feedback!

@Willem
The proposal is updated and both the project rename plan and github ids of core developers have been added. Please check it and let us know if any further suggestions. Thanks.

Best Regards,
Yu
Re: [DISCUSS] Incubating Proposal for Datark
Thanks Yu Li for putting this up.
As the mentor of this project, I will try my best to help the community.

On Fri, Sep 23, 2022 at 21:06, Yu Li wrote:
>
> Thanks all for the positive feedback!
>
> @Willem
> The proposal is updated and both the project rename plan and github ids of core developers have been added. Please check it and let us know if any further suggestions. Thanks.
>
> Best Regards,
> Yu
>
> On Fri, 23 Sept 2022 at 20:41, Xiaoqiao He wrote:
>
> > This is an interesting project, +1 on the proposal and good luck to Datark!
> >
> > Best Regards,
> > - He Xiaoqiao
> >
> > On Fri, Sep 23, 2022 at 7:55 PM Willem Jiang wrote:
> >
> > > Hi Yu,
> > >
> > > Thanks for the explanation. Please add a rename plan to the project proposal.
> > > I'd be happy to be the mentor of this project.
> > >
> > > BTW, Could you update the Core Developers information with their github id, it could be easy for us to track the contributions.
> > >
> > > Willem Jiang
> > >
> > > Twitter: willemjiang
> > > Weibo: 姜宁willem
> > >
> > > On Fri, Sep 23, 2022 at 5:41 PM Yu Li wrote:
> > > >
> > > > Hi Willem,
> > > >
> > > > Referring to the recent incubation process of streampark [1] and uniffle [2], it seems they didn't rename their original project names before entering apache incubator, thus we didn't plan to change the original github project name but would redirect it to the new project after entering incubation. OTOH, if such a rename is necessary before incubation, we will need some internal approval to process. Thanks.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > [1] https://lists.apache.org/thread/ns5n6ozl1mdvdbhmkfol67lt163m74v3
> > > > [2] https://lists.apache.org/thread/fyyhkjvhzl4hpzr52hd64csh5lt2wm6h
> > > >
> > > > On Fri, 23 Sept 2022 at 09:07, Willem Jiang wrote:
> > > >
> > > > > I just checked the source repo, it is still using the name of RemoteShuffleService.
> > > > > Is there any plan for when we will change the project name?
> > > > >
> > > > > On Thu, Sep 22, 2022 at 11:45 AM Yu Li wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to propose Datark [1] as a new apache incubator project, and
> > > > > > you can find the proposal [2] of Datark for more details.
> > > > > >
> > > > > > Datark is an intermediate (shuffle and spilled) data service for big data
> > > > > > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> > > > > > performance, stability, and flexibility. It aims at enabling computing
> > > > > > engines to fully embrace the disaggregated architecture. In a lot of cases,
> > > > > > intermediate data depends on large local disks, and is often a major cause
> > > > > > of inefficiency, instability, and inflexibility in the lifecycle of a
> > > > > > distributed job. Datark solves the problems through the following core
> > > > > > designs:
> > > > > >
> > > > > > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > > > > > access into sequential access.
> > > > > > 2. FileSystem-like API to support writing spilled data.
> > > > > > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > > > > > cache and massive storage space.
> > > > > > 4. Engine-agnostic APIs for easy integration with various engines.
> > > > > > 5. Extended fault tolerance and data replication to increase reliability.
> > > > > >
> > > > > > Datark is currently adopted in the production environment at both Alibaba
> > > > > > and many other companies, serving petabytes of data per day. Beyond that,
> > > > > > it has more open source users including Shopee, NetEase, Bilibili, BOSS,
> > > > > > and Synnex. Most of these users have made contributions to the project,
> > > > > > forming an active community with dozens of developers.
> > > > > >
> > > > > > The proposed initial committers are interested in joining ASF to reinforce
> > > > > > extensive collaboration and build a more vibrant community. We believe the
> > > > > > Datark project will provide tremendous value for the community if it is
> > > > > > introduced into the Apache incubator.
> > > > > >
> > > > > > I will help this project as the champion and many thanks to our four other
> > > > > > mentors:
> > > > > >
> > > > > > * Becket Qin (j...@apache.org)
> > > > > > * Duo Zhang (zhang...@apache.org)
> > > > > > * Lidong Dai (lidong...@apache.org)
> > > > > > * Willem Jiang (ningji...@apache.org)
> > > > > >
> > > > > > FWIW, although with different solutions, the issues Datark aims to resolve
> > > > > > have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> > > > > > this during the discussion phase of Uniffle incubation (when we were also
Re: [DISCUSS] Incubating Proposal for Datark
Cool. Good Luck ~

On 2022/09/22 03:45:10 Yu Li wrote:
> Hi All,
>
> I would like to propose Datark [1] as a new apache incubator project, and
> you can find the proposal [2] of Datark for more details.
>
> Datark is an intermediate (shuffle and spilled) data service for big data
> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> performance, stability, and flexibility. It aims at enabling computing
> engines to fully embrace the disaggregated architecture. In a lot of cases,
> intermediate data depends on large local disks, and is often a major cause
> of inefficiency, instability, and inflexibility in the lifecycle of a
> distributed job. Datark solves the problems through the following core
> designs:
>
> 1. Push-based shuffle plus partition data aggregation to turn random IO
> access into sequential access.
> 2. FileSystem-like API to support writing spilled data.
> 3. Hierarchical storage from memory to DFS/object store to enable fast
> cache and massive storage space.
> 4. Engine-agnostic APIs for easy integration with various engines.
> 5. Extended fault tolerance and data replication to increase reliability.
>
> Datark is currently adopted in the production environment at both Alibaba
> and many other companies, serving petabytes of data per day. Beyond that,
> it has more open source users including Shopee, NetEase, Bilibili, BOSS,
> and Synnex. Most of these users have made contributions to the project,
> forming an active community with dozens of developers.
>
> The proposed initial committers are interested in joining ASF to reinforce
> extensive collaboration and build a more vibrant community. We believe the
> Datark project will provide tremendous value for the community if it is
> introduced into the Apache incubator.
>
> I will help this project as the champion and many thanks to our four other
> mentors:
>
> * Becket Qin (j...@apache.org)
> * Duo Zhang (zhang...@apache.org)
> * Lidong Dai (lidong...@apache.org)
> * Willem Jiang (ningji...@apache.org)
>
> FWIW, although with different solutions, the issues Datark aims to resolve
> have some overlap with Apache Uniffle (incubating) [3]. Actually we noticed
> this during the discussion phase of Uniffle incubation (when we were also
> preparing for the incubation) and had some open and friendly discussion to
> see whether there could be a joint force [4], and finally decided to
> develop independently for the time being [5].
>
> Look forward to your feedback. Thanks.
>
> Best Regards,
> Yu
>
> [1] https://github.com/alibaba/RemoteShuffleService
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> [3] https://uniffle.apache.org/
> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw