Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Jungtaek Lim
+1 (non-binding)

Good luck!

On Wed, May 25, 2022 at 2:42 PM Daniel Widdis  wrote:

> This was stated in the other thread: Unified/Universal Shuffle
>
> On 5/24/22, 10:04 PM, "XiaoYu"  wrote:
>
> Hi
>
> What does "Uniffle" mean as a project name?
>
> thanks
>
> On Wed, May 25, 2022 at 12:57, Weiwei Yang wrote:
> >
> > +1 (binding)
> > Good luck!
> >
> > On Tue, May 24, 2022 at 8:49 PM Ye Xianjin wrote:
> >
> > > +1 (non-binding).
> > >
> > > Sent from my iPhone
> > >
> > > > > On May 25, 2022, at 9:59 AM, Goson zhang wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Good luck!
> > > >
> > > > > On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote:
> > > >
> > > >> +1 (non-binding) from me!  Good luck!
> > > >>
> > > >> On 5/24/22, 9:05 AM, "Jerry Shao"  wrote:
> > > >>
> > > >>Hi all,
> > > >>
> > > > >>Due to the name issue raised in the earlier thread
> > > > >>(https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f),
> > > > >>we have chosen a new project name, "Uniffle", and started a new
> > > > >>thread. Please help to discuss.
> > > >>
> > > > >>We would like to propose Uniffle[1] as a new Apache Incubator
> > > > >>project; you can find the proposal here [2] for more details.
> > > >>
> > > > >>Uniffle is a high performance, general purpose Remote Shuffle
> > > > >>Service for distributed compute engines like Apache Spark, Apache
> > > > >>Hadoop MapReduce, Apache Flink and so on. We are aiming to make
> > > > >>Uniffle a universal shuffle service for distributed compute engines.
> > > >>
> > > > >>Shuffle is the key mechanism by which a distributed compute engine
> > > > >>exchanges data between distributed tasks; its performance and
> > > > >>stability directly affect the whole job. The current “local file,
> > > > >>pull-like shuffle style” has several limitations:
> > > >>
> > > > >>   1. The current shuffle struggles to support very large workloads,
> > > > >>   especially in a high-load environment; the major problem is IO
> > > > >>   (random disk IO, network congestion, and timeouts).
> > > > >>   2. The current shuffle is hard to deploy in a disaggregated
> > > > >>   compute-storage environment, as disk capacity is quite limited on
> > > > >>   compute nodes.
> > > > >>   3. The constraint of storing shuffle data locally makes it hard to
> > > > >>   scale elastically.
> > > >>
> > > > >>Remote Shuffle Service is the key technology for enterprises to
> > > > >>build big data platforms, to expand big data applications to
> > > > >>disaggregated, online-offline hybrid environments, and to solve the
> > > > >>above problems.
> > > >>
> > > > >>The implementation of Remote Shuffle Service, “Uniffle”, is heavily
> > > > >>adopted at Tencent and has shown its advantages in production. Other
> > > > >>enterprises have also adopted or are preparing to adopt Uniffle in
> > > > >>their environments.
> > > >>
> > > > >>Uniffle's key idea is borrowed from Sailfish shuffle
> > > > >><https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>.
> > > > >>It has several key design goals:
> > > >>
> > > > >>   1. High performance. Uniffle’s performance is close to that of
> > > > >>   local file based shuffle for small workloads. For large workloads,
> > > > >>   it is far better than the current shuffle style.
> > > > >>   2. Fault tolerance. Uniffle provides high availability for
> > > > >>   Coordinator nodes, and failover for Shuffle nodes.
> > > > >>   3. Pluggable. Uniffle is highly pluggable, so it can be adapted to
> > > > >>   different compute engines, different backend storages, and
> > > > >>   different wire protocols.
> > > >>
> > > > >>We believe that the Uniffle project will provide great value for the
> > > > >>community if it is accepted by the Apache Incubator.
> > > >>
> > > > >>I will help this project as champion, and many thanks to the 4
> > > > >>mentors:
> > > > >>
> > > > >>   - Felix Cheung (felixche...@apache.org)
> > > > >>   - Junping du (junping...@apache.org)
> > > > >>   - Weiwei Yang (w...@apache.org)
> > > > >>   - Xun liu (liu...@apache.org)
> > > >>  

Re: Looking for a champion: resurrect log4j 1.x

2021-12-20 Thread Jungtaek Lim
Just wondering, does this even fulfill the criteria for incubation? Have
there been any similar cases before?

It was stated that there will be no active development effort, only a focus
on CVE fixes. To me this sounds like the project will start by fixing a few
known CVEs, then stop until other CVEs are discovered (there may be a huge
difference between proactively discovering CVEs and passively fixing CVEs
reported by others), and will never attempt to become a TLP. The majority of
status reports will be blank. That said, sustainability does not seem to be
proven.


On Mon, Dec 20, 2021 at 10:51 PM John D. Ament wrote:

> On Mon, Dec 20, 2021 at 8:42 AM Romain Manni-Bucau wrote:
>
> > I guess there are 4 options:
> >
> > 1. resurrect log4j1 and let it die again
> > 2. do a log4j1 release for the CVE under the logging umbrella (as a
> > subproject) - after all, log4j1 already belongs to logging as a
> > subproject (https://logging.apache.org/dormant.html)
> > 3. the log4j1-log4j2 bridge (but I agree this is not a solution, and it
> > requires doing 2 to be technically useful, since no log4j1 users will
> > want to import log4j2, at least because it is not compatible with their
> > Java version or due to injected bytecode like module-info)
> > 4. do a CVE fix release as a fork on GitHub or any other hosting
> >
> > Personally I don't think 1 or 3 are real options. 4 is, but it is not
> > that nice (because it would be yet another fork, and also because it
> > requires some GAV change or build hack to be done properly), so from my
> > point of view I would be tempted to push for 2, which sounds like a quick
> > win for everyone.
> >
>
> Questions like this probably should be on one of the logging lists rather
> than the incubator list.  The Incubator would not create a hostile fork
> under any circumstance, including of an existing project/sub-project within
> Apache.  In a situation like this it would be purely a call by the Logging
> PMC, whether or not they want the Incubator to create the podling.
>
>
> >
> > Romain Manni-Bucau
> > @rmannibucau | Blog | Old Blog | Github
> > <https://github.com/rmannibucau> | LinkedIn | Book
> > <https://www.packtpub.com/application-development/java-ee-8-high-performance>
> >
> >
> > On Mon, Dec 20, 2021 at 14:32, John D. Ament wrote:
> >
> > > Hi Vladimir,
> > >
> > > I think based on what you're describing and the Logging PMC's response,
> > > re-incubating the project makes sense.  I would be curious if the
> > > Logging PMC would be interested in restarting the sub-project after a
> > > successful incubation period.  This seems to match what Ralph is
> > > suggesting as well.
> > >
> > > Typically this would mean that the VP Logging PMC would serve as the
> > > champion, and as the sponsor the Logging PMC would still be the one to
> > > vote to add the project to the incubator.  If the VP Logging isn't
> > > interested in doing this, I would recommend starting out the project as
> > > a standalone podling and keeping the Incubator as sponsor rather than
> > > Logging.  See [1] for some details on those notes.  The incubator would
> > > be responsible for voting on releases, receiving notices for new PPMC
> > > members, etc. regardless of who is the sponsor.  Given enough
> > > contributors and a diverse contributor base, the Incubator PMC and
> > > the Logging PMC (if they're the sponsor) would then vote on whether
> > > everyone feels the new project can be brought back to the Logging
> > > project.  We can also decide as it gets closer to graduation to move
> > > the podling into a sub-project if that's what everyone agrees.
> > >
> > > I would be up for helping you get through the incubator.  If VP Logging
> > > doesn't want to own the sponsorship part, I can be your Champion.
> > >
> > > John
> > >
> > >
> > > [1]: https://incubator.apache.org/guides/proposal.html#background
> > >
> > > On Mon, Dec 20, 2021 at 8:20 AM Vladimir Sitnikov <
> > > sitnikov.vladi...@gmail.com> wrote:
> > >
> > > > >Do you have "facts" (like message on mailing list) ?
> > > >
> > > > I am not sure what you mean.
> > > >
> > > > For example:
> > > >
> > > > 1) Ralph Goers says the existing committers did not touch 1.x code a
> > > > lot: https://lists.apache.org/thread/j6zrdp1d148qpkg0g7x3cc41o070oq6n
> > > > Ralph>Virtually all of the contributors to the Log4j 1.x project left
> > > > Ralph>a few years before it was declared EOL. That is the primary
> > > > Ralph>reason it was retired. Although the current set of committers
> > > > Ralph>have access to the code, none of us have ever built it
> > > >
> > > > 2) Ralph Goers (a member of Logging PMC) suggested that one of the
> > > > ways to move forward is to re-incubate log4j 1.x:
> > > > 

Re: [VOTE] Accept Druid into the Apache Incubator

2018-02-22 Thread Jungtaek Lim
+1 (non-binding)

On Fri, Feb 23, 2018 at 9:13 AM, Pramod Immaneni wrote:

> +1
>
> On Thu, Feb 22, 2018 at 11:03 AM, Julian Hyde  wrote:
>
> > Hi all,
> >
> > After some discussion on the Druid proposal[1], I'd like to
> > start a vote on accepting Druid into the Apache Incubator,
> > per the ASF policy[2] and voting rules[3].
> >
> > A vote for accepting a new Apache Incubator podling is a
> > majority vote for which only Incubator PMC member votes are
> > binding. Votes from other people are also welcome as an
> > indication of people's enthusiasm (or lack thereof).
> >
> > Please do not use this VOTE thread for discussions.  If
> > needed, start a new thread instead.
> >
> > This vote will run for at least 72 hours. Please VOTE as
> > follows:
> >  [ ] +1 Accept Druid into the Apache Incubator
> >  [ ] +0 Abstain
> >  [ ] -1 Do not accept Druid into the Apache Incubator
> > because ...
> >
> > The proposal is listed below, but you can also access it on
> > the wiki[4].
> >
> > Julian
> >
> > [1] https://lists.apache.org/thread.html/b95f90a30b6e8587e9b108f368b07c1b3e23e25ca592448d9c9f81e2@%3Cgeneral.incubator.apache.org%3E
> >
> > [2] https://incubator.apache.org/policy/incubation.html#approval_of_proposal_by_sponsor
> >
> > [3] http://www.apache.org/foundation/voting.html
> >
> > [4] https://wiki.apache.org/incubator/DruidProposal
> >
> >
> >
> >
> >
> > = Druid Proposal =
> >
> > == Abstract ==
> >
> > Druid is a high-performance, column-oriented, distributed
> > data store.
> >
> > == Proposal ==
> >
> > Druid is an open source data store designed for real-time
> > exploratory analytics on large data sets. Druid's key
> > features are a column-oriented storage layout, a distributed
> > shared-nothing architecture, and ability to generate and
> > leverage indexing and caching structures. Druid is typically
> > deployed in clusters of tens to hundreds of nodes, and has
> > the ability to load data from Apache Kafka and Apache
> > Hadoop, among other data sources. Druid offers two query
> > languages: a SQL dialect (powered by Apache Calcite) and a
> > JSON-over-HTTP API.
> >
> > Druid was originally developed to power a slice-and-dice
> > analytical UI built on top of large event streams. The
> > original use case for Druid targeted ingest rates of
> > millions of records/sec, retention of over a year of data,
> > and query latencies of sub-second to a few seconds. Many
> > people can benefit from such capability, and many already
> > have (see http://druid.io/druid-powered.html). In addition,
> > new use cases have emerged since Druid's original
> > development, such as OLAP acceleration of data warehouse
> > tables and more highly concurrent applications operating
> > with relatively narrower queries.
> >
> > == Background ==
> >
> > Druid is a data store designed for fast analytics. It would
> > typically be used in lieu of more general purpose query
> > systems like Hadoop MapReduce or Spark when query latency is
> > of the utmost importance. Druid is often used as a data
> > store for powering GUI analytical applications.
> >
> > The buzzwordy description of Druid is a high-performance,
> > column-oriented, distributed data store. What we mean by
> > this is:
> >
> > * "high performance": Druid aims to provide low query
> >   latency and high ingest rates.
> > * "column-oriented": Druid stores data in a column-oriented
> >   format, like most other systems designed for analytics. It
> >   can also store indexes along with the columns.
> > * "distributed": Druid is deployed in clusters, typically of
> >   tens to hundreds of nodes.
> > * "data store": Druid loads your data and stores a copy of
> >   it on the cluster's local disks (and may cache it in
> >   memory). It doesn't query your data from some other
> >   storage system.
> >
> > == Rationale ==
> >
> > Druid is a mature, active project with a large number of
> > production installations, dozens of contributors to each
> > release, and multiple vendors offering professional
> > support. Given Druid's strong community, its close
> > integration with many other Apache projects (such as Kafka,
> > Hadoop, and Calcite), and its pre-existing Apache-inspired
> > governance structure, we feel that Apache is the best home
> > for the project on a long-term basis.
> >
> > == Current Status ==
> >
> > === Meritocracy ===
> >
> > Since Druid was first open sourced the original developers
> > have solicited contributions from others, including through
> > our blog, the project mailing lists, and through accepting
> > GitHub pull requests. We have an Apache-inspired governance
> > structure with a PMC and committers, and our committer ranks
> > include a good number of people from outside the original
> > development team.
> >
> > === Community ===
> >
> > The Druid core developers have sought to nurture a community
> > throughout the life of the project. We use GitHub as the
> > focal point