Re: [IPMC] What to do with retired podling repositories

2019-01-13 Thread Sharan F
Hi

I am doing some research on incubator projects, and the ones that didn't
make it to TLP could be interesting to analyse, so I'd be in favour of the
rename rather than a delete for now.

Thanks
Sharan

On Mon, 14 Jan 2019, 07:12 Myrle Krantz wrote:

> If we can’t name a reason for keeping the data, I’d be inclined to just
> delete.  We are not data squirrels.
>
> : o),
> Myrle
>
> On Sun, Jan 13, 2019 at 10:15 AM Daniel Gruno wrote:
>
> > Hello IPMC and other folks,
> >
> > We have a big bunch of retired podlings with git repositories on
> > git-wip-us. As we are working on retiring this service, we need to
> > address what happens with these old project repositories.
> >
> > The retired podlings we need to address are:
> > blur, cmda, concerted, corinthia, cotton, gearpump, gossip, hdt, horn,
> > htrace, iota, mrql, openaz, pirk, provisionr, quickstep, ripple, s4,
> > slider, wave
> >
> > Before February 7th, we at ASF Infra would love it if the Incubator could
> > decide what happens to these repositories, either individually or as a
> > whole.
> >
> > Some suggested options are:
> >
> > 1) Delete the repositories
> > 2) Rename them to incubator-retired-$foo.git
> > 3) Do nothing, but put a note on GitHub etc. that they are retired.
> > 4) Punt them to the Attic if possible (you'd then have to coordinate with
> > the Attic PMC)
> > 5) Something else??
> >
> > Please talk among yourselves and let Infra know :)
> >
> > With regards,
> > Daniel on behalf of ASF Infra.
> >


Re: [VOTE] Accept Hudi into the Apache Incubator

2019-01-13 Thread Pierre Smits
+1

On Mon, 14 Jan 2019 at 00:02 Luciano Resende wrote:

> +1 (binding)
>
> On Sun, Jan 13, 2019 at 2:34 PM Thomas Weise wrote:
> >
> > Hi all,
> >
> > Following the discussion of the Hudi proposal in [1], this is a vote
> > on accepting Hudi into the Apache Incubator,
> > per the ASF policy [2] and voting rules [3].
> >
> > A vote for accepting a new Apache Incubator podling is a
> > majority vote. Everyone is welcome to vote; only
> > Incubator PMC member votes are binding.
> >
> > This vote will run for at least 72 hours. Please VOTE as
> > follows:
> >
> > [ ] +1 Accept Hudi into the Apache Incubator
> > [ ] +0 Abstain
> > [ ] -1 Do not accept Hudi into the Apache Incubator because ...
> >
> > The proposal is included below, but you can also access it on
> > the wiki [4].
> >
> > Thanks for reviewing and voting,
> > Thomas
> >
> > [1]
> >
> https://lists.apache.org/thread.html/12e2bdaa095d68dae6f8731e473d3d43885783177d1b7e3ff2f65b6d@%3Cgeneral.incubator.apache.org%3E
> >
> > [2]
> >
> https://incubator.apache.org/policy/incubation.html#approval_of_proposal_by_sponsor
> >
> > [3] http://www.apache.org/foundation/voting.html
> >
> > [4] https://wiki.apache.org/incubator/HudiProposal
> >

Re: Discussion board vs. user mailing list in incubating project

2019-01-13 Thread Dave Fisher

Sent from my iPhone

> On Jan 13, 2019, at 10:09 PM, Myrle Krantz wrote:
> 
> How’s their discussion board archived?

Yes, archival is the question. One approach we took with OpenOffice was to move
the phpBB forums onto PMC-managed VMs using Apache Infra-provided MySQL.

Regards,
Dave

> 
> -Myrle
> 
>> On Sat, Jan 12, 2019 at 7:01 PM Sebastian wrote:
>> 
>> Hi,
>> 
>> I recently received a question from a project that is considering
>> applying to the Incubator, and I did not know the answer, so I would like
>> to ask for your input on that.
>> 
>> The project has a very active web-based discussion board, similar to
>> projects like Apache MXNet on https://discuss.mxnet.io/ or Rust on
>> https://users.rust-lang.org/. However, Apache projects traditionally
>> prefer a user mailing list to web-based discussion boards in my experience.
>> 
>> The question is now whether the project would have to migrate the
>> discussion board to the user mailing list during incubation or whether
>> they could keep it. Do we have an official stance on that?
>> 
>> Best,
>> Sebastian
>> 
>> PS: As a side note, I'd like to mention that the project will have
>> dev@ and private@ mailing lists as usual; this question is only about the
>> interactions with users.
>> 




Re: [IPMC] What to do with retired podling repositories

2019-01-13 Thread Myrle Krantz
If we can’t name a reason for keeping the data, I’d be inclined to just
delete.  We are not data squirrels.

: o),
Myrle

On Sun, Jan 13, 2019 at 10:15 AM Daniel Gruno wrote:

> Hello IPMC and other folks,
>
> We have a big bunch of retired podlings with git repositories on
> git-wip-us. As we are working on retiring this service, we need to
> address what happens with these old project repositories.
>
> The retired podlings we need to address are:
> blur, cmda, concerted, corinthia, cotton, gearpump, gossip, hdt, horn,
> htrace, iota, mrql, openaz, pirk, provisionr, quickstep, ripple, s4,
> slider, wave
>
> Before February 7th, we at ASF Infra would love it if the Incubator could
> decide what happens to these repositories, either individually or as a
> whole.
>
> Some suggested options are:
>
> 1) Delete the repositories
> 2) Rename them to incubator-retired-$foo.git
> 3) Do nothing, but put a note on GitHub etc. that they are retired.
> 4) Punt them to the Attic if possible (you'd then have to coordinate with
> the Attic PMC)
> 5) Something else??
>
> Please talk among yourselves and let Infra know :)
>
> With regards,
> Daniel on behalf of ASF Infra.
>


Re: Discussion board vs. user mailing list in incubating project

2019-01-13 Thread Myrle Krantz
How’s their discussion board archived?

-Myrle

On Sat, Jan 12, 2019 at 7:01 PM Sebastian wrote:

> Hi,
>
> I recently received a question from a project that is considering
> applying to the Incubator, and I did not know the answer, so I would like
> to ask for your input on that.
>
> The project has a very active web-based discussion board, similar to
> projects like Apache MXNet on https://discuss.mxnet.io/ or Rust on
> https://users.rust-lang.org/. However, Apache projects traditionally
> prefer a user mailing list to web-based discussion boards in my experience.
>
> The question is now whether the project would have to migrate the
> discussion board to the user mailing list during incubation or whether
> they could keep it. Do we have an official stance on that?
>
> Best,
> Sebastian
>
> PS: As a side note, I'd like to mention that the project will have
> dev@ and private@ mailing lists as usual; this question is only about the
> interactions with users.
>


[CANCELLED][VOTE] Release Apache Doris 0.9.0-incubating-rc01

2019-01-13 Thread 德 李
Hi,

I'm cancelling the vote due to license and build issues, following discussion
in the Doris community.
For details see:
https://lists.apache.org/thread.html/6c8177522029d5bac98d0b300dbe16ffc22e3d06351ffc88501b2100@%3Cgeneral.incubator.apache.org%3E

The Doris community will fix these issues and send out rc02 for another
vote.

Best Regards,
Reed




Re: [VOTE] Release Apache Doris 0.9.0-incubating-rc01

2019-01-13 Thread 德 李
Hi Makoto,

We got it, thank you.
We are going to recheck and improve the Docker build scripts.

Best Regards,
Reed

On 2019/1/12 1:12 AM, "Makoto Yui" wrote:

>With PARALLEL=1, the build fails as follows
>
>/usr/bin/../bin/g++ -DHAVE_CONFIG_H -I. -I../.. -I../../lib/cpp/src/thrift -I./src -I/tmp/doris/apache-doris-0.9.0.rc01-incubating-src/thirdparty/installed/include -Wall -Wextra -pedantic -g -O2 -std=c++11 -MT src/generate/thrift-t_as3_generator.o -MD -MP -MF src/generate/.deps/thrift-t_as3_generator.Tpo -c -o src/generate/thrift-t_as3_generator.o `test -f 'src/generate/t_as3_generator.cc' || echo './'`src/generate/t_as3_generator.cc
>
>src/generate/t_as3_generator.cc: In member function 'void t_as3_generator::generate_as3_struct(t_struct*, bool)':
>
>src/generate/t_as3_generator.cc:663:1: internal compiler error: Segmentation fault
>
> }
>
> ^
>
>Please submit a full bug report,
>with preprocessed source if appropriate.
>See  for instructions.
>
>Makefile:1066: recipe for target 'src/generate/thrift-t_as3_generator.o' failed
>make[3]: *** [src/generate/thrift-t_as3_generator.o] Error 1
>make[3]: Leaving directory '/tmp/doris/apache-doris-0.9.0.rc01-incubating-src/thirdparty/src/thrift-0.9.3/compiler/cpp'
>
>Makefile:588: recipe for target 'all' failed
>make[2]: *** [all] Error 2
>make[2]: Leaving directory '/tmp/doris/apache-doris-0.9.0.rc01-incubating-src/thirdparty/src/thrift-0.9.3/compiler/cpp'
>
>Makefile:609: recipe for target 'all-recursive' failed
>make[1]: *** [all-recursive] Error 1
>make[1]: Leaving directory '/tmp/doris/apache-doris-0.9.0.rc01-incubating-src/thirdparty/src/thrift-0.9.3'
>
>Makefile:530: recipe for target 'all' failed
>make: *** [all] Error 2
>
>I'm not sure it's a host memory issue; the host has enough memory left.
>
>BTW, the project page [1] should carry the "undergoing incubation"
>disclaimer [2].
>[1] https://doris.incubator.apache.org/
>[2] https://incubator.apache.org/guides/branding.html#disclaimers
>
>Makoto
>
>On Fri, 11 Jan 2019 at 17:07, Li,De(BDG) wrote:
>
>> Hi Makoto,
>>
>> Thank you for your check.
>>
>> >> Why not cancel rc1 and create rc2 fixing the license header issue that
>> >> Willem pointed out?
>> >> It seems it's better to be fixed.
>>
>> Actually, we have fixed these issues in branch-0.9.0 (#471, #473) after
>> Willem pointed them out.
>>
>> I’m afraid it would take more of everyone’s time to re-check repeatedly,
>> so I’m going to fix them in the next version.
>>
>> >> It also seems that no one from the IPMC has succeeded in building the
>> >> distribution from src
>>
>> Indeed, the build is not as good as we expect, but I noticed the main
>> cause is that it can’t compile on OSX and it needs GCC 5.3.1+.
>> But it seems Dave Meikle has succeeded in building on Ubuntu 18.04, and we
>> have tested it successfully on CentOS and Ubuntu.
>> We will continue to check and complete the build, including the Docker
>> environment you mentioned.
>>
>> Best Regards,
>> Reed
>>
>>
>> On 2019/1/11 2:23 AM, "Makoto Yui" wrote:
>>
>> >Reed,
>> >
>> >Why not cancel rc1 and create rc2 fixing the license header issue that
>> >Willem pointed out?
>> >It seems it's better to be fixed.
>> >
>> >It also seems that no one from the IPMC has succeeded in building the
>> >distribution from src (not sure for Luke though).
>> >
>> >I got the following build error (which might be a gcc-5 version issue):
>> >
>> >[ 57%] Building CXX object projects/compiler-rt/lib/msan/CMakeFiles/clang_rt.msan-x86_64.dir/msan_interceptors.cc.o
>> >
>> >/tmp/doris/apache-doris-0.9.0.rc01-incubating-src/thirdparty/src/llvm-3.4.2.src/projects/compiler-rt/lib/msan/msan_interceptors.cc: In function 'void __msan::InitializeInterceptors()':
>> >
>> >/tmp/doris/apache-doris-0.9.0.rc01-incubating-src/thirdparty/src/llvm-3.4.2.src/projects/compiler-rt/lib/msan/msan_interceptors.cc:1573:1: internal compiler error: Segmentation fault
>> >
>> > }
>> >
>> > ^
>> >
>> >Please submit a full bug report,
>> >with preprocessed source if appropriate.
>> >See  for instructions.
>> >
>> >projects/compiler-rt/lib/msan/CMakeFiles/clang_rt.msan-x86_64.dir/build.make:134: recipe for target 'projects/compiler-rt/lib/msan/CMakeFiles/clang_rt.msan-x86_64.dir/msan_interceptors.cc.o' failed
>> >
>> >I used ubuntu xenial on docker on OSX.
>> >Software versions meet requirements (except Maven version) as seen in:
>> >
>> >docker run ubuntu:xenial -it
>> >
>> >$ apt-get install wget openjdk-8-jdk maven gcc-5 bzip2 python cmake zip xz-utils patch byacc flex automake libtool g++
>> >
>> >root@d9e5b7017e7b:/tmp/doris/apache-doris-0.9.0.rc01-incubating-src# gcc --version | head -1
>> >gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>> >
>> >root@d9e5b7017e7b:/tmp/doris/apache-doris-0.9.0.rc01-incubating-src# java -version
>> >openjdk version "1.8.0_191"
>> >OpenJDK Runtime Environment (build

Re: [VOTE] Accept Hudi into the Apache Incubator

2019-01-13 Thread Luciano Resende
+1 (binding)

On Sun, Jan 13, 2019 at 2:34 PM Thomas Weise wrote:
>
> Hi all,
>
> Following the discussion of the Hudi proposal in [1], this is a vote
> on accepting Hudi into the Apache Incubator,
> per the ASF policy [2] and voting rules [3].
>
> A vote for accepting a new Apache Incubator podling is a
> majority vote. Everyone is welcome to vote; only
> Incubator PMC member votes are binding.
>
> This vote will run for at least 72 hours. Please VOTE as
> follows:
>
> [ ] +1 Accept Hudi into the Apache Incubator
> [ ] +0 Abstain
> [ ] -1 Do not accept Hudi into the Apache Incubator because ...
>
> The proposal is included below, but you can also access it on
> the wiki [4].
>
> Thanks for reviewing and voting,
> Thomas
>
> [1]
> https://lists.apache.org/thread.html/12e2bdaa095d68dae6f8731e473d3d43885783177d1b7e3ff2f65b6d@%3Cgeneral.incubator.apache.org%3E
>
> [2]
> https://incubator.apache.org/policy/incubation.html#approval_of_proposal_by_sponsor
>
> [3] http://www.apache.org/foundation/voting.html
>
> [4] https://wiki.apache.org/incubator/HudiProposal
>

[VOTE] Accept Hudi into the Apache Incubator

2019-01-13 Thread Thomas Weise
Hi all,

Following the discussion of the Hudi proposal in [1], this is a vote
on accepting Hudi into the Apache Incubator,
per the ASF policy [2] and voting rules [3].

A vote for accepting a new Apache Incubator podling is a
majority vote. Everyone is welcome to vote; only
Incubator PMC member votes are binding.

This vote will run for at least 72 hours. Please VOTE as
follows:

[ ] +1 Accept Hudi into the Apache Incubator
[ ] +0 Abstain
[ ] -1 Do not accept Hudi into the Apache Incubator because ...

The proposal is included below, but you can also access it on
the wiki [4].

Thanks for reviewing and voting,
Thomas

[1]
https://lists.apache.org/thread.html/12e2bdaa095d68dae6f8731e473d3d43885783177d1b7e3ff2f65b6d@%3Cgeneral.incubator.apache.org%3E

[2]
https://incubator.apache.org/policy/incubation.html#approval_of_proposal_by_sponsor

[3] http://www.apache.org/foundation/voting.html

[4] https://wiki.apache.org/incubator/HudiProposal



= Hudi Proposal =

== Abstract ==

Hudi is a big-data storage library that provides atomic upserts and
incremental data streams.

Hudi manages data stored in Apache Hadoop and other API-compatible
distributed file systems/cloud stores.

== Proposal ==

Hudi provides the ability to atomically upsert datasets with new values in
near-real time, making data available quickly to existing query engines
like Apache Hive, Apache Spark, & Presto. Additionally, Hudi provides a
sequence of changes to a dataset from a given point in time, enabling
incremental data pipelines that achieve greater efficiency & lower latency
than their typical batch counterparts. By carefully managing the number &
size of files, Hudi greatly aids both query engines (e.g. by always
providing well-sized files) and underlying storage (e.g. HDFS NameNode
memory consumption).
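
To make the upsert primitive concrete, here is a minimal, hypothetical
sketch (in Scala, against the Spark datasource API described in Hudi's
public documentation) of a job upserting change records into a Hudi
dataset. The format name and the "hoodie.*" option keys are taken from
those docs and may differ across Hudi releases; the table name, paths,
and records below are invented for illustration.

import org.apache.spark.sql.{SaveMode, SparkSession}

object HudiUpsertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hudi-upsert-sketch").getOrCreate()
    import spark.implicits._

    // Toy change records: (record key, partition date, fare, event timestamp).
    val updates = Seq(
      ("trip-001", "2019/01/13", 42.0, 1547337600L),
      ("trip-002", "2019/01/13", 17.5, 1547341200L)
    ).toDF("uuid", "ds", "fare", "ts")

    updates.write
      .format("org.apache.hudi")                                   // Hudi Spark datasource
      .option("hoodie.table.name", "trips")
      .option("hoodie.datasource.write.recordkey.field", "uuid")   // key matched against existing records
      .option("hoodie.datasource.write.partitionpath.field", "ds") // on-storage partitioning
      .option("hoodie.datasource.write.precombine.field", "ts")    // latest event time wins on key collision
      .mode(SaveMode.Append) // with Hudi, Append upserts by key rather than blindly adding rows
      .save("hdfs:///tmp/hudi/trips")
  }
}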

Hudi is largely implemented as an Apache Spark library that reads/writes
data from/to a Hadoop-compatible filesystem. SQL queries on Hudi datasets
are supported via specialized Apache Hadoop input formats that understand
Hudi’s storage layout. Currently, Hudi manages datasets using a combination
of the Apache Parquet & Apache Avro file/serialization formats.
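
A companion sketch of incremental consumption, continuing the hypothetical
example above: a downstream job asks only for records committed after a
known instant instead of rescanning the whole dataset. The query-type and
begin-instant option keys again follow Hudi's documented datasource
options and have varied across releases (older versions exposed this as a
"view type" option).

// Read only records committed after the given instant (exclusive),
// e.g. the last instant this downstream job has already processed.
val changes = spark.read
  .format("org.apache.hudi")
  .option("hoodie.datasource.query.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", "20190113000000")
  .load("hdfs:///tmp/hudi/trips")

changes.createOrReplaceTempView("trip_changes")
spark.sql("SELECT uuid, fare, ts FROM trip_changes").show()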

== Background ==

The Apache Hadoop distributed filesystem (HDFS) & other compatible cloud
storage systems (e.g. Amazon S3, Google Cloud, Microsoft Azure) serve as
longer-term analytical storage for thousands of organizations. Typical
analytical datasets are built by reading data from a source (e.g. upstream
databases, messaging buses, or other datasets), transforming the data,
writing results back to storage, & making it available for analytical
queries--all of this typically accomplished in batch jobs which operate in
a bulk fashion on partitions of datasets. Such a style of processing
typically incurs large delays in making data available to queries, as well
as a lot of complexity in carefully partitioning datasets to guarantee
latency SLAs.

The need for fresher/faster analytics has increased enormously in the past
few years, as evidenced by the popularity of stream processing systems like
Apache Spark and Apache Flink, and messaging systems like Apache Kafka. By
using an updateable state store to incrementally compute & instantly reflect
new results to queries, and a “tailable” messaging bus to publish these
results to other downstream jobs, such systems employ a different approach
to building analytical datasets. Even though this approach yields low
latency, the amount of data managed in such real-time data marts is
typically limited in comparison to the aforementioned longer-term storage
options. As a result, the overall data architecture has become more complex,
with more moving parts and specialized systems, leading to duplication of
data and a strain on usability.

Hudi takes a hybrid approach. Instead of moving vast amounts of batch data
to streaming systems, we simply add the streaming primitives (upserts &
incremental consumption) onto existing batch processing technologies. We
believe that by adding some missing blocks to an existing Hadoop stack, we
are able to provide similar capabilities right on top of Hadoop, at reduced
cost and with increased efficiency, greatly simplifying the overall
architecture in the process.

Hudi was originally developed at Uber (under the original name “Hoodie”) to
address broad inefficiencies in ingest, ETL & ML pipelines across Uber’s data
ecosystem that required the upsert & incremental consumption primitives
supported by Hudi.

== Rationale ==

We truly believe the capabilities supported by Hudi would be increasingly
useful for big-data ecosystems, as data volumes & the need for faster data
continue to increase. A detailed description of target use cases can be
found at https://uber.github.io/hudi/use_cases.html.

Given our reliance on so many great Apache projects, we believe that the
Apache way of open-source, community-driven development will enable us to
evolve Hudi in collaboration with a diverse set of contributors who can
bring new ideas into the project.


[IPMC] What to do with retired podling repositories

2019-01-13 Thread Daniel Gruno

Hello IPMC and other folks,

We have a big bunch of retired podlings with git repositories on 
git-wip-us. As we are working on retiring this service, we need to 
address what happens with these old project repositories.

The retired podlings we need to address are:
blur, cmda, concerted, corinthia, cotton, gearpump, gossip, hdt, horn, 
htrace, iota, mrql, openaz, pirk, provisionr, quickstep, ripple, s4, 
slider, wave

Before February 7th, we at ASF Infra would love it if the Incubator could
decide what happens to these repositories, either individually or as a 
whole.

Some suggested options are:

1) Delete the repositories
2) Rename them to incubator-retired-$foo.git
3) Do nothing, but put a note on GitHub etc. that they are retired.
4) Punt them to the Attic if possible (you'd then have to coordinate with the
Attic PMC)
5) Something else??

Please talk among yourselves and let Infra know :)

With regards,
Daniel on behalf of ASF Infra.
