The incremental data is more than 80 TB per day, and the overall data
processed every day is around 30 PB.
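
For a rough sense of scale, here is a back-of-the-envelope sketch of the
average rates those daily volumes imply. It assumes T and P mean decimal
terabytes and petabytes and that load is spread evenly over 24 hours; real
traffic will peak well above the mean. The function name is only
illustrative.

```python
# Average data rates implied by the daily volumes, assuming decimal units
# (1 TB = 10**12 bytes, 1 PB = 10**15 bytes) and a uniform 24-hour load.
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_gb_per_s(bytes_per_day: float) -> float:
    """Return the mean rate in GB/s for a daily volume given in bytes."""
    return bytes_per_day / SECONDS_PER_DAY / 10**9


incremental = average_rate_gb_per_s(80 * 10**12)  # new data per day
processed = average_rate_gb_per_s(30 * 10**15)    # total data processed per day

print(f"incremental: {incremental:.2f} GB/s, processed: {processed:.1f} GB/s")
```

Under those assumptions the incremental data averages a little under 1 GB/s
and the processed data roughly 350 GB/s.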

On Thu, Jul 21, 2016 at 4:13 PM, Glen Cao <microw...@gmail.com> wrote:

> iPinYou (www.ipinyou.com.cn/?defaultLocale=en) is the largest DSP in China,
> with its HQ in Beijing and offices in Shanghai, Guangzhou, Silicon Valley
> and Seattle.
>
>
> Kafka clusters are the central data hub at iPinYou. All kinds of Internet
> display advertising data, such as bid/no-bid, impression, click, advertiser,
> conversion, etc., are collected into Kafka brokers as primary data streams
> in real time by LogAggregator (a substitute for Apache Flume, implemented
> in C/C++ by iPinYou, with customized functionality, better performance and
> lower resource consumption). A large share of the data in Kafka brokers is
> loaded into HDFS in near real time by Kafka2HDFS (a distributed pipeline
> system, implemented in C/C++ by iPinYou, that offers flexible tuning
> between latency and throughput). Much of the data in Kafka brokers is also
> consumed by streaming applications on Apache Storm and Apache Spark
> Streaming.
>
> LogAggregator:
> https://github.com/microwish/log_aggregator
>
> Kafka2HDFS:
> https://github.com/microwish/kafka2hdfs
>
