The incremental data is more than 80 TB per day, and the overall data processed per day is around 30 PB.

On Thu, Jul 21, 2016 at 4:13 PM, Glen Cao <microw...@gmail.com> wrote:
> iPinYou (www.ipinyou.com.cn/?defaultLocale=en) is the largest DSP in China,
> with its HQ in Beijing and offices in Shanghai, Guangzhou, Silicon Valley
> and Seattle.
>
> Kafka clusters are the central data hub at iPinYou. All kinds of Internet
> display advertising data, such as bid/no-bid, impression, click, advertiser
> and conversion data, are collected as primary data streams into Kafka
> brokers in real time by LogAggregator (a substitute for Apache Flume,
> implemented in C/C++ by iPinYou, with customized functionality, better
> performance and lower resource consumption). A large amount of the data in
> the Kafka brokers is loaded into HDFS in near real time by Kafka2HDFS (a
> distributed pipeline system implemented in C/C++ by iPinYou that offers
> flexible tuning between latency and throughput). A large amount is also
> consumed by streaming applications on Apache Storm and Apache Spark
> Streaming.
>
> LogAggregator:
> https://github.com/microwish/log_aggregator
>
> Kafka2HDFS:
> https://github.com/microwish/kafka2hdfs
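
For anyone wondering what "flexible tuning between latency and throughput" could look like in a Kafka-to-HDFS loader, here is a minimal sketch in Python. It is not taken from Kafka2HDFS (which is written in C/C++); the class `BatchingSink` and its parameters `max_bytes` and `max_age_seconds` are hypothetical names chosen for illustration. The idea is simply to buffer records and flush when the buffer is either large enough or old enough.

```python
import time
from typing import Callable, List, Optional


class BatchingSink:
    """Hypothetical sketch: buffer records, flush on a size or age threshold.

    A larger max_bytes favours throughput (fewer, bigger writes);
    a smaller max_age_seconds favours latency (data lands sooner).
    """

    def __init__(self,
                 flush: Callable[[List[bytes]], None],
                 max_bytes: int,
                 max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush              # e.g. a function that writes one file
        self._max_bytes = max_bytes
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so it can be tested
        self._buffer: List[bytes] = []
        self._buffered_bytes = 0
        self._oldest: Optional[float] = None

    def append(self, record: bytes) -> None:
        """Add one record and flush if a threshold has been reached."""
        if not self._buffer:
            self._oldest = self._clock()  # age is measured from the first record
        self._buffer.append(record)
        self._buffered_bytes += len(record)
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Flush if the buffer is too big or too old; return True if flushed."""
        if not self._buffer:
            return False
        too_big = self._buffered_bytes >= self._max_bytes
        too_old = self._clock() - self._oldest >= self._max_age
        if not (too_big or too_old):
            return False
        self._flush(self._buffer)
        self._buffer = []
        self._buffered_bytes = 0
        self._oldest = None
        return True
```

In a real loader, `append` would be called from the consumer loop and `maybe_flush` from a periodic timer so that a quiet topic still gets flushed. Raising `max_bytes` produces fewer and larger HDFS files, while lowering `max_age_seconds` bounds how long a record waits before it is written.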