You could run a MapReduce job first to join these data sets into a single data set, and then analyze the joined data set.

On Mon, Dec 30, 2013 at 3:58 PM, Fengyun RAO <raofeng...@gmail.com> wrote:
> Hi,
>
> HDFS splits files into blocks, and MapReduce runs a map task for each
> block. However, the fields can change within an IIS log file, which means
> the fields in one block may depend on another block, making the files
> unsuitable for a MapReduce job as they are. It seems there should be some
> preprocessing before storing and analyzing the IIS log files. We plan to
> parse each line into the same fields and store the result in compressed
> Avro files. Are there any other alternatives? HBase? Or any suggestions
> on analyzing IIS log files?
>
> Thanks!
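
To make the preprocessing idea above a bit more concrete, here is a minimal Python sketch of parsing every line into the same fields. It assumes the logs use the W3C extended format, where a "#Fields:" directive names the columns for the lines that follow and "-" marks an empty value. The function name normalize_iis_log and the TARGET_FIELDS schema are made up for illustration; pick whatever fields your analysis needs.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical target schema: every output record carries exactly these fields.
TARGET_FIELDS = ["date", "time", "cs-method", "cs-uri-stem", "sc-status"]


def normalize_iis_log(
    lines: Iterable[str],
    target_fields: List[str] = TARGET_FIELDS,
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per log entry, always with the same keys.

    The most recent "#Fields:" directive decides how a data line is split,
    so the input must be read sequentially from the start of the file.
    """
    current_fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only the #Fields directive changes the column layout;
            # other directives (#Software, #Version, #Date) are skipped.
            if line.startswith("#Fields:"):
                current_fields = line[len("#Fields:"):].split()
            continue
        if not current_fields:
            continue  # no layout known yet, so the line cannot be interpreted
        values = line.split()
        if len(values) != len(current_fields):
            continue  # malformed line: skipped in this sketch
        row = dict(zip(current_fields, values))
        # Fields that are absent from the current layout, or logged as "-",
        # become None so every record has the same shape.
        yield {
            name: (None if row.get(name, "-") == "-" else row[name])
            for name in target_fields
        }


sample = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2013-12-30 07:58:00 GET /index.html 200",
    "#Fields: date time cs-uri-stem sc-status time-taken",
    "2013-12-30 07:59:00 /about.html 404 15",
]
records = list(normalize_iis_log(sample))
```

Because the parser carries the current "#Fields:" layout from line to line, it has to see each file from its beginning, for example before the upload to HDFS or in a job that hands a whole file to a single map task. Once the records are written out with a fixed schema (Avro, as you plan), the blocks no longer depend on each other.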