You can first run a MapReduce job to join these data sets into a single data set,
and then analyze the joined data set.
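The reply does not say what the data sets are or which key joins them. The following is only a minimal sketch of the reduce-side join pattern such a first job commonly uses, followed by a trivial second "analysis" job. It simulates map, shuffle and reduce in plain Python rather than using the Hadoop API, and the field names (`c-ip`, `region`) and the second lookup data set are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(tag, records, key_field):
    """Map step: emit (join_key, (tag, record)) so that matching records
    from both inputs end up in the same reduce group."""
    for rec in records:
        yield rec[key_field], (tag, rec)


def shuffle(pairs):
    """Stand-in for Hadoop's shuffle/sort: group mapped values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: inner join, emitting one merged record per
    (left, right) pair that shares a key."""
    for key in sorted(groups):
        left = [rec for tag, rec in groups[key] if tag == "left"]
        right = [rec for tag, rec in groups[key] if tag == "right"]
        for l_rec, r_rec in product(left, right):
            yield {**l_rec, **r_rec}


def reduce_side_join(left, right, key_field):
    """First job: join two data sets into one."""
    pairs = list(map_phase("left", left, key_field))
    pairs += map_phase("right", right, key_field)
    return list(reduce_phase(shuffle(pairs)))


def count_by(records, field):
    """Second job: a simple analysis over the joined data set."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec[field]] += 1
    return dict(counts)


# Hypothetical inputs: parsed log hits and a lookup table keyed by client IP.
hits = [
    {"c-ip": "10.0.0.1", "cs-uri-stem": "/index.html"},
    {"c-ip": "10.0.0.2", "cs-uri-stem": "/about.html"},
    {"c-ip": "10.0.0.1", "cs-uri-stem": "/contact.html"},
]
regions = [{"c-ip": "10.0.0.1", "region": "east"}]

joined = reduce_side_join(hits, regions, "c-ip")
print(count_by(joined, "region"))  # {'east': 2}
```

In a real cluster the map and reduce functions would run as separate tasks (for example through Hadoop Streaming) and the framework itself would perform the shuffle; the second job would simply read the first job's output directory.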


On Mon, Dec 30, 2013 at 3:58 PM, Fengyun RAO <raofeng...@gmail.com> wrote:

> Hi,
>
> HDFS splits files into blocks, and MapReduce runs a map task for each
> block. However, the fields in IIS log files can change partway through a
> file, which means the field layout of the lines in one block may depend
> on another block, making the raw files unsuitable for a MapReduce job.
> It seems some preprocessing is needed before storing and analyzing the
> IIS log files. We plan to parse each line into the same set of fields and
> store the results in compressed Avro files. Are there other alternatives,
> such as HBase? Any suggestions on analyzing IIS log files?
>
> thanks!
>
>
>

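The preprocessing proposed in the quoted message (parse every line into the same set of fields before storing) could look roughly like the sketch below. It assumes the logs use the W3C extended format, where a `#Fields:` directive defines the column layout for the data lines that follow it. `TARGET_FIELDS` and the sample lines are made up for illustration.

```python
# Hypothetical target schema; pick whichever W3C fields your analysis needs.
TARGET_FIELDS = ["date", "time", "c-ip", "cs-method", "cs-uri-stem", "sc-status"]


def normalize_iis_log(lines, target_fields=TARGET_FIELDS):
    """Yield one tuple per data line, always in target_fields order.

    The file must be read sequentially so that the most recent #Fields
    directive is known, which is why this runs as a preprocessing pass
    instead of inside a block-local map task.
    """
    current_fields = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Only #Fields changes the layout; #Software, #Date, etc. are skipped.
            if line.startswith("#Fields:"):
                current_fields = line[len("#Fields:"):].split()
            continue
        if current_fields is None:
            raise ValueError("data line appears before any #Fields directive")
        values = line.split()
        if len(values) != len(current_fields):
            raise ValueError(
                f"expected {len(current_fields)} values, got {len(values)}: {line!r}"
            )
        row = dict(zip(current_fields, values))
        # Columns absent from the current layout, and IIS's "-" placeholder,
        # both become None so every output row has the same shape.
        yield tuple(None if row.get(f, "-") == "-" else row[f] for f in target_fields)


# Made-up sample: the layout changes halfway through the file.
sample = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Fields: date time c-ip cs-method cs-uri-stem",
    "2013-12-30 07:58:01 10.0.0.1 GET /index.html",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "2013-12-30 08:10:12 10.0.0.2 GET /about.html 200",
]
for row in normalize_iis_log(sample):
    print(row)
# ('2013-12-30', '07:58:01', '10.0.0.1', 'GET', '/index.html', None)
# ('2013-12-30', '08:10:12', '10.0.0.2', 'GET', '/about.html', '200')
```

Because the normalization needs each file in order, it has to run per file rather than per HDFS block (for example by making the input non-splittable through `FileInputFormat.isSplitable`). The fixed-shape rows could then be written to compressed Avro files as proposed; that serialization step is not shown.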