--
Date: Tue, 31 Dec 2013 09:39:58 +0800
Subject: Re: any suggestions on IIS log storage and analysis?
From: raofeng...@gmail.com
To: user@hadoop.apache.org
Thanks, Yong!
The dependence never crosses files, but since HDFS splits files into
blocks, it may cross blocks, which makes
Yong
--
Hi,
HDFS splits files into blocks, and MapReduce runs a map task for each
block. However, the set of fields can change within an IIS log file, which
means fields in one block may depend on another, and that makes the logs
unsuitable for a MapReduce job. It seems there should be some preprocessing
before storing and
You can run a MapReduce job first to join these data sets into one data set,
then analyze the joined data set.
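"Join" is ambiguous here, as the next message asks. A minimal sketch of one possible reading, assuming the goal is to make every record self-describing before storing it in HDFS: a single sequential pass tracks the current `#Fields` directive and rewrites each data line as a JSON object that names its own fields. After that, later MapReduce jobs can split the output anywhere. The function name `to_json_lines` and the sample values are hypothetical, and JSON Lines is just one convenient choice; a fixed superset of columns with `-` for missing values would also work.

```python
import json


def to_json_lines(lines):
    """Rewrite IIS W3C log lines as self-describing JSON objects.

    Runs once, in file order, so it can track the current '#Fields:'
    directive. Each output line names its own fields, so the result can
    later be split anywhere (e.g. at HDFS block boundaries).
    """
    fields = None
    out = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # A new directive replaces the previous column layout.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            if fields is None:
                raise ValueError("data line before any #Fields directive")
            values = line.split()
            if len(values) != len(fields):
                raise ValueError(f"expected {len(fields)} values, got {len(values)}")
            out.append(json.dumps(dict(zip(fields, values)), sort_keys=True))
    return out


# Hypothetical input: the second row has an extra sc-status column.
log = [
    "#Fields: date time cs-method cs-uri-stem",
    "2013-07-04 20:00:01 GET /index.html",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2013-07-04 20:05:00 GET /about.html 200",
]
for record in to_json_lines(log):
    print(record)
```

The price of this design is a larger stored data set, since field names are repeated on every line; in exchange, no downstream job needs to know where a `#Fields` header was.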
On Mon, Dec 30, 2013 at 3:58 PM, Fengyun RAO raofeng...@gmail.com wrote:
What do you mean by joining the data sets?
A fake sample log file:
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-07-04 20:00:00
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
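For reference, the dependency in a file like the sample above comes from the `#Fields` directive: each data line can only be interpreted using the most recent `#Fields` line above it, so the file must be read in order. A minimal Python sketch of a parser that tracks the directive, assuming space-separated values as in the W3C extended format (the function name `parse_iis_w3c` and the sample values are hypothetical, and it does not handle every IIS edge case):

```python
def parse_iis_w3c(lines):
    """Parse IIS W3C extended-format log lines into dicts.

    Each data line is interpreted with the most recent '#Fields:'
    directive above it, so the lines must be read in file order.
    """
    fields = None
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                # A new directive replaces the previous column layout.
                fields = line[len("#Fields:"):].split()
            continue  # #Software, #Version, #Date carry no row data
        if fields is None:
            # What a mapper hits if its block begins after the header.
            raise ValueError("data line with no preceding #Fields directive")
        values = line.split()
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(values)}")
        records.append(dict(zip(fields, values)))
    return records


# Hypothetical log whose field set changes partway through the file.
log = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Fields: date time s-ip cs-method cs-uri-stem",
    "2013-07-04 20:00:01 10.0.0.1 GET /index.html",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2013-07-04 20:05:00 GET /about.html 200",
]
rows = parse_iis_w3c(log)
print(rows[1]["sc-status"])  # 200
```

Calling `parse_iis_w3c(log[4:])` raises `ValueError`, which is exactly the situation of a map task whose block starts after the last `#Fields` line.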
--
Date: Mon, 30 Dec 2013 15:58:57 +0800
Subject: any suggestions on IIS log storage and analysis?
From: raofeng...@gmail.com
To: user@hadoop.apache.org
several MR jobs together.
Yong
--
Google Hadoop WholeFileInputFormat, or search for it in the book Hadoop: The
Definitive Guide.
Yong
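The WholeFileInputFormat suggestion works because, as noted in the thread, the dependence never crosses files: if each file is a single split, the map task always sees the `#Fields` header before the data lines that depend on it. In Java this is usually done by overriding `isSplitable(JobContext, Path)` in a `FileInputFormat` subclass so that it returns `false`. The Python below only simulates the effect under a simplifying assumption (each line is assigned to the split in which it starts, roughly how a line-oriented record reader behaves); `line_splits`, `self_describing`, and the sample text are hypothetical.

```python
def line_splits(text, split_size):
    """Group a file's lines into splits of roughly split_size characters.

    Each line goes to the split in which it starts, approximating how a
    line-oriented reader assigns records to byte-range splits.
    """
    splits = {}
    offset = 0
    for line in text.splitlines(keepends=True):
        splits.setdefault(offset // split_size, []).append(line.rstrip("\n"))
        offset += len(line)
    return [splits[k] for k in sorted(splits)]


def self_describing(split_lines):
    """True if every data line in the split has a #Fields directive above
    it within the same split, i.e. a mapper could parse the split alone."""
    seen_fields = False
    for line in split_lines:
        if line.startswith("#Fields:"):
            seen_fields = True
        elif line and not line.startswith("#") and not seen_fields:
            return False
    return True


log_text = (
    "#Fields: date time cs-method cs-uri-stem\n"
    "2013-07-04 20:00:01 GET /index.html\n"
    "2013-07-04 20:00:02 GET /about.html\n"
    "2013-07-04 20:00:03 GET /contact.html\n"
)
# Splittable input: a tiny split size breaks the file into several pieces.
small = line_splits(log_text, 64)
# Whole-file input: one split covering the entire file.
whole = line_splits(log_text, len(log_text))
print([self_describing(s) for s in small])
print([self_describing(s) for s in whole])
```

The trade-off is parallelism: a log file spanning many HDFS blocks is processed by a single map task, so very large files lose intra-file parallelism and some data locality.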