Re: any suggestions on IIS log storage and analysis?

2014-01-02 Thread Fengyun RAO
-- Date: Tue, 31 Dec 2013 09:39:58 +0800 Subject: Re: any suggestions on IIS log storage and analysis? From: raofeng...@gmail.com To: user@hadoop.apache.org Thanks, Yong! The dependency never crosses files, but since HDFS splits files into blocks, it may cross blocks, which makes

Re: any suggestions on IIS log storage and analysis?

2013-12-31 Thread Peyman Mohajerian
Yong -- Date: Tue, 31 Dec 2013 09:39:58 +0800 Subject: Re: any suggestions on IIS log storage and analysis? From: raofeng...@gmail.com To: user@hadoop.apache.org Thanks, Yong! The dependency never crosses files, but since HDFS splits files

any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO
Hi, HDFS splits files into blocks, and MapReduce runs a map task for each block. However, fields could be changed in IIS log files, which means fields in one block may depend on another, and thus makes the files unsuitable for a MapReduce job. It seems there should be some preprocessing before storing and

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Azuryy Yu
You can run a MapReduce job first to join these data sets into one data set, then analyze the joined data set. On Mon, Dec 30, 2013 at 3:58 PM, Fengyun RAO raofeng...@gmail.com wrote: Hi, HDFS splits files into blocks, and MapReduce runs a map task for each block. However, fields could be

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO
What do you mean by joining the data sets? A fake sample log file:
#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2013-07-04 20:00:00
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
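To make the dependency concrete, here is a minimal sketch (not anything posted in the thread) of how a reader has to track the #Fields directive while walking an IIS log. It assumes space-separated values, as in the sample above. The function name parse_w3c_log and the shortened field lists in the sample data are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log entry, keyed by the most recent #Fields line.

    parse_w3c_log is a hypothetical helper, not part of Hadoop or IIS.
    """
    fields: List[str] = []  # no header seen yet
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Only #Fields changes how later entries are interpreted;
            # other directives (#Software, #Version, #Date) are ignored here.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            # Covers entries seen before any #Fields line, e.g. at the
            # start of a block that begins in the middle of a file.
            continue
        yield dict(zip(fields, values))


# Hypothetical sample with a reduced field list that changes mid-file.
sample = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Version: 1.0",
    "#Date: 2013-07-04 20:00:00",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2013-07-04 20:00:00 GET /index.html 200",
    "#Fields: date time cs-method sc-status",
    "2013-07-04 20:00:01 POST 500",
]

records = list(parse_w3c_log(sample))
```

Feeding the function only the entry lines, without the preceding #Fields line, yields nothing, which is the situation a map task is in when its block starts partway through a file. Running something like this once per whole file and writing out self-describing records would be one form of the preprocessing mentioned in the original question.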

RE: any suggestions on IIS log storage and analysis?

2013-12-30 Thread java8964
, 30 Dec 2013 15:58:57 +0800 Subject: any suggestions on IIS log storage and analysis? From: raofeng...@gmail.com To: user@hadoop.apache.org Hi, HDFS splits files into blocks, and MapReduce runs a map task for each block. However, fields could be changed in IIS log files, which means fields in one

Re: any suggestions on IIS log storage and analysis?

2013-12-30 Thread Fengyun RAO
several MR jobs together. Yong -- Date: Mon, 30 Dec 2013 15:58:57 +0800 Subject: any suggestions on IIS log storage and analysis? From: raofeng...@gmail.com To: user@hadoop.apache.org Hi, HDFS splits files into blocks, and MapReduce runs a map task for each

RE: any suggestions on IIS log storage and analysis?

2013-12-30 Thread java8964
Google "Hadoop WholeFileInputFormat", or look it up in the book Hadoop: The Definitive Guide. Yong Date: Tue, 31 Dec 2013 09:39:58 +0800 Subject: Re: any suggestions on IIS log storage and analysis? From: raofeng...@gmail.com To: user@hadoop.apache.org Thanks, Yong! The dependency never crosses files