Re: Newbie questions regarding log processing

2016-02-22 Thread Philippe de Rochambeau
Thank you both, Jorge and Mich. You've answered my questions in near real time! I will look into Flume and HDFS. > On 22 Feb 2016, at 22:41, Jorge Machado wrote: > > To get that, you could use Flume to ship the logs from the servers to > HDFS for

Re: Newbie questions regarding log processing

2016-02-22 Thread Teng Qiu
Wow, great post, very detailed. The question is what kind of "web logs" they have. If those logs are application logs, such as Apache httpd logs or Oracle logs, then sure, this is a typical use case for Spark or, more generally, for the Hadoop tech stack. But if Philippe is talking about network

Re: Newbie questions regarding log processing

2016-02-22 Thread Jorge Machado
To get that, you could use Flume to ship the logs from the servers to HDFS, for example, and do streaming on it. Check this: http://spark.apache.org/docs/latest/streaming-flume-integration.html and

Re: Newbie questions regarding log processing

2016-02-22 Thread Mich Talebzadeh
Hi, There are a number of options here. Your first port of call would be to store the logs that come in from the source in an HDFS directory as time series entries. I assume the logs will be in textual format and will be compressed (gzip, bzip2, etc.). They can be stored individually and you

Newbie questions regarding log processing

2016-02-22 Thread Philippe de Rochambeau
Hello, I have a few newbie questions regarding Spark. Is Spark a good tool to process Web logs for attacks (or is it better to use a more specialized tool)? If so, are there any plugins for this purpose? Can you use Spark to sift through huge logs and extract only suspicious activities; e.g., 1000
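
To make the "extract only suspicious activities" idea concrete, here is a minimal sketch in plain Python (not Spark) of one possible filter: count requests per client and keep the clients above a threshold. Everything in it is an assumption for illustration only: the function name `suspicious_clients`, the sample lines, the threshold value, and the idea that the client address is the first whitespace-separated field (as in the Apache common log format). None of it comes from the thread.

```python
from collections import Counter


def suspicious_clients(lines, threshold):
    """Return {client: request_count} for clients with more than `threshold` requests.

    Hypothetical helper: assumes the client address is the first
    whitespace-separated field of each log line.
    """
    # Count one request per non-blank line, keyed by the first field.
    counts = Counter(line.split()[0] for line in lines if line.strip())
    # Keep only the clients whose count exceeds the threshold.
    return {client: n for client, n in counts.items() if n > threshold}


# Made-up sample lines, purely for illustration.
sample = [
    '10.0.0.1 - - [22/Feb/2016:22:41:00 +0100] "GET /login HTTP/1.1" 401 128',
    '10.0.0.1 - - [22/Feb/2016:22:41:01 +0100] "GET /login HTTP/1.1" 401 128',
    '10.0.0.2 - - [22/Feb/2016:22:41:02 +0100] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [22/Feb/2016:22:41:03 +0100] "GET /login HTTP/1.1" 401 128',
]

flagged = suspicious_clients(sample, threshold=2)
print(flagged)
```

On a cluster, the same count-then-filter shape would presumably map onto Spark's own operations (for example a `map` to key each line by client, `reduceByKey` to count, then `filter`), but that translation is not shown here.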