Re: How to split log data into different files according to severity

2015-06-14 Thread Hao Wang
Thanks for the link. I’m still running 1.3.1 but will give it a try :)

Hao

> On Jun 13, 2015, at 9:38 AM, Will Briggs  wrote:
> 
> Check out this recent post by Cheng Lian regarding dynamic partitioning in 
> Spark 1.4: https://www.mail-archive.com/user@spark.apache.org/msg30204.html 
> 
> 
> On June 13, 2015, at 5:41 AM, Hao Wang  wrote:
> 
> 
> Hi,
> 
> I have a bunch of large log files on Hadoop. Each line contains a log and its 
> severity. Is there a way that I can use Spark to split the entire data set 
> into different files on Hadoop according to the severity field? Thanks. Below is 
> an example of the input and output.
> 
> Input:
> [ERROR] log1
> [INFO] log2
> [ERROR] log3
> [INFO] log4
> 
> Output:
> error_file
> [ERROR] log1
> [ERROR] log3
> 
> info_file
> [INFO] log2
> [INFO] log4
> 
> 
> Best,
> Hao Wang
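
For the archives, the same one-pass split is already possible on 1.3.x with
saveAsHadoopFile and a custom MultipleTextOutputFormat that routes each record
to a file named after its key. A minimal sketch, assuming the severity is the
first bracketed token of each line (paths and parsing are illustrative):

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

// Name each output file after the record's key (the severity) and drop the
// key from the written line, so the file contains only the original log line.
class SeverityOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String] + "/" + name
}

val logs = sc.textFile("hdfs:///path/to/logs")  // illustrative path
logs.map { line =>
    // "[ERROR] log1" -> ("ERROR", "[ERROR] log1")
    (line.substring(1, line.indexOf(']')), line)
  }
  .saveAsHadoopFile("hdfs:///path/to/output", classOf[String], classOf[String],
    classOf[SeverityOutputFormat])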



Re: How to split log data into different files according to severity

2015-06-13 Thread Will Briggs
Check out this recent post by Cheng Lian regarding dynamic partitioning in 
Spark 1.4: https://www.mail-archive.com/user@spark.apache.org/msg30204.html
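
In 1.4 the DataFrame route looks roughly like this: parse the severity into
its own column, then let the writer partition the output by it. A sketch only;
column and path names are illustrative:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.textFile("hdfs:///path/to/logs")
  .map(line => (line.substring(1, line.indexOf(']')), line))  // ("ERROR", full line)
  .toDF("severity", "line")

df.write
  .partitionBy("severity")   // one output subdirectory per severity value
  .format("parquet")
  .save("hdfs:///path/to/output")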

On June 13, 2015, at 5:41 AM, Hao Wang  wrote:

Hi,

I have a bunch of large log files on Hadoop. Each line contains a log and its 
severity. Is there a way that I can use Spark to split the entire data set into 
different files on Hadoop according to the severity field? Thanks. Below is an 
example of the input and output.

Input:
[ERROR] log1
[INFO] log2
[ERROR] log3
[INFO] log4

Output:
error_file
[ERROR] log1
[ERROR] log3

info_file
[INFO] log2
[INFO] log4

Best,
Hao Wang



Re: How to split log data into different files according to severity

2015-06-13 Thread Hao Wang
I am currently using filter inside a loop over all severity levels to do this, 
which I think is pretty inefficient. It has to read the entire data set once 
for each severity. I wonder if there is a more efficient way that takes just 
one pass of the data? Thanks.
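
In the meantime, caching the dataset at least keeps the repeated filters from
re-reading HDFS on every pass; each filter then scans the in-memory copy. A
sketch, with illustrative paths and severity list:

val logs = sc.textFile("hdfs:///path/to/logs").cache()
for (s <- Seq("ERROR", "INFO")) {
  logs.filter(_.startsWith("[" + s + "]"))
      .saveAsTextFile("hdfs:///path/to/" + s.toLowerCase + "_file")
}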

Best,
Hao Wang

> On Jun 13, 2015, at 3:48 AM, Akhil Das  wrote:
> 
> Are you looking for something like filter? See a similar example here 
> https://spark.apache.org/examples.html 
> 
> 
> Thanks
> Best Regards
> 
> On Sat, Jun 13, 2015 at 3:11 PM, Hao Wang  wrote:
> Hi,
> 
> I have a bunch of large log files on Hadoop. Each line contains a log and its 
> severity. Is there a way that I can use Spark to split the entire data set 
> into different files on Hadoop according to the severity field? Thanks. Below is 
> an example of the input and output.
> 
> Input:
> [ERROR] log1
> [INFO] log2
> [ERROR] log3
> [INFO] log4
> 
> Output:
> error_file
> [ERROR] log1
> [ERROR] log3
> 
> info_file
> [INFO] log2
> [INFO] log4
> 
> 
> Best,
> Hao Wang
> 



Re: How to split log data into different files according to severity

2015-06-13 Thread Akhil Das
Are you looking for something like filter? See a similar example here
https://spark.apache.org/examples.html
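
Concretely, the filter route for a single severity looks like this (paths are
illustrative):

val logs = sc.textFile("hdfs:///path/to/logs")
logs.filter(_.startsWith("[ERROR]"))
    .saveAsTextFile("hdfs:///path/to/error_file")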

Thanks
Best Regards

On Sat, Jun 13, 2015 at 3:11 PM, Hao Wang  wrote:

> Hi,
>
> I have a bunch of large log files on Hadoop. Each line contains a log and
> its severity. Is there a way that I can use Spark to split the entire data
> set into different files on Hadoop according to the severity field? Thanks.
> Below is an example of the input and output.
>
> Input:
> [ERROR] log1
> [INFO] log2
> [ERROR] log3
> [INFO] log4
>
> Output:
> error_file
> [ERROR] log1
> [ERROR] log3
>
> info_file
> [INFO] log2
> [INFO] log4
>
>
> Best,
> Hao Wang
>