Re: Advanced log processing

2014-05-20 Thread Laurent T
Thanks for the advice. I think you're right. I'm not sure we're going to use HBase, but partitioning the data into multiple buckets will be a first step. I'll see how it performs on large datasets. My original question, though, was more like: is there a Spark trick I don't know about? Curren

Re: Advanced log processing

2014-05-19 Thread Mayur Rustagi
It seems you are not reducing the data in size. If you are not, then you are better off partitioning the data into buckets (folders?) and keeping the data sorted within those buckets. A cleaner approach is to use HBase to keep track of keys: keep adding keys as you find them and let HBase handle it. Mayur
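
To make the bucketing suggestion concrete, here is a rough sketch in plain Python (not Spark). It assumes each record is a (key, timestamp, payload) tuple and that a fixed number of buckets is acceptable; the record layout and the names bucket_for and partition_and_sort are hypothetical and only serve as illustration.

```python
import zlib
from collections import defaultdict


def bucket_for(key, num_buckets):
    """Map a key to a bucket index using a hash that is stable across runs."""
    # crc32 is used instead of the built-in hash(), which is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def partition_and_sort(records, num_buckets):
    """Group (key, timestamp, payload) records into buckets, sorted inside each bucket."""
    buckets = defaultdict(list)
    for key, timestamp, payload in records:
        buckets[bucket_for(key, num_buckets)].append((key, timestamp, payload))
    # Sort by key, then timestamp, so records sharing a key sit next to each other.
    for rows in buckets.values():
        rows.sort(key=lambda row: (row[0], row[1]))
    return dict(buckets)


if __name__ == "__main__":
    sample = [("b", 2, "x"), ("a", 5, "y"), ("a", 1, "z"), ("b", 1, "w")]
    for index, rows in sorted(partition_and_sort(sample, 4).items()):
        print(index, rows)
```

Because the bucket index depends only on the key, every record for a given key ends up in the same bucket, so each bucket (or folder) can later be processed on its own.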

Re: Advanced log processing

2014-05-19 Thread Laurent T
(Resending this as a lot of mails seem not to be delivered.) Hi, I have some complex behavior I'd like to be advised on, as I'm really new to Spark. I'm reading some log files that contain various events. There are two types of events: parents and children. A child event can only have one pare