Re: possible use of Pig for OLAP

Andrzej Bialecki Tue, 20 Nov 2007 10:31:36 -0800

Chris Olston wrote:

Sounds interesting. Pig is geared toward large-scale aggregation operations, in the style of OLAP.
Regarding your 3rd paragraph question, do you mean:
a) there are several interrelated aggregation expressions that you want evaluated in just one pass over the data, or
b) you do some initial aggregation, display it to the user, who can do "drill-down" operations in the GUI which require you to look up more data in the backend
?
For (a), yes Pig can do that, although currently you have to encode it explicitly as a single Pig program (in future versions, we might be able to take multiple related Pig programs and execute them in a joint fashion). For (b), we don't currently have a mechanism to do that without reloading the data, although perhaps the operating system's file cache would help with that, under the covers, if the file partitions fit in memory and don't get evicted.

Would it be possible to modify Pig (and the underlying local/mapreduce impl) so that if a specific syntax is used then an intermediate result is also stored into a temporary file? This way, on the first dump/store Pig would produce all intermediate results, then keep some of them, and re-use them for subsequent operators.

Example - let's say that ':=' means that the result should be kept around until exit (or until any of the previous intermediate results changes):


-- A is not persisted
A = load 'sample.txt' as (date, time, ip, query);
-- B is to be persisted in a temp file
B := group A by ip;
-- compile & execute - creates B in a temp file
dump B;
C = foreach B generate group, query;
-- this uses already existing B data from a temp file
dump C;
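
The intended behaviour of ':=' can be sketched outside Pig. The Python below is only an illustration of the idea, not Pig code and not how Pig works: `Session`, `persist`, `load_rows` and `group_by_ip` are hypothetical names, the sample rows are made up, and invalidation when an upstream result changes is left out.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class Session:
    """Hypothetical stand-in for a Pig session that keeps some results around."""

    def __init__(self):
        # Temp files live until close(), mirroring "kept around until exit".
        self._tmpdir = tempfile.TemporaryDirectory()
        self.computed = []  # names that were actually computed, for illustration

    def persist(self, name, compute):
        """Return relation `name`, computing it only if no temp file exists yet."""
        path = os.path.join(self._tmpdir.name, name + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        result = compute()
        self.computed.append(name)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result

    def close(self):
        self._tmpdir.cleanup()


def load_rows():
    # Stand-in for: A = load 'sample.txt' as (date, time, ip, query);
    return [
        ("2007-11-20", "10:00", "10.0.0.1", "pig"),
        ("2007-11-20", "10:01", "10.0.0.2", "olap"),
        ("2007-11-20", "10:02", "10.0.0.1", "hadoop"),
    ]


def group_by_ip():
    # Stand-in for: B := group A by ip;
    groups = defaultdict(list)
    for row in load_rows():
        groups[row[2]].append(row)
    return dict(groups)


session = Session()
b_first = session.persist("B", group_by_ip)  # computes B, writes the temp file
b_again = session.persist("B", group_by_ip)  # reads B back from the temp file
# Stand-in for: C = foreach B generate group, query;
c = {ip: [row[3] for row in rows] for ip, rows in b_again.items()}
session.close()
```

The second `persist` call corresponds to the second dump: B is read from the temp file rather than recomputed from A.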


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
