Re: possible use of Pig for OLAP

Chris Olston Tue, 20 Nov 2007 09:29:43 -0800

Sounds interesting. Pig is geared toward large-scale aggregation operations, in the style of OLAP.


Regarding your 3rd paragraph question, do you mean:

a) there are several interrelated aggregation expressions that you want evaluated in just one pass over the data, or
b) you do some initial aggregation, display it to the user, who can do "drill-down" operations in the GUI which require you to look up more data in the backend

For (a), yes Pig can do that, although currently you have to encode it explicitly as a single Pig program (in future versions, we might be able to take multiple related Pig programs and execute them in a joint fashion). For (b), we don't currently have a mechanism to do that without reloading the data, although perhaps the operating system's file cache would help with that, under the covers, if the file partitions fit in memory and don't get evicted.
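
To make case (a) concrete, here is a rough sketch of what "several aggregates in one pass" means. It is plain Python, not Pig Latin, and says nothing about how Pig executes a program. The `region`/`amount` fields and the function name `aggregate_one_pass` are hypothetical, chosen only for illustration:

```python
from collections import defaultdict


def aggregate_one_pass(rows):
    """Compute count, total, and max of amount per region in a single scan."""
    stats = defaultdict(
        lambda: {"count": 0, "total": 0.0, "max": float("-inf")}
    )
    for region, amount in rows:
        s = stats[region]
        # All three aggregates are updated from the same row,
        # so the input is read only once.
        s["count"] += 1
        s["total"] += amount
        s["max"] = max(s["max"], amount)
    return dict(stats)


# Hypothetical sample data: (region, amount) pairs.
rows = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
result = aggregate_one_pass(rows)
```

The point is only that related aggregates share one scan of the input when they are written together, which is the effect of encoding them as a single program.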


-Chris


On Nov 20, 2007, at 1:47 AM, Alexandru Toth wrote:

Hi,

I am developing an Open Source OLAP application called "Cubulus". The
code is at http://sourceforge.net/projects/cubulus/ , a brief
presentation material at http://cubulus.sourceforge.net/ , and an
online demo at: http://alxtoth.webfactional.com

It would be interesting to use Pig instead of relational databases as a backend.


The question is: can Pig scripts work in such a manner that the file is
loaded only once, and then subsequent web requests process the same file
over and over? This becomes relevant if the data file is large,
and there is one data file to process (or a few data files). In fact, is
repeated loading a problem at all :-) ?

-Alex


--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research
