No change in philosophy. I just think "platform for analyzing data" is too generic. I've talked to a lot of people at a lot of institutions, and people "get" what a dataflow program is.

-Chris


On Mar 10, 2008, at 3:27 PM, pi song wrote:

I saw a change on the Pig Wiki front page:

- [http://incubator.apache.org/pig/ Pig] is a platform for analyzing large data sets. Pig's language, Pig Latin, is a simple query algebra that lets you express data transformations such as merging data sets, filtering them, and applying functions to records or groups of records. Users can create their own functions to do special-purpose processing.

+ [http://incubator.apache.org/pig/ Pig] is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce.
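
As a rough illustration of the new wording, a dataflow program can be sketched as a small graph of operations, each consuming the output of the nodes upstream of it. The Python below is not Pig Latin and does not reflect Pig's implementation; the Node class, its run method, and the sample records are all hypothetical.

```python
# Illustrative sketch only: not Pig Latin and not how Pig executes programs.
# Node, run, and the sample records are hypothetical names and data.

class Node:
    """One operation in the dataflow graph; its inputs are upstream nodes."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self):
        # Evaluate upstream nodes first, then apply this node's operation.
        return self.op(*(n.run() for n in self.inputs))


# Source nodes: operations with no inputs that produce records.
visits = Node(lambda: [("alice", "a.com"), ("bob", "b.com"), ("alice", "c.com")])
ages = Node(lambda: [("alice", 31), ("bob", 17)])

# Relational-algebra style operations: filter, join, project.
adults = Node(lambda rows: [r for r in rows if r[1] >= 18], ages)
joined = Node(
    lambda v, a: [(user, url, age)
                  for (user, url) in v
                  for (name, age) in a
                  if user == name],
    visits, adults)
urls = Node(lambda rows: [url for (_, url, _) in rows], joined)

# Functional-programming style operation: map a function over each record.
upper = Node(lambda rows: list(map(str.upper, rows)), urls)

result = upper.run()
```

The graph is acyclic because each node only refers to nodes created before it, so evaluating the final node pulls data through every upstream operation.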

Is there any change in philosophy? What is the difference between "a
platform for analyzing large data sets" and "a dataflow programming
environment"? Does the term "dataflow programming environment" imply that
Pig can run across multiple file systems at the same time?

Cheers,
Pi

--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research

