No change in philosophy. I just think "platform for analyzing data"
is too generic. I've talked to a lot of people at a lot of
institutions, and people "get" what a dataflow program is.
-Chris
On Mar 10, 2008, at 3:27 PM, pi song wrote:
I saw a change on the Pig Wiki front page:
- [http://incubator.apache.org/pig/ Pig] is a platform for analyzing large
data sets. Pig's language, Pig Latin, is a simple query algebra that lets
you express data transformations such as merging data sets, filtering them,
and applying functions to records or groups of records. Users can create
their own functions to do special-purpose processing.
+ [http://incubator.apache.org/pig/ Pig] is a dataflow programming
environment for processing very large files. Pig's language is called Pig
Latin. A Pig Latin program consists of a directed acyclic graph where each
node represents an operation that transforms data. Operations are of two
flavors: (1) relational-algebra style operations such as join, filter,
project; (2) functional-programming style operators such as map, reduce.
Is there any change in philosophy? What is the difference between "a
platform for analyzing large data sets" and "dataflow programming
environment"? Does the term "data flow programming environment" imply that
Pig can run across multiple file systems at the same time?
Cheers,
Pi
--
Christopher Olston, Ph.D.
Sr. Research Scientist
Yahoo! Research