[Pig Wiki] Update of "owl" by jaytang

Apache Wiki Thu, 01 Apr 2010 14:59:29 -0700

The "owl" page has been changed by jaytang.
http://wiki.apache.org/pig/owl?action=diff&rev1=12&rev2=13

--------------------------------------------------

  The components of the core M/R programming interface as we know it (the
mapper, reducer, output collector, record reader and input format) all deal
with collections of abstract data objects, not files. However, the current set
of !InputFormat implementations provided by the job API is relatively primitive
and heavily coupled to file formats and HDFS paths to describe input and output
locations. From an application programmer’s perspective, one has to think about
both the abstract data and its physical representation and storage location,
which is a disconnect from the abstract data API. In the meantime, the number
of file formats and (de)serialization libraries in the Hadoop community has
grown. Some of these require certain metadata to operate or optimize. While
providing optimization and performance enhancements, these file formats and
SerDe libraries don’t make it any easier to develop applications on, or to
manage, very big data sets.
  
  
- == High Level Diagram == 
+ == High Level Diagram ==
  
  As one can see, Owl gives Hadoop users a uniform interface for organizing,
discovering and managing data stored in many different formats, and promotes
interoperability among different programming frameworks. Owl presents a single
logical view of data organization and hides the complexity and evolution of
underlying physical data layout schemes. It gives Hadoop applications a stable
foundation to build upon.
  
@@ -34, +34 @@

  || Owl has support for converting data between write-friendly and 
read-friendly formats || future ||
  || Owl has support for addressing HDFS NameNode limitations by decreasing the 
number of files needed to store very large data sets || future ||
  || Owl provides a security model for secure data access || future ||
- 
  
  == Prerequisite ==
  
@@ -102, +101 @@

      * deploy owl war file to Tomcat
      * set up -Dorg.apache.hadoop.owl.xmlconfig=<full path to 
owlServerConfig.xml> for the Tomcat deployment
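
  A minimal sketch of the last step, assuming a Unix-like Tomcat install whose startup script reads `bin/setenv.sh`. The file location and the config path below are placeholders, not values given on this page; only the property name comes from the step above.

```shell
# Sketch of $CATALINA_BASE/bin/setenv.sh (assumed location; adjust to your install).

# Hypothetical full path to the Owl server config; replace with the real one.
OWL_SERVER_CONFIG="/path/to/owlServerConfig.xml"

# Append the system property from the step above to Tomcat's JVM options,
# keeping any options that were already set.
CATALINA_OPTS="${CATALINA_OPTS:-} -Dorg.apache.hadoop.owl.xmlconfig=${OWL_SERVER_CONFIG}"
export CATALINA_OPTS
```

  Appending to `CATALINA_OPTS` rather than overwriting it keeps other JVM settings intact. Tomcat needs a restart to pick up the change.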
  
- == Developing on Owl == 
+ == Developing on Owl ==
  
  Owl has two major public APIs.  ''Owl Driver'' provides management APIs
against three core Owl abstractions: "Owl Table", "Owl Database", and
"Partition".  This API is backed by an internal Owl metadata store that runs
on Tomcat and a relational database.  ''!OwlInputFormat'' provides a data
access API and is modeled after the traditional Hadoop !InputFormat.  In the
future, we plan to support ''!OwlOutputFormat'' and thus the notion of an "Owl
Managed Table", where Owl controls the data flow into and out of "Owl Tables".
Owl also supports Pig integration through the OwlPigLoader/Storer module.
