[Pig Wiki] Update of "owl" by jaytang

Apache Wiki Fri, 26 Mar 2010 11:00:57 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.


The "owl" page has been changed by jaytang.
http://wiki.apache.org/pig/owl?action=diff&rev1=9&rev2=10

--------------------------------------------------

  
  The goal of Owl is to provide a high-level data management abstraction.  
!MapReduce and Pig applications interacting directly with HDFS directories and 
files must deal with low-level data management issues such as storage format, 
serialization/compression schemes, data layout, and efficient data access, 
often with different solutions. Owl aims to provide a standard way to 
address these issues and abstract away the complexities of reading/writing 
huge amounts of data from/to HDFS.
  
- Owl provides a tabular view of data on Hadoop and thus supports the notion of 
''Owl Tables'', a basic unit of data management.  An Owl Table has these 
characteristics:
+ Owl provides a tabular view of data on Hadoop and thus supports the notion of 
''Owl Tables''.  Conceptually, an Owl Table is similar to a relational database 
table.  An Owl Table has these characteristics:
  
     * lives in an Owl database namespace and can contain multiple partitions
     * has columns and rows and supports a unified table level schema
     * provides interfaces that support !MapReduce and Pig Latin and can easily 
work with other languages
-    * designed for efficient batch read/write operations
+    * designed for efficient batch read/write operations; partitions can be 
added to or removed from a table
     * supports external tables (data already exists on file system)
     * pluggable architecture for different storage formats such as Zebra
     * presents a logically partitioned view of data and supports very large 
data sets via its flexible multi-level partitioning scheme
-    * efficient data access mechanisms via partition and projection pruning
+    * efficient data access mechanisms over very large data sets via partition 
and projection pruning
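
To make the last bullet concrete, here is a minimal in-memory sketch of what 
partition pruning and projection pruning mean. It is not Owl code: Partition, 
scan, and the sample column names are hypothetical, and a real storage layer 
such as Zebra would avoid reading the pruned partitions and columns from disk 
rather than filtering them in memory.

```python
# Illustrative sketch only: none of these names come from Owl's API.
from dataclasses import dataclass


@dataclass
class Partition:
    keys: dict  # partition key values, e.g. {"date": "2010-03-26"}
    rows: list  # each row is a dict mapping column name -> value


def scan(partitions, key_filter, columns):
    """Yield the requested columns from partitions whose keys match key_filter."""
    for part in partitions:
        # Partition pruning: skip the whole partition if any key value differs.
        if any(part.keys.get(k) != v for k, v in key_filter.items()):
            continue
        for row in part.rows:
            # Projection pruning: keep only the columns the caller asked for.
            yield {c: row[c] for c in columns}


# Hypothetical table with one partition per day.
table = [
    Partition({"date": "2010-03-25"}, [{"user": "a", "clicks": 3, "agent": "x"}]),
    Partition({"date": "2010-03-26"}, [{"user": "b", "clicks": 5, "agent": "y"}]),
]

result = list(scan(table, {"date": "2010-03-26"}, ["user", "clicks"]))
```

Only the partition for 2010-03-26 is visited, and the "agent" column never 
reaches the caller.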
  
  
- Owl has two major public APIs.  ''Owl Driver'' provides management APIs 
against "Owl Table", "Owl Database", and "Partition".  This API is backed by 
an internal Owl metadata store that runs on Tomcat and a relational database.  
''!OwlInputFormat'' provides a data access API and is modeled after the 
traditional Hadoop !InputFormat.  In the future, we plan to support 
''!OwlOutputFormat'' and thus the notion of "Owl Managed Table" where Owl 
controls the data flow into and out of "Owl Tables".  Owl also supports Pig 
integration through the !OwlPigLoader/Storer module.
+ Owl has two major public APIs.  ''Owl Driver'' provides management APIs 
against three core Owl abstractions: "Owl Table", "Owl Database", and 
"Partition".  This API is backed by an internal Owl metadata store that runs 
on Tomcat and a relational database.  ''!OwlInputFormat'' provides a data 
access API and is modeled after the traditional Hadoop !InputFormat.  In the 
future, we plan to support ''!OwlOutputFormat'' and thus the notion of "Owl 
Managed Table" where Owl controls the data flow into and out of "Owl Tables".  
Owl also supports Pig integration through the OwlPigLoader/Storer module.
  
  Initially, we would like to open source Owl as a Pig contrib project.  In the 
long term, Owl could become a separate Hadoop subproject, as it provides a 
platform service to all Hadoop applications.
