Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "owl" page has been changed by jaytang. http://wiki.apache.org/pig/owl?action=diff&rev1=3&rev2=4
--------------------------------------------------

= Apache Owl Wiki =

The goal of Owl is to provide a higher level data management abstraction than that provided by HDFS directories and files. Applications written in MapReduce and Pig scripts must deal with low level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access paths, often with a different solution for each. Owl attempts to provide a standard way to address these issues.

Owl supports the notion of "Owl Tables", a basic unit of data management. An Owl Table has these characteristics:

 * lives in an Owl database name space and can contain multiple partitions
 * has columns and rows and supports a unified table-level schema
 * supports MapReduce and Pig Latin, and potentially other languages
 * designed for batch read/write operations
 * supports external tables (data that already exists on the file system)
 * pluggable architecture for different storage formats such as Zebra
 * presents a logically partitioned view of data organization
 * efficient data access mechanisms via partition and projection pruning

Owl supports two major public APIs.
"Owl Driver" provides management APIs against "Owl Table", "Owl Database", and "Partition". This API is backed by an internal Owl metadata store that runs on Tomcat and a relational database. "OwlInputFormat" provides a data access API and is modeled after the traditional Hadoop InputFormat. In the future, we plan to support "OwlOutputFormat" and thus the notion of an "Owl Managed Table", where Owl controls the data flow into and out of "Owl Tables". Owl supports Pig integration with the OwlPigLoader/Storer module.

== Prerequisite ==

Owl depends on Pig for its tuple classes as a basic unit of data container, and on Hadoop 0.20 for "OwlInputFormat". Owl supports Zebra integration out of the box.

== Getting Owl ==

To build Owl, you need:

 * JDK 1.6
 * Ant 1.7.1
 * download [[http://dev.mysql.com/downloads/connector/j/5.1.html|MySQL 5.1 JDBC driver]]
 * Oracle

How to compile:

 * check out the latest Pig trunk
 * compile Pig
 * cd contrib/owl
 * copy the MySQL JDBC driver to the contrib/owl/java/lib directory
 * ant war (build the owl web application)

For a development environment, Owl supports Jetty 7.0 (with jetty-runner) and Derby 10.5. For production deployment, Owl supports:

 * Tomcat 6.0
 * MySQL 5.1 or Oracle 11g

After installing Tomcat and MySQL, you will need these files:

 * owl-<0.x.x>.war - owl web application
 * owl-<0.x.x>.jar - owl client library (OwlInputFormat and OwlDriver) with all their dependent 3rd party libs
 * mysql_schema.sql - owl database schema file at contrib/owl/setup/mysql
 * owlServerConfig.xml - owl server configuration file at contrib/owl/setup/mysql

== Sample Code ==

Owl comes with a Java-based client. Client API Javadoc is at...
These two key packages contain the public APIs for Owl's main features: "org.apache.hadoop.owl.client" and "org.apache.hadoop.owl.mapreduce". Sample code for writing a client application against Owl is attached:
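The attached sample itself is not reproduced on this page. As a rough, self-contained illustration of the "partition and projection pruning" idea listed above, the sketch below filters a table's partitions by a partition-key predicate using metadata alone, before any data file is opened. All class and method names here are hypothetical illustrations for this sketch; they are not Owl's actual API (the real client lives in org.apache.hadoop.owl.client).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/**
 * A minimal sketch of partition pruning over a logically partitioned table.
 * NOTE: these types are hypothetical illustrations, not Owl's real API.
 */
public class PartitionPruningSketch {

    /** One logical partition, identified by its partition-key values. */
    static final class Partition {
        final Map<String, String> keys; // e.g. {date=20091101}
        final String location;          // HDFS path holding this partition's data

        Partition(Map<String, String> keys, String location) {
            this.keys = keys;
            this.location = location;
        }
    }

    /** Keep only the partitions whose value for 'key' equals 'value'. */
    static List<Partition> prune(List<Partition> parts, String key, String value) {
        List<Partition> kept = new ArrayList<Partition>();
        for (Partition p : parts) {
            if (value.equals(p.keys.get(key))) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Partition> parts = new ArrayList<Partition>();
        parts.add(new Partition(Collections.singletonMap("date", "20091101"), "/owl/t/20091101"));
        parts.add(new Partition(Collections.singletonMap("date", "20091102"), "/owl/t/20091102"));

        // Only the surviving partition's location would be handed to the InputFormat,
        // so irrelevant partitions are never read.
        for (Partition p : prune(parts, "date", "20091102")) {
            System.out.println(p.location);
        }
    }
}
```

The point of the sketch is that the pruning decision is made entirely from partition metadata; an InputFormat would then generate splits only for the surviving locations.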