Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "owl" page has been changed by jaytang.
http://wiki.apache.org/pig/owl?action=diff&rev1=5&rev2=6

--------------------------------------------------

  = Apache Owl Wiki =
- The goal of Owl is to provide a high level data management abstraction than that provided by HDFS directories and files. Applications written in !MapReduce and Pig scripts must deal with low data data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access paths, often with different solutions. Owl attempts to provide a standard way to addresses this issue.
+ The goal of Owl is to provide a high level data management abstraction. !MapReduce and Pig applications interacting directly with HDFS directories and files must deal with low level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data accesses, etc, often with different solutions. Owl aims to provide a standard way to addresses this issue and abstracts away the complexities of reading/writing huge amount of data from/to HDFS.
- Owl supports the notion of "Owl Tables", a basic unit of data management. An Owl Table has these characteristics:
+ Owl provides a tabular view of data on Hadoop and thus supports the notion of ''Owl Tables'', a basic unit of data management.
An Owl Table has these characteristics:
   * lives in an Owl database name space and could contain multiple partitions
   * has columns and rows and supports a unified table level schema
-  * supports !MapReduce and Pig Latin and potentially other languages
+  * interface to supports !MapReduce and Pig Latin and can easily work with other languages
-  * designed for batch read/write operations
+  * designed for efficient batch read/write operations
   * supports external tables (data already exists on file system)
   * pluggable architecture for different storage format such as Zebra
-  * presents a logically partitioned view of data organization
+  * presents a logically partitioned view of data and supports very large data set via its multi-level flexible partitioning scheme
   * efficient data access mechanisms via partition and projection pruning
- Owl supports two major public APIs. Owl Driver provides management APIs against "Owl Table", "Owl Database", and "Partition". This API is backed up by an internal Owl metadata store that runs on Tomcat and a relational database. !OwlInputFormat provides a data access API and is modeled after the traditional Hadoop !InputFormat. In the future, we plan to support !OwlOutputFormat and thus the notion of "Owl Managed Table" where Owl controls the data flow into and out of "Owl Tables". Owl supports Pig integration with !OwlPigLoader/Storer module.
+ Owl has two major public APIs. ''Owl Driver'' provides management APIs against "Owl Table", "Owl Database", and "Partition". This API is backed up by an internal Owl metadata store that runs on Tomcat and a relational database. ''!OwlInputFormat'' provides a data access API and is modeled after the traditional Hadoop !InputFormat. In the future, we plan to support ''!OwlOutputFormat'' and thus the notion of "Owl Managed Table" where Owl controls the data flow into and out of "Owl Tables". Owl also supports Pig integration with !OwlPigLoader/Storer module.
  == Prerequisite ==
- Owl depends on Pig for its tuple classes as a basic unit of data container, and Hadoop 20 for !OwlInputFormat.
+ Owl depends on Pig for its tuple classes as its basic unit of data container, and Hadoop 20 for !OwlInputFormat. Its first release will require Pig 0.7 and Hadoop 20.2. Owl also requires a storage driver; out-of-the-box Owl integrates with Zebra 0.7.
  == Getting Owl ==
@@ -45, +45 @@
   * check out latest PIG trunk
   * compile Pig
   * cd contrib/owl
-  * copy MySQL JDBC driver to contrib/owl/java/lib directory
+  * copy MySQL(or oracle) JDBC driver to contrib/owl/java/lib directory
+  * ant jar (buid owl driver jar file)
   * ant war (build owl web application)
   * ant test (run owl unit test using jetty and derby without any setup steps)
@@ -59, +60 @@
  After installing Tomcat and MySQL, you will need these files:
   * owl-<0.x.x>.war - owl web application
-  * owl-<0.x.x>.jar - owl client library OwlInputFormat and OwlDriver with all their dependent 3rd party libs
+  * owl-<0.x.x>.jar - owl client library ''!OwlInputFormat'' and ''!OwlDriver'' with all their dependent 3rd party libraries
+  * mysql
-  * mysql_schema.sql - owl database schema file at contrib/owl/setup/mysql
+   * mysql_schema.sql - owl database schema file at contrib/owl/setup/mysql
-  * owlServerConfig.xml - owl server configuration file at contrib/owl/setup/mysql
+   * owlServerConfig.xml - owl server configuration file at contrib/owl/setup/mysql
-
+  * oracle
+   * oracle_schema.sql - owl database schema file at contrib/owl/setup/oracle
+   * owlServerConfig.xml - owl server configuration file at contrib/owl/setup/oracle
  Set up parameters in owlServerConfig:
