[jira] Commented: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850484#action_12850484
 ] 

Hadoop QA commented on PIG-1306:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12439937/PIG-1306.patch
  against trunk revision 928080.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 35 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/252/console

This message is automatically generated.

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch, PIG-1306.patch, PIG-1306.patch, 
 PIG-1306.patch, PIG-1306.patch


 Current Zebra supports sorted or unsorted input splits on sorted table or 
 sorted table unions. The sorted input splits are based upon key ranges which 
 do not overlap. And the splits are basically globally sorted in that they are 
 locally sorted, and their key ranges do not overlap.
 The biggest problem of the key-range splits are performance hits suffered if 
 data skew is present, particularly if a key range contains a duplicate key 
 solely which makes the data trunk of the duplicate keys virtually 
 unsplittable regardless how many mappers are available: it just has to be 
 processed by a single mapper.
 On the other hand, there are scenarios when the globally sorted splits are a 
 over-kill and only locally sorted splits are good enough. Examples are the 
 use of Zebra sorted tables as the probe table in a map-side merge inner join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850485#action_12850485
 ] 

Hadoop QA commented on PIG-1331:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12439938/owl.contrib.3.tgz
  against trunk revision 928080.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/253/console

This message is automatically generated.

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
 Attachments: owl.contrib.3.tgz


 This JIRA is a proposal to create a Hadoop table management service: Owl. 
 Today, MapReduce and Pig applications interacts directly with HDFS 
 directories and files and must deal with low level data management issues 
 such as storage format, serialization/compression schemes, data layout, and 
 efficient data accesses, etc, often with different solutions. Owl aims to 
 provide a standard way to addresses this issue and abstracts away the 
 complexities of reading/writing huge amount of data from/to HDFS.
 Owl has a data access API that is modeled after the traditional Hadoop 
 !InputFormt and a management API to manipulate Owl objects.  This JIRA is 
 related to Pig-823 (Hadoop Metadata Service) as Owl has an internal metadata 
 store.  Owl integrates with different storage module like Zebra with a 
 pluggable architecture.
  Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
 time, it makes sense to move it to a Hadoop subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-27 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850638#action_12850638
 ] 

Jay Tang commented on PIG-1331:
---

Owl's data access API, OwlInputFormat, provides a uniform API to access data 
stored in different storage format like Zebra, RCFile, SequenceFile, etc.  Its 
a single data access abstraction on top of disparate data.

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
 Attachments: owl.contrib.3.tgz


 This JIRA is a proposal to create a Hadoop table management service: Owl. 
 Today, MapReduce and Pig applications interacts directly with HDFS 
 directories and files and must deal with low level data management issues 
 such as storage format, serialization/compression schemes, data layout, and 
 efficient data accesses, etc, often with different solutions. Owl aims to 
 provide a standard way to addresses this issue and abstracts away the 
 complexities of reading/writing huge amount of data from/to HDFS.
 Owl has a data access API that is modeled after the traditional Hadoop 
 !InputFormt and a management API to manipulate Owl objects.  This JIRA is 
 related to Pig-823 (Hadoop Metadata Service) as Owl has an internal metadata 
 store.  Owl integrates with different storage module like Zebra with a 
 pluggable architecture.
  Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
 time, it makes sense to move it to a Hadoop subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-27 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850639#action_12850639
 ] 

Carl Steinbach commented on PIG-1331:
-

bq. Owl's data access API, OwlInputFormat, provides a uniform API to access 
data stored in different storage format like Zebra, RCFile, SequenceFile, etc. 
Its a single data access abstraction on top of disparate data.

This sounds like Hive's SerDe interface. Are there any differences?

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
 Attachments: owl.contrib.3.tgz


 This JIRA is a proposal to create a Hadoop table management service: Owl. 
 Today, MapReduce and Pig applications interacts directly with HDFS 
 directories and files and must deal with low level data management issues 
 such as storage format, serialization/compression schemes, data layout, and 
 efficient data accesses, etc, often with different solutions. Owl aims to 
 provide a standard way to addresses this issue and abstracts away the 
 complexities of reading/writing huge amount of data from/to HDFS.
 Owl has a data access API that is modeled after the traditional Hadoop 
 !InputFormt and a management API to manipulate Owl objects.  This JIRA is 
 related to Pig-823 (Hadoop Metadata Service) as Owl has an internal metadata 
 store.  Owl integrates with different storage module like Zebra with a 
 pluggable architecture.
  Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
 time, it makes sense to move it to a Hadoop subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.