[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-08-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12741733#action_12741733
 ] 

Ashutosh Chauhan commented on PIG-845:
--

Hi Dmitriy,

Thanks for the review. Please find my comments inline.

1.
EndOfAllInput flags - could you add comments here about what the point of this 
flag is? You explain what EndOfAllInputSetter does (which is actually rather 
self-explanatory) but not what the meaning of the flag is and how it's used. 
There is a bit of an explanation in PigMapBase, but it really belongs here.
 The EndOfAllInput flag indicates that, when close() is called on a 
 map/reduce task, the pipeline should be run once more. Until now it was used 
 only by POStream, but POMergeJoin now makes use of it as well.

2.
Could you explain the relationship between EndOfAllInput and (deleted) POStream?
 POStream is still there; I guess you are referring to MRStreamHandler, which 
 is deleted. It is only a renaming of the class. Now that POMergeJoin also 
 makes use of it, it's better to give it a generic name like EndOfAllInput 
 instead of MRStreamHandler.

3.
Comments in MRCompiler alternate between referring to the left MROp as 
LeftMROper and curMROper. Choose one.
 Ya, will update the comments.

4.
I am curious about the decision to throw compiler exceptions if MergeJoin 
requirements re number of inputs, etc, aren't satisfied. It seems like a better 
user experience would be to log a warning and fall back to a regular join.
 Ya, a good suggestion. It would be straightforward to do while parsing 
 (e.g. when there are more than two inputs). It is not straightforward, 
 though, to do at logical-to-physical plan and physical-to-MRJobs translation 
 time.

5.
Style notes for visitMergeJoin:

It's a 200-line method. Any way you can break it up into smaller components? As 
is, it's hard to follow.
 I can break it up, but that will bloat the MRCompiler class. A better idea 
 is to have an MRCompilerHelper or some such class where all the low-level 
 helper functions live, so that MRCompiler itself is small and thus easier to 
 read.

The if statements should be broken up into multiple lines to agree with the 
style guides.

Variable naming: you've got topPrj, prj, pkg, lr, ce, nig.. one at a time they 
are fine, but together in a 200-line method they are unreadable. Please 
consider more descriptive names.
 Will use more descriptive names in the next patch.

6.
Kind of a global comment, since it applies to more than just MergeJoin:

It seems to me like we need a Builder for operators to clean up some of the 
new, set, set, set stuff.

Having the setters return this and a Plan's add() method return the plan, would 
let us replace this:

POProject topPrj = new POProject(new OperatorKey(scope,nig.getNextNodeId(scope)));
topPrj.setColumn(1);
topPrj.setResultType(DataType.TUPLE);
topPrj.setOverloaded(true);
rightMROpr.reducePlan.add(topPrj);
rightMROpr.reducePlan.connect(pkg, topPrj);

with this:

POProject topPrj = new POProject(new OperatorKey(scope,nig.getNextNodeId(scope)))
.setColumn(1).setResultType(DataType.TUPLE)
.setOverloaded(true);

rightMROpr.reducePlan.add(topPrj).connect(pkg, topPrj);

I agree. In many places there are too many parameters to set. Setters should 
return the object instead of being void; chaining them would then cut down 
the number of lines.
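
As a rough illustration of the chaining idea discussed in this point, here is 
a minimal, self-contained sketch. ProjectOp and Plan are hypothetical stand-ins 
invented for this example; they are not Pig's POProject or PhysicalPlan, and 
the real classes may need a different shape (for instance, connect() in Pig 
can throw a checked exception).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator such as POProject (not Pig's class).
final class ProjectOp {
    private int column;
    private String resultType = "UNKNOWN";
    private boolean overloaded;

    // Each setter returns this, so calls can be chained.
    ProjectOp setColumn(int column) { this.column = column; return this; }
    ProjectOp setResultType(String resultType) { this.resultType = resultType; return this; }
    ProjectOp setOverloaded(boolean overloaded) { this.overloaded = overloaded; return this; }

    int getColumn() { return column; }
    String getResultType() { return resultType; }
    boolean isOverloaded() { return overloaded; }
}

// Hypothetical stand-in for a plan; add() and connect() return the plan itself.
final class Plan {
    private final List<ProjectOp> ops = new ArrayList<ProjectOp>();
    private final List<ProjectOp[]> edges = new ArrayList<ProjectOp[]>();

    Plan add(ProjectOp op) { ops.add(op); return this; }

    Plan connect(ProjectOp from, ProjectOp to) {
        // Both endpoints must already be part of the plan.
        if (!ops.contains(from) || !ops.contains(to)) {
            throw new IllegalArgumentException("both operators must be added first");
        }
        edges.add(new ProjectOp[] { from, to });
        return this;
    }

    int size() { return ops.size(); }
    int edgeCount() { return edges.size(); }
}
```

With these in place, building and wiring an operator collapses to 
new ProjectOp().setColumn(1).setResultType("TUPLE").setOverloaded(true) 
followed by plan.add(op).connect(parent, op).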

7.
Is the change to List<List<Byte>> keyTypes in POFRJoin related to MergeJoin or 
just rolled in?
POFRJoin can do without this change, but to avoid code duplication I updated 
POFRJoin to use List<List<Byte>> keyTypes.

8. MergeJoin

break getNext() into components.
 I don't want to do that because it already has lots of class members which 
 are updated in various places. Spreading those updates across multiple 
 functions will make the logic even harder to follow. Also, I am not sure 
 the Java compiler can always inline private methods.

I don't see you supporting Left outer joins. Plans for that? At least document 
the planned approach.
 Ya, outer joins are currently not supported. This is documented in the 
 specification. I will add a comment in the code as well.

Error codes being declared deep inside classes, and documented on the wiki, is 
a poor practice, imo. They should be pulled out into PigErrors (as lightweight 
final objects that have an error code, a name, and a description..) I thought 
Santhosh made progress on this already, no?
 Not sure I understand you completely. I am using ExecException, 
 FrontendException, etc. Aren't these the lightweight final objects you are 
 referring to?

Could you explain the problem with splits and streams? Why can't this work for 
them?
 Streaming after the join will be supported. There was a bug, which I fixed; 
 the fix will be part of the next patch. Streaming before the join will not 
 be supported because in the endOfAllInput case, streaming may potentially 
 produce multiple tuples 

Hudson build is back to normal: Pig-trunk #519

2009-08-11 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/519/




[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Status: Open  (was: Patch Available)

 support cast of chararray to other simple types
 ---

 Key: PIG-893
 URL: https://issues.apache.org/jira/browse/PIG-893
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Thejas M Nair
Assignee: Jeff Zhang
 Fix For: 0.4.0


 Pig should support casting of chararray to 
 integer,long,float,double,bytearray. If the conversion fails for reasons such 
 as overflow, cast should return null and log a warning.
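
 The requested behaviour can be sketched in plain Java as follows. This is 
 only an illustration under assumptions: LenientCast and its methods are 
 hypothetical names, java.util.logging is used instead of Pig's own warning 
 mechanism, and only the integer and long cases are shown. Floating-point 
 casts would need separate handling, since Double.valueOf returns Infinity on 
 overflow rather than throwing.

```java
import java.util.logging.Logger;

// Hypothetical helper (not part of Pig): casts that return null on failure.
final class LenientCast {
    private static final Logger LOG = Logger.getLogger(LenientCast.class.getName());

    // Returns the parsed int, or null if the input is null, malformed,
    // or outside the int range. Failures are logged as warnings.
    static Integer toInteger(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast '" + s + "' to integer; returning null");
            return null;
        }
    }

    // Same contract as toInteger, for the long range.
    static Long toLong(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.valueOf(s.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast '" + s + "' to long; returning null");
            return null;
        }
    }
}
```

 Here "2147483648" overflows an int, so toInteger returns null for it, while 
 toLong parses the same string successfully.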

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Attachment: (was: Pig_893.Patch)

 support cast of chararray to other simple types
 ---

 Key: PIG-893
 URL: https://issues.apache.org/jira/browse/PIG-893
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Thejas M Nair
Assignee: Jeff Zhang
 Fix For: 0.4.0





[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Status: Patch Available  (was: Open)

 support cast of chararray to other simple types
 ---

 Key: PIG-893
 URL: https://issues.apache.org/jira/browse/PIG-893
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Thejas M Nair
Assignee: Jeff Zhang
 Fix For: 0.4.0

 Attachments: Pig_893.Patch





[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Attachment: Pig_893.Patch

Updated the patch.
1. Added the license header (fixes the audit warning).
2. Changed new Long(long) to Long.valueOf(long) (fixes the FindBugs warning).
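
For context on the second item: the Javadoc for Long.valueOf(long) states that 
values from -128 to 127 are always cached, so the call can reuse an existing 
object, whereas the constructor always allocates. A small sketch showing the 
cache (BoxingDemo is a hypothetical name used only here):

```java
// Hypothetical demo class illustrating the Long.valueOf cache.
final class BoxingDemo {
    // Boxes the same value twice and reports whether both boxes are the
    // same object. Guaranteed true for values in [-128, 127]; for other
    // values the result depends on the JVM.
    static boolean sameInstance(long value) {
        Long first = Long.valueOf(value);
        Long second = Long.valueOf(value);
        return first == second;
    }
}
```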

 support cast of chararray to other simple types
 ---

 Key: PIG-893
 URL: https://issues.apache.org/jira/browse/PIG-893
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Thejas M Nair
Assignee: Jeff Zhang
 Fix For: 0.4.0

 Attachments: Pig_893.Patch





Hudson build is back to normal: Pig-Patch-minerva.apache.org #156

2009-08-11 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/




[jira] Created: (PIG-915) Pig HBase

2009-08-11 Thread Alex Newman (JIRA)
Pig HBase
-

 Key: PIG-915
 URL: https://issues.apache.org/jira/browse/PIG-915
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Minor


Currently there is no way to get the row names when querying HBase. We should 
probably remedy this, as important data may be stored there.




[jira] Commented: (PIG-914) Change the PIG hbase interface to use bytes along with strings

2009-08-11 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12741997#action_12741997
 ] 

Alex Newman commented on PIG-914:
-

Someone should assign this to me.

 Change the PIG hbase interface to use bytes along with strings
 --

 Key: PIG-914
 URL: https://issues.apache.org/jira/browse/PIG-914
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Minor

 Currently start rows, table names, and column names are all strings. Since 
 HBase supports bytes, we might want to change the Pig interface to support 
 bytes along with strings.




[jira] Created: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning

2009-08-11 Thread Alex Newman (JIRA)
Change the pig hbase interface to get more than one row at a time when scanning
---

 Key: PIG-916
 URL: https://issues.apache.org/jira/browse/PIG-916
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Trivial


It should be significantly faster to get numerous rows at the same time rather 
than one row at a time for large table extraction processes.
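
The expected saving can be illustrated with a toy model in which every scanner 
call costs one round trip. CountingScanner and ScanDemo are hypothetical 
classes written for this sketch; they do not use the HBase client API, and 
real gains depend on network latency and row size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical scanner: each call to next() or next(int) models one round trip.
final class CountingScanner {
    private final Iterator<String> rows;
    private int roundTrips = 0;

    CountingScanner(List<String> rows) { this.rows = rows.iterator(); }

    // Fetches a single row, or null once the table is exhausted.
    String next() {
        roundTrips++;
        return rows.hasNext() ? rows.next() : null;
    }

    // Fetches up to batchSize rows in one round trip; empty list when exhausted.
    List<String> next(int batchSize) {
        roundTrips++;
        List<String> batch = new ArrayList<String>();
        while (batch.size() < batchSize && rows.hasNext()) {
            batch.add(rows.next());
        }
        return batch;
    }

    int roundTrips() { return roundTrips; }
}

final class ScanDemo {
    // Reads the whole table one row per call and returns the round-trip count.
    static int scanOneByOne(List<String> table) {
        CountingScanner scanner = new CountingScanner(table);
        while (scanner.next() != null) {
            // row processing would go here
        }
        return scanner.roundTrips();
    }

    // Reads the whole table batchSize rows per call and returns the round-trip count.
    static int scanInBatches(List<String> table, int batchSize) {
        CountingScanner scanner = new CountingScanner(table);
        while (!scanner.next(batchSize).isEmpty()) {
            // batch processing would go here
        }
        return scanner.roundTrips();
    }
}
```

In this model a 10-row table takes 11 calls row by row (10 rows plus the final 
empty result) but only 4 calls with a batch size of 4.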




[jira] Commented: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning

2009-08-11 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742008#action_12742008
 ] 

Alex Newman commented on PIG-916:
-

Feel free to assign this to me.

 Change the pig hbase interface to get more than one row at a time when 
 scanning
 ---

 Key: PIG-916
 URL: https://issues.apache.org/jira/browse/PIG-916
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Trivial




[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

Updated patch. The only change is that ant prints a descriptive error to the 
user if hadoop20.jar does not exist in the top-level lib directory. It lists 
the basic steps to get this built until PIG-660 is committed.


 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz





[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742069#action_12742069
 ] 

Raghu Angadi commented on PIG-833:
--

Alan, in order to run unit tests you need to build pig test-core.

As mentioned in the instructions above, please run 
{{'ant -Dtestcase=none test-core'}} under the top-level directory before 
running 'ant test' under contrib/zebra.


 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz





[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Attachment: sampler.patch

The attached file has the redesigned sampler interface. Skewed join now uses a 
trivial implementation of the Poisson sampling mechanism.

 Create a sampler interface and improve the skewed join sampler
 --

 Key: PIG-890
 URL: https://issues.apache.org/jira/browse/PIG-890
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: sampler.patch


 We need a different sampler for order by and skewed join. We thus need a 
 better sampling interface. The design of the same is described here: 
 http://wiki.apache.org/pig/PigSampler




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Status: Patch Available  (was: Open)

 Create a sampler interface and improve the skewed join sampler
 --

 Key: PIG-890
 URL: https://issues.apache.org/jira/browse/PIG-890
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: sampler.patch





[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-833:
---

Attachment: TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt

Okay, now that I've first built Pig's test, I run the tests and I get:

{code}
 [delete] Deleting directory /Users/gates/src/pig/apache/top/zebra/trunk/build/contrib/zebra/test/logs
[mkdir] Created dir: /Users/gates/src/pig/apache/top/zebra/trunk/build/contrib/zebra/test/logs
[junit] Running org.apache.hadoop.zebra.io.TestCheckin
[junit] Tests run: 125, Failures: 0, Errors: 0, Time elapsed: 16.894 sec
[junit] Running org.apache.hadoop.zebra.mapred.TestCheckin
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 158.741 sec
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin1
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.13 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin1 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin2
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.131 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin2 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin3
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.133 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin3 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin4
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.128 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin4 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin5
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.128 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin5 FAILED
[junit] Running org.apache.hadoop.zebra.types.TestCheckin
[junit] Tests run: 45, Failures: 0, Errors: 0, Time elapsed: 0.253 sec
{code}

I've attached the output from one of the tests.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz





[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742083#action_12742083
 ] 

Dmitriy V. Ryaboy commented on PIG-833:
---

Alan -- if it's not finding .dfs , it's probably not linking hadoop20.jar

Try my patch in 660 :-)

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz





[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742093#action_12742093
 ] 

Alan Gates commented on PIG-833:


My bad.  I missed the line in the instructions where it said to apply the 
PIG-660 patch.  I applied that and am trying again.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz





[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742100#action_12742100
 ] 

Alan Gates commented on PIG-833:


Patch checked in.  All the unit tests passed.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz





[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-913:
---

Status: Patch Available  (was: Open)

 Error in Pig script when grouping on chararray column
 -

 Key: PIG-913
 URL: https://issues.apache.org/jira/browse/PIG-913
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Priority: Critical
 Fix For: 0.4.0

 Attachments: PIG-913.patch


 I have a very simple script which fails at parse time due to the schema I 
 specified in the loader.
 {code}
 data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
 dataSmall = limit data 100;
 bb = GROUP dataSmall by $0;
 dump bb;
 {code}
 =
 2009-08-06 18:47:56,297 [main] INFO  org.apache.pig.Main - Logging error 
 messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: 
 /homes/viraj/pig-svn/trunk/pig_1249609676296.log
 2009-08-06 18:47:56,459 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localhost:9000
 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop 
 file system at: hdfs://localhost:9000
 2009-08-06 18:47:56,694 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localhost:9001
 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to 
 map-reduce job tracker at: localhost:9001
 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1002: Unable to store alias bb
 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
 Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
 =
 =
 Pig Stack Trace
 ---
 ERROR 1002: Unable to store alias bb
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
 open iterator for alias bb
 at org.apache.pig.PigServer.openIterator(PigServer.java:481)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:397)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: 
 Unable to store alias bb
 at org.apache.pig.PigServer.store(PigServer.java:536)
 at org.apache.pig.PigServer.openIterator(PigServer.java:464)
 ... 6 more
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
 at 
 org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
 at 
 org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
 at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67)
 at 
 org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187)
 at org.apache.pig.PigServer.compileLp(PigServer.java:854)
 at org.apache.pig.PigServer.compileLp(PigServer.java:791)
 at org.apache.pig.PigServer.store(PigServer.java:509)
 ... 7 more
 =




Build failed in Hudson: Pig-Patch-minerva.apache.org #157

2009-08-11 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/

--
[...truncated 103063 lines...]
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: Received block 
blk_-6509224781215538639_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40940 is added to 
blk_-6509224781215538639_1011 size 6
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: PacketResponder 1 
for block blk_-6509224781215538639_1011 terminating
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:38934 is added to 
blk_-6509224781215538639_1011 size 6
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: Received block 
blk_-6509224781215538639_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: PacketResponder 2 
for block blk_-6509224781215538639_1011 terminating
 [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56715 is added to 
blk_-6509224781215538639_1011 size 6
 [exec] [junit] 09/08/11 23:36:15 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:40772
 [exec] [junit] 09/08/11 23:36:15 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:42304
 [exec] [junit] 09/08/11 23:36:15 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/08/11 23:36:15 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/08/11 23:36:16 WARN dfs.DataNode: Unexpected error 
trying to delete block blk_-7801099502017534561_1004. BlockInfo not found in 
volumeMap.
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Deleting block 
blk_-7252209396593481868_1006 file 
dfs/data/data7/current/blk_-7252209396593481868
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Deleting block 
blk_-1800239565210147527_1005 file 
dfs/data/data8/current/blk_-1800239565210147527
 [exec] [junit] 09/08/11 23:36:16 WARN dfs.DataNode: 
java.io.IOException: Error in deleting blocks.
 [exec] [junit] at 
org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
 [exec] [junit] at java.lang.Thread.run(Thread.java:619)
 [exec] [junit] 
 [exec] [junit] 09/08/11 23:36:16 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/11 23:36:16 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908112335_0002/job.jar. 
blk_5812011963372313027_1012
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block 
blk_5812011963372313027_1012 src: /127.0.0.1:56518 dest: /127.0.0.1:37446
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block 
blk_5812011963372313027_1012 src: /127.0.0.1:53963 dest: /127.0.0.1:40940
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block 
blk_5812011963372313027_1012 src: /127.0.0.1:36671 dest: /127.0.0.1:56715
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block 
blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 0 
for block blk_5812011963372313027_1012 terminating
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block 
blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 1 
for block blk_5812011963372313027_1012 terminating
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56715 is added to 
blk_5812011963372313027_1012 size 1480752
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block 
blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40940 is added to 
blk_5812011963372313027_1012 size 1480752
 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 2 
for block blk_5812011963372313027_1012 terminating
 [exec] [junit] 09/08/11 23:36:16 

[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742118#action_12742118
 ] 

Hadoop QA commented on PIG-890:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416250/sampler.patch
  against trunk revision 801865.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 6 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/console

This message is automatically generated.

 Create a sampler interface and improve the skewed join sampler
 --

 Key: PIG-890
 URL: https://issues.apache.org/jira/browse/PIG-890
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: sampler.patch


 We need a different sampler for order by and skewed join. We thus need a 
 better sampling interface. The design of the same is described here: 
 http://wiki.apache.org/pig/PigSampler

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Status: Open  (was: Patch Available)

 Create a sampler interface and improve the skewed join sampler
 --

 Key: PIG-890
 URL: https://issues.apache.org/jira/browse/PIG-890
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: sampler.patch


 We need a different sampler for order by and skewed join. We thus need a 
 better sampling interface. The design of the same is described here: 
 http://wiki.apache.org/pig/PigSampler




[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-890:


Attachment: (was: sampler.patch)

 Create a sampler interface and improve the skewed join sampler
 --

 Key: PIG-890
 URL: https://issues.apache.org/jira/browse/PIG-890
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: sampler.patch


 We need a different sampler for order by and skewed join. We thus need a 
 better sampling interface. The design of the same is described here: 
 http://wiki.apache.org/pig/PigSampler




[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742136#action_12742136
 ] 

Sriranjan Manjunath commented on PIG-890:
-

Let me know if you think that this requires a test case and I will be happy to 
include it.

 Create a sampler interface and improve the skewed join sampler
 --

 Key: PIG-890
 URL: https://issues.apache.org/jira/browse/PIG-890
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: sampler.patch


 We need a different sampler for order by and skewed join. We thus need a 
 better sampling interface. The design of the same is described here: 
 http://wiki.apache.org/pig/PigSampler




[jira] Commented: (PIG-907) Provide multiple version of HashFNV (Piggybank)

2009-08-11 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742137#action_12742137
 ] 

Olga Natkovich commented on PIG-907:


+1

 Provide multiple version of HashFNV (Piggybank)
 ---

 Key: PIG-907
 URL: https://issues.apache.org/jira/browse/PIG-907
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Priority: Minor
 Fix For: 0.4.0

 Attachments: PIG-907-1.patch, PIG-907-2.patch


 HashFNV takes 1 or 2 parameters. Until PIG-902 is solved, it is better to 
 create 2 versions of HashFNV so that Pig can pick the right version and do 
 the type cast. Otherwise, users have to do the explicit cast. 
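
 To make the idea concrete, here is a minimal Python sketch of "one version 
 per parameter list, picked by argument types". It is only an illustration: 
 the FNV-1a 32-bit variant, the reading of the second parameter as a modulus, 
 and the names hash_fnv_1, hash_fnv_2, VERSIONS and pick_version are all 
 assumptions, not the Piggybank code or Pig's actual function resolution.

```python
# Standard FNV-1a 32-bit constants.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(text):
    """Return the 32-bit FNV-1a hash of the UTF-8 bytes of text."""
    h = FNV_OFFSET_BASIS_32
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the value in 32 bits
    return h


def hash_fnv_1(text):
    """One-parameter version: hash only."""
    return fnv1a_32(text)


def hash_fnv_2(text, mod):
    """Two-parameter version (assumed meaning: hash reduced modulo mod)."""
    return fnv1a_32(text) % mod


# Hypothetical lookup table: one entry per declared parameter list.
VERSIONS = {
    (str,): hash_fnv_1,
    (str, int): hash_fnv_2,
}


def pick_version(*args):
    """Pick the version whose declared parameter types match the arguments."""
    signature = tuple(type(a) for a in args)
    try:
        return VERSIONS[signature]
    except KeyError:
        raise TypeError("no HashFNV version for %r" % (signature,))
```

 With both versions registered, the caller does not have to cast by hand: 
 pick_version("a") selects the one-parameter version and 
 pick_version("a", 10) selects the two-parameter one.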




[jira] Commented: (PIG-893) support cast of chararray to other simple types

2009-08-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742144#action_12742144
 ] 

Alan Gates commented on PIG-893:


I'm reviewing this patch.

 support cast of chararray to other simple types
 ---

 Key: PIG-893
 URL: https://issues.apache.org/jira/browse/PIG-893
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Thejas M Nair
Assignee: Jeff Zhang
 Fix For: 0.4.0

 Attachments: Pig_893.Patch


 Pig should support casting of chararray to integer, long, float, double, and 
 bytearray. If the conversion fails for reasons such as overflow, the cast 
 should return null and log a warning.
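
 As a rough illustration of the behaviour requested above (not the code in 
 the attached patch), the null-on-failure cast could be sketched in Python as 
 follows. The function names, the logger name, and the use of 32-bit and 
 64-bit signed ranges for integer and long are assumptions made for this 
 sketch.

```python
import logging

logger = logging.getLogger("cast_sketch")  # hypothetical logger name

# Signed ranges assumed for the target types (32-bit integer, 64-bit long).
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def _to_bounded_int(text, low, high, type_name):
    """Parse text as an integer in [low, high]; return None on any failure."""
    try:
        value = int(text.strip())
    except (ValueError, AttributeError):
        # Not a number, or not a string at all (for example None).
        logger.warning("Cannot cast %r to %s", text, type_name)
        return None
    if not low <= value <= high:
        logger.warning("Overflow casting %r to %s", text, type_name)
        return None
    return value


def chararray_to_int(text):
    return _to_bounded_int(text, INT_MIN, INT_MAX, "integer")


def chararray_to_long(text):
    return _to_bounded_int(text, LONG_MIN, LONG_MAX, "long")


def chararray_to_double(text):
    try:
        return float(text)
    except (ValueError, TypeError):
        logger.warning("Cannot cast %r to double", text)
        return None
```

 The point of the sketch is that a bad value yields a null (None) plus a 
 warning instead of an exception, so a single malformed record does not abort 
 the whole job.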




Build failed in Hudson: Pig-Patch-minerva.apache.org #158

2009-08-11 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/changes

Changes:

[gates] PIG-833: Added Zebra, new columnar storage mechanism for HDFS.

--
[...truncated 103108 lines...]
 [exec] [junit] 09/08/12 01:19:32 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/08/12 01:19:32 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/08/12 01:19:32 WARN dfs.DataNode: Unexpected error 
trying to delete block blk_-1535404250649000663_1004. BlockInfo not found in 
volumeMap.
 [exec] [junit] 09/08/12 01:19:32 INFO dfs.DataNode: Deleting block 
blk_4954179736192186775_1006 file dfs/data/data8/current/blk_4954179736192186775
 [exec] [junit] 09/08/12 01:19:32 WARN dfs.DataNode: 
java.io.IOException: Error in deleting blocks.
 [exec] [junit] at 
org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663)
 [exec] [junit] at 
org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
 [exec] [junit] at java.lang.Thread.run(Thread.java:619)
 [exec] [junit] 
 [exec] [junit] 09/08/12 01:19:33 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/12 01:19:33 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. 
blk_2669403222345271811_1012
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_2669403222345271811_1012 src: /127.0.0.1:58050 dest: /127.0.0.1:40049
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_2669403222345271811_1012 src: /127.0.0.1:38276 dest: /127.0.0.1:54901
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_2669403222345271811_1012 src: /127.0.0.1:48397 dest: /127.0.0.1:34055
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block 
blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 0 
for block blk_2669403222345271811_1012 terminating
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:34055 is added to 
blk_2669403222345271811_1012 size 1476187
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block 
blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54901 is added to 
blk_2669403222345271811_1012 size 1476187
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 1 
for block blk_2669403222345271811_1012 terminating
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block 
blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40049 is added to 
blk_2669403222345271811_1012 size 1476187
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 2 
for block blk_2669403222345271811_1012 terminating
 [exec] [junit] 09/08/12 01:19:33 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/12 01:19:33 INFO fs.FSNamesystem: Reducing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.split. 
blk_-777871427035102840_1013
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_-777871427035102840_1013 src: /127.0.0.1:48398 dest: /127.0.0.1:34055
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_-777871427035102840_1013 src: /127.0.0.1:58054 dest: /127.0.0.1:40049
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block 
blk_-777871427035102840_1013 src: /127.0.0.1:38280 dest: /127.0.0.1:54901
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block 
blk_-777871427035102840_1013 of size 1837 from /127.0.0.1
 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 0 
for block blk_-777871427035102840_1013 terminating
 

[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742170#action_12742170
 ] 

Dmitriy V. Ryaboy commented on PIG-833:
---

Alan, this means Pig contrib/ is no longer compatible with Hadoop 18, which 
probably means that you need to either roll this back or roll 660 in 
(and add the hadoop20.jar file to lib/). Otherwise the build is broken.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742201#action_12742201
 ] 

Jay Tang commented on PIG-833:
--

Zebra depends on TFile, which is available in Hadoop 20; that is why the 
compilation instructions are more complicated.  A new wiki at 
http://wiki.apache.org/pig/zebra will provide more information on Zebra.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler

2009-08-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742203#action_12742203
 ] 

Hadoop QA commented on PIG-890:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416267/sampler.patch
  against trunk revision 803312.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/console

This message is automatically generated.

 Create a sampler interface and improve the skewed join sampler
 --

 Key: PIG-890
 URL: https://issues.apache.org/jira/browse/PIG-890
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
 Attachments: sampler.patch


 We need a different sampler for order by and skewed join. We thus need a 
 better sampling interface. The design of the same is described here: 
 http://wiki.apache.org/pig/PigSampler
