date:20100907

[jira] Commented: (PIG-794) Use Avro serialization in Pig

2010-09-07 Thread Doug Cutting (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906871#action_12906871
 ] 

Doug Cutting commented on PIG-794:
--

Jeff, what version of Avro are you using?

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
Assignee: Dmitriy V. Ryaboy
 Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, 
 AvroStorage_2.patch, AvroStorage_3.patch, AvroStorage_4.patch, AvroTest.java, 
 jackson-asl-0.9.4.jar, PIG-794.patch


 We would like to use Avro serialization in Pig to pass data between MR jobs 
 instead of the current BinStorage. Attached is an implementation of 
 AvroBinStorage which performs significantly better compared to BinStorage on 
 our benchmarks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1601) Make scalar work for secure hadoop

2010-09-07 Thread Thejas M Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906874#action_12906874
 ] 

Thejas M Nair commented on PIG-1601:


+1 

 Make scalar work for secure hadoop
 --

 Key: PIG-1601
 URL: https://issues.apache.org/jira/browse/PIG-1601
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1601-1.patch


 Error message:
 open file 'hdfs://gsbl90890.blue.ygrid.yahoo.com/tmp/temp851711738/tmp727366271'; error =
 java.io.IOException: Delegation Token can be issued only with kerberos or web authentication
     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:4975)
     at org.apache.hadoop.hdfs.server.namenode.NameNode.getDelegationToken(NameNode.java:432)
     at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1301)
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1297)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1295)
     at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:66)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:313)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:448)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:441)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide.getNext(Divide.java:72)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:358)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:638)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:314)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
     at org.apache.hadoop.mapred.Child.main(Child.java:211)


[jira] Resolved: (PIG-1601) Make scalar work for secure hadoop

2010-09-07 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1601.
-

Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.8 branch.

 Make scalar work for secure hadoop
 --

 Key: PIG-1601
 URL: https://issues.apache.org/jira/browse/PIG-1601
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1601-1.patch




[jira] Updated: (PIG-1595) casting relation to scalar- problem with handling of data from non PigStorage loaders

2010-09-07 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Thejas M Nair updated PIG-1595:
---

Attachment: PIG-1595.2.patch

With the changes in PIG-1595.1.patch, the column name gets propagated in the
schema, so I have updated the test case to use different column names in the
relation used as a scalar, so that they do not conflict with the other columns
being projected.

All testScalarAlias unit tests pass.

test-patch results -
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or
modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning
messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number
of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs
warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the
total number of release audit warnings.

casting relation to scalar- problem with handling of data from non PigStorage
loaders
-

Key: PIG-1595
URL: https://issues.apache.org/jira/browse/PIG-1595
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0

Attachments: PIG-1595.1.patch, PIG-1595.2.patch

If load functions that don't follow the same bytearray format as PigStorage
for other supported datatypes, or that don't implement the LoadCaster
interface, are used in 'casting relation to scalar' (PIG-1434), the query can
fail or produce incorrect results.
The root cause is that there is a real dependency between the ReadScalars udf
that returns the scalar value and the LogicalOperator that acts as its input,
but the logical plan does not capture this dependency. So the SchemaResetter
visitor used by the optimizer does not take it into account when ordering
schema reset and evaluation. If the schema of the input LogicalOperator is not
evaluated before the ReadScalars udf, the result type of the ReadScalars udf
becomes bytearray. POUserFunc will then convert the input to bytearray using
'new DataByteArray(inp.toString().getBytes())'. But this bytearray encoding of
other supported types might not be the same as the one used by the LoadFunction
associated with the column, and that can result in problems.
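
The encoding mismatch described above can be illustrated with a small
standalone sketch. It does not use any Pig classes: textEncode mimics the
'inp.toString().getBytes()' fallback (with UTF-8 fixed here for
reproducibility, which is an assumption), while binaryEncode and binaryDecode
stand in for a hypothetical loader that stores ints as 4-byte big-endian
values. All names in the sketch are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ScalarEncodingDemo {
    // Mimics the fallback conversion: value -> toString() -> bytes.
    static byte[] textEncode(Object value) {
        return value.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical loader format: an int is stored as 4 big-endian bytes.
    static byte[] binaryEncode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Hypothetical caster for that loader: expects exactly 4 bytes.
    static int binaryDecode(byte[] bytes) {
        if (bytes.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + bytes.length);
        }
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        // Round trip through the loader's own format works.
        System.out.println(binaryDecode(binaryEncode(42)));

        // "42" is 2 text bytes, so the binary caster rejects it (query fails).
        try {
            binaryDecode(textEncode(42));
        } catch (IllegalArgumentException e) {
            System.out.println("cast failed: " + e.getMessage());
        }

        // "1234" happens to be 4 text bytes, so the cast succeeds but
        // yields a different number (incorrect result).
        System.out.println(binaryDecode(textEncode(1234)));
    }
}
```

The two failure modes in the sketch correspond to the two symptoms named in
the description: a failed query when the byte length does not fit the
loader's format, and a silently wrong value when it happens to fit.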


[jira] Commented: (PIG-1595) casting relation to scalar- problem with handling of data from non PigStorage loaders

2010-09-07 Thread Daniel Dai (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906932#action_12906932
]

Daniel Dai commented on PIG-1595:
-

+1 for the test failure fix.

casting relation to scalar- problem with handling of data from non PigStorage
loaders
-

Key: PIG-1595
URL: https://issues.apache.org/jira/browse/PIG-1595
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0

Attachments: PIG-1595.1.patch, PIG-1595.2.patch


[jira] Commented: (PIG-1595) casting relation to scalar- problem with handling of data from non PigStorage loaders

2010-09-07 Thread Thejas M Nair (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906952#action_12906952
]

Thejas M Nair commented on PIG-1595:

PIG-1595.2.patch committed to trunk and 0.8 branch.

casting relation to scalar- problem with handling of data from non PigStorage
loaders
-

Key: PIG-1595
URL: https://issues.apache.org/jira/browse/PIG-1595
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0

Attachments: PIG-1595.1.patch, PIG-1595.2.patch


Re: Does Pig Re-Use FileInputLoadFuncs Objects?

2010-09-07 Thread Alan Gates

I'm not 100% sure I understand the question.  Are you asking if it
re-uses instances of a given load or store function?  It should not.


Alan.

On Aug 31, 2010, at 7:28 PM, Russell Jurney wrote:

Pardon the cross-post: Does Pig ever re-use FileInputLoadFunc objects?  We
suspect state is being retained between different stores, but we don't
actually know this.  Figured I'd ask to verify the hunch.

Our load func for our in-house format works fine with Pig scripts
normally... but I have a pig script that looks like this:

LOAD thing1
SPLIT thing1 INTO thing2, thing3
STORE thing2 INTO thing2
STORE thing3 INTO thing3

LOAD thing4
SPLIT thing4 INTO thing5, thing6
STORE thing5 INTO thing5
STORE thing6 INTO thing6


And it works via PigStorage, but not via our FileInputLoadFunc.

Russ
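
One general Java point that bears on the question, offered as a sketch rather
than a diagnosis of the script above: even when a framework creates a fresh
object for every load, state kept in static fields is still shared across
those objects within a JVM. SketchLoader below is a hypothetical class that
does not extend any Pig type; it only contrasts a static counter with an
instance counter.

```java
public class LoaderStateDemo {
    // Hypothetical loader stand-in; not a Pig LoadFunc.
    static class SketchLoader {
        static int sharedRecordsSeen = 0; // shared by every instance in the JVM
        int recordsSeen = 0;              // private to one instance

        void read() {
            sharedRecordsSeen++;
            recordsSeen++;
        }
    }

    public static void main(String[] args) {
        SketchLoader first = new SketchLoader();
        first.read();
        first.read();

        // A brand-new instance starts with clean instance state...
        SketchLoader second = new SketchLoader();
        second.read();
        System.out.println("second.recordsSeen = " + second.recordsSeen);

        // ...but the static counter still carries the first instance's reads.
        System.out.println("sharedRecordsSeen = " + SketchLoader.sharedRecordsSeen);
    }
}
```

If a custom load function appears to retain state between stores, checking
for static fields or other process-wide caches is one way to separate that
cause from actual instance re-use.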

[jira] Commented: (PIG-794) Use Avro serialization in Pig

2010-09-07 Thread Jeff Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907049#action_12907049
 ] 

Jeff Zhang commented on PIG-794:


Doug, I am using avro trunk revision 988779

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
Assignee: Dmitriy V. Ryaboy
 Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, 
 AvroStorage_2.patch, AvroStorage_3.patch, AvroStorage_4.patch, AvroTest.java, 
 jackson-asl-0.9.4.jar, PIG-794.patch


 We would like to use Avro serialization in Pig to pass data between MR jobs 
 instead of the current BinStorage. Attached is an implementation of 
 AvroBinStorage which performs significantly better compared to BinStorage on 
 our benchmarks.


[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-09-07 Thread Daniel Dai (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Daniel Dai updated PIG-1178:

Attachment: PIG-1178-11.patch

PIG-1178-11.patch changes the layout of explain, error codes, comments, etc.
No real functional changes.

test-patch result:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 11 new or
modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning
messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number
of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs
warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the
total number of release audit warnings.

LogicalPlan and Optimizer are too complex and hard to work with
---

Key: PIG-1178
URL: https://issues.apache.org/jira/browse/PIG-1178
Project: Pig
Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
Fix For: 0.8.0

Attachments: expressions-2.patch, expressions.patch, lp.patch,
lp.patch, PIG-1178-10.patch, PIG-1178-11.patch, PIG-1178-4.patch,
PIG-1178-5.patch, PIG-1178-6.patch, PIG-1178-7.patch, PIG-1178-8.patch,
PIG-1178-9.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch,
pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch,
pig_1178_3.patch

The current implementation of the logical plan and the logical optimizer in
Pig has proven to not be easily extensible. Developer feedback has indicated
that adding new rules to the optimizer is quite burdensome. In addition, the
logical plan has been an area of numerous bugs, many of which have been
difficult to fix. Developers also feel that the logical plan is difficult to
understand and maintain. The root cause for these issues is that a number of
design decisions that were made as part of the 0.2 rewrite of the front end
have now proven to be sub-optimal. The heart of this proposal is to revisit a
number of those proposals and rebuild the logical plan with a simpler design
that will make it much easier to maintain the logical plan as well as extend
the logical optimizer.
See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full
details.


[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-09-07 Thread Daniel Dai (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907061#action_12907061
]

Daniel Dai commented on PIG-1178:
-

PIG-1178-11.patch committed to both trunk and 0.8 branch.

LogicalPlan and Optimizer are too complex and hard to work with
---

Key: PIG-1178
URL: https://issues.apache.org/jira/browse/PIG-1178
Project: Pig
Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
Fix For: 0.8.0


[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-09-07 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-10.patch, PIG-1178-11.patch, PIG-1178-4.patch, 
 PIG-1178-5.patch, PIG-1178-6.patch, PIG-1178-7.patch, PIG-1178-8.patch, 
 PIG-1178-9.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, 
 pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

