[jira] Commented: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml

2010-01-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800285#action_12800285
 ] 

Daniel Dai commented on PIG-1186:
-

I didn't include a unit test because it is very hard to write one for 
this. I tested it manually and it works.

 Pig do not take values in pig-cluster-hadoop-site.xml
 ---

 Key: PIG-1186
 URL: https://issues.apache.org/jira/browse/PIG-1186
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1186-1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-14 Thread Ying He (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800289#action_12800289
 ] 

Ying He commented on PIG-1178:
--

+1

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions.patch, lp.patch, PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1187) UTF-8 (international code) breaks with loader when load with schema is specified

2010-01-14 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800315#action_12800315
 ] 

Viraj Bhat commented on PIG-1187:
-

Hi Jeff,
 This is specific to the data we are using, and it looks like the parser failed 
while trying to interpret some characters. We have tested this with 
Chinese characters and it works.
Viraj

 UTF-8 (international code) breaks with loader when load with schema is 
 specified
 

 Key: PIG-1187
 URL: https://issues.apache.org/jira/browse/PIG-1187
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.6.0


 I have a set of Pig statements which dump an international dataset.
 {code}
 INPUT_OBJECT = load 'internationalcode';
 describe INPUT_OBJECT;
 dump INPUT_OBJECT;
 {code}
 Sample output
 (756a6196-ebcd-4789-ad2f-175e5df65d55,{(labelAaÂâÀ),(labelあいうえお1),(labelஜார்க2),(labeladfadf)})
 It works and dumps results but when I use a schema for loading it fails.
 {code}
 INPUT_OBJECT = load 'internationalcode' AS (object_id:chararray, labels: bag 
 {T: tuple(label:chararray)});
 describe INPUT_OBJECT;
 {code}
 The error message is as follows: 2010-01-14 02:23:27,320 FATAL 
 org.apache.hadoop.mapred.Child: Error running child : 
 org.apache.pig.data.parser.TokenMgrError: Error: Bailing out of infinite loop 
 caused by repeated empty string matches at line 1, column 21.
   at 
 org.apache.pig.data.parser.TextDataParserTokenManager.TokenLexicalActions(TextDataParserTokenManager.java:620)
   at 
 org.apache.pig.data.parser.TextDataParserTokenManager.getNextToken(TextDataParserTokenManager.java:569)
   at 
 org.apache.pig.data.parser.TextDataParser.jj_ntk(TextDataParser.java:651)
   at 
 org.apache.pig.data.parser.TextDataParser.Tuple(TextDataParser.java:152)
   at 
 org.apache.pig.data.parser.TextDataParser.Bag(TextDataParser.java:100)
   at 
 org.apache.pig.data.parser.TextDataParser.Datum(TextDataParser.java:382)
   at 
 org.apache.pig.data.parser.TextDataParser.Parse(TextDataParser.java:42)
   at 
 org.apache.pig.builtin.Utf8StorageConverter.parseFromBytes(Utf8StorageConverter.java:68)
   at 
 org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(Utf8StorageConverter.java:76)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:845)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:250)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
   at org.apache.hadoop.mapred.Child.main(Child.java:159)
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1178:


Attachment: expressions-2.patch

New patch that addresses the unit test failure and javadoc warnings.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1178:


Status: Open  (was: Patch Available)

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1178:


Status: Patch Available  (was: Open)

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1189) StoreFunc UDF should ship to the backend automatically without register

2010-01-14 Thread Daniel Dai (JIRA)
StoreFunc UDF should ship to the backend automatically without register
-

 Key: PIG-1189
 URL: https://issues.apache.org/jira/browse/PIG-1189
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0


Pig should ship the store UDF to the backend even if the user does not use 
register. The prerequisite is that the UDF is on the classpath on the frontend. 
We made that work for load UDFs in 
[PIG-881|https://issues.apache.org/jira/browse/PIG-881]; we shall do the same 
thing for store UDFs.
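
As a minimal sketch of the scenario (the storage UDF name and paths below are hypothetical, not part of this issue):

{code}
-- com.example.MyStorage is a hypothetical StoreFunc, assumed to be on the frontend classpath
A = load 'input' as (name:chararray, cnt:int);
store A into 'output' using com.example.MyStorage();
-- without a register statement the jar is currently not shipped to the backend;
-- this issue proposes shipping it automatically, as PIG-881 did for load UDFs
{code}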

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-14 Thread Ying He (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800353#action_12800353
 ] 

Ying He commented on PIG-1178:
--

To answer Daniel's questions:

In Rule.match, does PatternMatchOperatorPlan contain only leaf nodes but no 
edge information? If so, instead of saying "A list of all matched sub-plans", 
can we put more details in the comments?

The returned lists are plans. You can call getPredecessors() or getSuccessors() 
on any node in the plan. The implementation doesn't keep edge information; it 
asks the base plan for this information and returns only the operators that are in 
this sub-plan. So, looking from the outside, it is a plan; it's just read-only, and 
any method that updates the plan would throw an exception.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands

2010-01-14 Thread Thejas M Nair (JIRA)
Handling of quoted strings in pig-latin/grunt commands
--

 Key: PIG-1190
 URL: https://issues.apache.org/jira/browse/PIG-1190
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair


There is some inconsistency in the way quoted strings are used/handled in 
pig-latin.
In load/store and define-ship commands, files are specified in quoted strings, 
and the file name is the content within the quotes.  But in the case of register, 
set, and file system commands, if a string is specified in quotes, the quotes 
are also included as part of the string. This is not only inconsistent, it is 
also unintuitive. 
This is also inconsistent with the way the hdfs command line (or the bash shell) 
interprets file names.

For example, currently with the command - 
set job.name 'job123'
the job name is set to 'job123' (including the quotes), not job123.

This needs to be fixed, and the above command should be considered equivalent to - 
set job.name job123. 
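
As a rough illustration of the inconsistency (the input path and job name are placeholders):

{code}
-- the loader sees the path data/input.txt with the quotes stripped
A = load 'data/input.txt';
-- today the following sets the job name to 'job123', quotes included:
set job.name 'job123'
-- the proposal is to treat the line above the same as:
set job.name job123
{code}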


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1190) Handling of quoted strings in pig-latin/grunt commands

2010-01-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800359#action_12800359
 ] 

Thejas M Nair commented on PIG-1190:


This breaks backward compatibility, but I don't think the use of file 
names or job names that actually contain quotes is likely to be common. In the 
long run, I think this is the right thing to do.




 Handling of quoted strings in pig-latin/grunt commands
 --

 Key: PIG-1190
 URL: https://issues.apache.org/jira/browse/PIG-1190
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair

 There is some inconsistency in the way quoted strings are used/handled in 
 pig-latin.
 In load/store and define-ship commands, files are specified in quoted strings, 
 and the file name is the content within the quotes.  But in the case of 
 register, set, and file system commands, if a string is specified in quotes, 
 the quotes are also included as part of the string. This is not only 
 inconsistent, it is also unintuitive. 
 This is also inconsistent with the way the hdfs command line (or the bash shell) 
 interprets file names.
 For example, currently with the command - 
 set job.name 'job123'
 the job name is set to 'job123' (including the quotes), not job123.
 This needs to be fixed, and the above command should be considered equivalent to 
 - set job.name job123. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800413#action_12800413
 ] 

Daniel Dai commented on PIG-1090:
-

PIG-1090-12.patch committed.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-2.patch, 
 PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, 
 PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-14 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: PIG-1090-13.patch

The main updates in the latest patch (13) are the following:

* Remove these files: 

{code}
D  src/org/apache/pig/experimental/LoadMetadata.java
D  src/org/apache/pig/experimental/ResourceStatistics.java
D  src/org/apache/pig/experimental/ResourceSchema.java
D  src/org/apache/pig/experimental/JsonMetadata.java
D  src/org/apache/pig/experimental/StoreMetadata.java
{code}

* Move _JsonMetadata.java_ to the package _org.apache.pig.piggybank.storage_

* Move _StoreMetadata.java_ to the package _org.apache.pig_

* Modify the _PigStorageSchema_ class to use _PigOutputCommitter_ to store the 
metadata with the output file (PIG-760).

Dmitriy, 

Can you review the PIG-760-related changes?

Thanks.


 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-13.patch, 
 PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1185) Data bags do not close spill files after using iterator to read tuples

2010-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1185:


  Resolution: Fixed
Assignee: Ying He
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and the 0.6 branch. No unit test is included because 
it is a fix to existing features, and it is very hard to write a unit test for it.

 Data bags do not close spill files after using iterator to read tuples
 --

 Key: PIG-1185
 URL: https://issues.apache.org/jira/browse/PIG-1185
 Project: Pig
  Issue Type: Bug
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.6.0

 Attachments: PIG_1185.patch


 Spill files are not closed after reading the tuples from the iterator. When a large 
 number of spill files exists, this can exceed the specified maximum number of open 
 files on the system and therefore cause application failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800434#action_12800434
 ] 

Hadoop QA commented on PIG-1178:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430285/expressions-2.patch
  against trunk revision 898497.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/175/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/175/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/175/console

This message is automatically generated.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml

2010-01-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800438#action_12800438
 ] 

Olga Natkovich commented on PIG-1186:
-

+1

 Pig do not take values in pig-cluster-hadoop-site.xml
 ---

 Key: PIG-1186
 URL: https://issues.apache.org/jira/browse/PIG-1186
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1186-1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1186) Pig do not take values in pig-cluster-hadoop-site.xml

2010-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1186:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and the 0.6 branch.

 Pig do not take values in pig-cluster-hadoop-site.xml
 ---

 Key: PIG-1186
 URL: https://issues.apache.org/jira/browse/PIG-1186
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1186-1.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-14 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: PIG-1090-13.patch

Sync patch-13 with patch-12.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-13.patch, 
 PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-14 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: (was: PIG-1090-13.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-13.patch, PIG-1090-13.patch, 
 PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, 
 PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, 
 PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-14 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: (was: PIG-1090-13.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-14 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: (was: PIG-1090-13.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-14 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: (was: PIG-1090-13.patch)

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-01-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800479#action_12800479
 ] 

Alan Gates commented on PIG-1178:
-

I've checked in the expressions-2.patch.  I'll flesh out LogicalSchema in a 
separate patch.



 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-14 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800500#action_12800500
 ] 

Dmitriy V. Ryaboy commented on PIG-1090:


Richard,
I'll check it out, thanks.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, 
 PIG-1090-13.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, 
 PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, 
 PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-50) query optimization for Pig

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-50.
---

   Resolution: Fixed
Fix Version/s: 0.3.0

A rudimentary optimizer was added by 0.3, with ongoing work being done on it 
(see PIG-1178).
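
For context, a minimal Pig Latin sketch of the manual rewrite Amir describes in the quoted discussion below; the aliases, paths, and filter condition are illustrative only:

{code}
big   = load 'big_table'   as (id:int, payload:chararray);
small = load 'small_table' as (id:int, region:chararray);

-- join first, then filter: the join materializes rows the filter will discard
J1 = join big by id, small by id;
F1 = filter J1 by small::region == 'US';

-- filter pushed before the join: the rewrite that avoided the OutOfMemory error
S  = filter small by region == 'US';
J2 = join big by id, S by id;
{code}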

 query optimization for Pig
 --

 Key: PIG-50
 URL: https://issues.apache.org/jira/browse/PIG-50
 Project: Pig
  Issue Type: Wish
  Components: impl
Reporter: Christopher Olston
 Fix For: 0.3.0


 add relational query optimization techniques, or similar, to Pig
 discussion so far:
 ** Amir Youssefi:
 Comparing two Pig scripts, join+filter and filter+join, I see that Pig has 
 an optimization opportunity: first apply the filter constraints, then do the 
 actual join. Do we have a JIRA open for this (or other optimization 
 scenarios)? 
 In my case, the first one resulted in an OutOfMemory exception but the second 
 one ran just fine. 
 ** Chris Olston:
 Yup. It would be great to sprinkle a little relational query optimization 
 technology onto Pig.
 Given that query optimization is a double-edged sword, we might want to 
 consider some guidelines of the form:
 1. Optimizations should always be easy to override by the user. (Sometimes 
 the system is smarter than the user, but other times the reverse is true, and 
 that can be incredibly frustrating.)
 2. Only safe optimizations should be performed, where a safe optimization 
 is one that with 95% probability doesn't make the program slower. (An example 
 is pushing filters before joins, given that the filter is known to be cheap; 
 if the filter has a user-defined function it is not guaranteed to be cheap.) 
 Or perhaps there is a knob that controls worst-case versus expected-case 
 minimization.
 We're at a severe disadvantage relative to relational query engines, because 
 at the moment we have zero metadata. We don't even know the schema of our 
 data sets, much less the distributions of data values (which in turn govern 
 intermediate data sizes between operators). We have to think about how to 
 approach this in a way that is compatible with the Pig philosophy of having metadata 
 always be optional. It could be as simple as (fine, if the user doesn't want 
 to register his data with Pig, then Pig won't be able to optimize programs 
 over that data very well), or as sophisticated as on-line sampling and/or 
 on-line operator reordering.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-64) Formatter for Hadoop Job Config file

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-64.
---

Resolution: Incomplete

This patch is way out of date.  It also isn't clear to me that Pig wants to get 
into the business of interpreting JobConf since we don't control it.

 Formatter for Hadoop Job Config file
 

 Key: PIG-64
 URL: https://issues.apache.org/jira/browse/PIG-64
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Benjamin Reed
Priority: Minor
 Attachments: printer.patch


 We serialize and encode a number of different Pig data structures that 
 describe a part of a Pig job to run in Hadoop. Because of the encoding you 
 cannot see what Pig was doing in a given Hadoop job using just the job XML 
 config file. We need a simple program to make the Hadoop job structures human 
 readable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-21) Show more details about the current execution context

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-21?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-21.
---

Resolution: Won't Fix

It looks like this patch got dropped without being finished.

 Show more details about the current execution context
 -

 Key: PIG-21
 URL: https://issues.apache.org/jira/browse/PIG-21
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Affects Versions: 0.1.0
Reporter: Andrzej Bialecki 
Priority: Minor
 Attachments: context.patch


 After a long interactive session with grunt I lost track of what kind of 
 queries I defined, and then re-defined. It would be nice to have the ability 
 to show all defined aliases, and other context variables, such as the 
 filesystem, jobTracker, user jars and Configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-76) Unit tests for Grunt

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-76.
---

Resolution: Fixed

Tests for grunt were added some time ago.

 Unit tests for Grunt
 

 Key: PIG-76
 URL: https://issues.apache.org/jira/browse/PIG-76
 Project: Pig
  Issue Type: Bug
Reporter: Antonio Magnaghi

 Currently there are no unit tests in place for Grunt. However, Grunt is 
 extensively used as part of the end-to-end tests. If some changes break 
 Grunt, this will become evident only later on in the development process 
 during E2E testing.
 Talked to Alan and Olga, probably the best way to address this is to put in 
 place unit tests that integrate with the test harness used for regression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-79) Switch grunt shell to use hadoop FSShell for DFS commands

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-79.
---

Resolution: Duplicate

 Switch grunt shell to use hadoop FSShell for DFS commands
 -

 Key: PIG-79
 URL: https://issues.apache.org/jira/browse/PIG-79
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich

 This will provide command semantics consistent with Hadoop, including 
 allowing the Pig remove command to use the trash.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-82) Loose floating point precision

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-82?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-82.
---

Resolution: Won't Fix

Loss of precision is a known issue with floating point numbers.  The correct 
solution here is to introduce a fixed point type, similar to SQL's decimal.

 Loose floating point precision
 --

 Key: PIG-82
 URL: https://issues.apache.org/jira/browse/PIG-82
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.1.0
Reporter: Daeho Baek

 Pig loses floating point precision during conversion between binary and 
 string representations.
 Here is an example code.
 words = LOAD '/user/daeho/words.txt' as (word);
 numWords  = FOREACH (GROUP words ALL) GENERATE COUNT($1);
 weight = FOREACH numWords GENERATE 1.0 / $0;
 wordsWithWeight = CROSS words, weight;
 sumWeight = FOREACH (GROUP wordsWithWeight ALL) GENERATE SUM($1.$1);
 dump sumWeight;
 sumWeight is not 1 even though words.txt has 118 lines.
 Can we store floating point as binary format?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-119) test suite improvements

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-119.


Resolution: Fixed

The Hudson CI build is done and there are tests for local mode.  We aren't planning 
on moving the unit tests into the various packages at the moment.

 test suite improvements
 ---

 Key: PIG-119
 URL: https://issues.apache.org/jira/browse/PIG-119
 Project: Pig
  Issue Type: Improvement
Reporter: Stefan Groschupf
Priority: Critical

 From my point of view a test suite is very important for an open source 
 project. The better and easier to use it is, the more people can easily 
 contribute and fix bugs. 
 With this in mind I see some room for improvement in the test suite for Pig. 
 Here are my suggestions; I would love to work on them if we all agree on the 
 points.
 Phase 1:
 + it should be possible to select a test mode that defines whether Pig runs in 
 local mode, on a mini cluster, or on a big cluster.
 ++ ant test -Dtest.mode=local or -Dtest.mode=mapreduce or 
 -Dtest.mode=mapreduce -Dcluster=myJobTracker
 ++ the default should be local
 Phase 2:
 + set up a Hudson CI build; run the minicluster once a day and local mode after 
 each checkin.
 Phase 3:
 clean up the test package; the general standard is that each test should be in 
 the same package as the class it tests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-117) commons logging and log4j

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-117.


Resolution: Won't Fix

We're not pulling log4j out anytime soon.

 commons logging and log4j
 -

 Key: PIG-117
 URL: https://issues.apache.org/jira/browse/PIG-117
 Project: Pig
  Issue Type: Improvement
Reporter: Stefan Groschupf

 On the one hand, Pig uses commons logging, which makes sense. On the 
 other hand, the Pig Main class configures log4j in the code. This 
 introduces a hard dependency on log4j. 
 I suggest using only a log4j configuration file to configure log4j and 
 removing the log4j configuration from the code. 
 Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-135) Ensure no temporary files are created in the top-level source directory during the build/test process

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-135.


Resolution: Fixed

Everything but src-gen now goes under the build directory.  We aren't planning 
on moving src-gen there.

 Ensure no temporary files are created in the top-level source directory 
 during the build/test process
 -

 Key: PIG-135
 URL: https://issues.apache.org/jira/browse/PIG-135
 Project: Pig
  Issue Type: Improvement
Reporter: Arun C Murthy

 Let's assume SRC_TOP is the top-level src directory.
 Currently the build process creates a *src-gen* directory in SRC_TOP and the 
 junit tests create *dfs* and *test* directories in SRC_TOP. This means that 
 the 'ant clean' task now has to clean up all of them.
 Interestingly, 'ant clean' doesn't remove the 'dfs' directory at all... a 
 related bug.
 It would be nice to create a standalone _build_ directory in the top-level 
 directory and then use that as the parent of _all_ generated files (source 
 and non-source). This would mean 'ant clean' would just need to delete the 
 build directory. It plays well when there are multiple sub-projects developed 
 on top of Pig (e.g. contrib etc.) too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-145) LocalFile ignores active container and HDataStorage can't copy to other DataStrorage

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-145.


Resolution: Won't Fix

As of Pig 0.6, true local mode (i.e., Pig executing the code itself rather than 
through MapReduce) has been removed.

 LocalFile ignores active container and HDataStorage can't copy to other 
 DataStrorage
 

 Key: PIG-145
 URL: https://issues.apache.org/jira/browse/PIG-145
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Charlie Groves
 Attachments: PIG-145-DataStorage_Bugs.patch


 As part of starting to rewrite the DataStorage APIs, I wrote some unit tests 
 for the existing DataStorage implementations to make sure I wasn't breaking 
 anything.  In testing the open code, I found that LocalFile doesn't respect 
 the active container you set on LocalDataStorage, so if you open a relative 
 file, it's relative to wherever you're running the code.  Similarly, while 
 testing the copy operations, I found that HFile doesn't allow copying to 
 anything other than other HFiles and that HDirectory's copy operation was 
 never used because it had the wrong signature.
 The attached patch fixes these issues and adds tests for much of the 
 DataStorage API for both of the existing backends.  There are no tests for 
 the sopen code as I'm planning on changing that significantly in rewriting 
 these.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-147) Pig Jira Administrator: Please remove the Patch Available check box

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-147.


Resolution: Fixed

 Pig Jira Administrator: Please remove the Patch Available check box
 ---

 Key: PIG-147
 URL: https://issues.apache.org/jira/browse/PIG-147
 Project: Pig
  Issue Type: Bug
Reporter: Xu Zhang
Priority: Minor

 We now have Patch Available as a status of a JIRA Pig bug, so the Patch 
 Available checkbox needs to be removed from the Find pane and the Edit page 
 of the Pig project.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-163) Improve parsing for UDFs in QueryParser

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-163.


Resolution: Fixed

Fixed a long time ago.

 Improve parsing for UDFs in QueryParser
 ---

 Key: PIG-163
 URL: https://issues.apache.org/jira/browse/PIG-163
 Project: Pig
  Issue Type: Bug
Reporter: Arun C Murthy

 Parsing of UDFs in QueryParser (used in LOAD/GROUP) could be stricter; 
 currently it just assumes the arguments are a list of quoted strings, so, for 
 example, it doesn't handle UDFs which take other UDFs as arguments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



reading/writing HBase in Pig

2010-01-14 Thread Michael Dalton
Hi all,

I was looking at the current Pig code in SVN, and it seems like HBase is
supported for loading, but not for storing. If this is the case, I'd like to
add support to Pig for writing to HBase. Is there anyone else working on
this, and if not is this something that you'd like contributed? Based on a
cursory evaluation of the StoreFunc interface, it looks like the APIs there
are pretty file-centric and may need to be modified to accomodate HBase's
table-based design. For example, you aren't going to be serializing your
output to an OutputStream object in all likelihood.
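
For reference, a rough sketch of what the load side looks like today versus the store side being discussed; the loader class path is quoted from memory and the table/column names are made up, so treat the whole snippet as an assumption rather than tested syntax:

{code}
-- load side: an HBase loader already exists (class path and column list are assumptions)
raw = load 'users' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:name info:age');
-- store side: no HBase StoreFunc exists yet; something like the commented line
-- below is what this proposal would add (the UDF name is hypothetical)
-- store raw into 'users_copy' using HBaseStore('info:name info:age');
{code}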

I haven't contributed to Pig before, and I wanted to see if this is
something that would be beneficial to the rest of the Pig community, and if
so what next steps I should take (like starting a JIRA) to get the ball
rolling. Thanks

Best regards,

Mike


[jira] Resolved: (PIG-175) Reading compressed files in local mode + MiniMRCluster

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-175.


Resolution: Won't Fix

Pig local mode has been dropped in 0.6 in favor of Hadoop's LocalJobRunner.  
I'm not worried about being unable to mix compressed and uncompressed files in 
MiniMR mode.

 Reading compressed files in local mode + MiniMRCluster
 --

 Key: PIG-175
 URL: https://issues.apache.org/jira/browse/PIG-175
 Project: Pig
  Issue Type: Bug
Reporter: Craig Macdonald
 Attachments: testCompressed.sh


 I have written a small test script that tests whether three simple compressed and 
 uncompressed files can be loaded successfully. Essentially, it writes a file, 
 compresses it using gzip and bzip2, and sees if Pig can load it. I use both 
 local execution mode and the miniMR cluster.
 Here are my results:
 MiniMRCluster
  * uncompressed: OK
  * gzip: OK
  * bzip2: OK
  * All three at once: not OK
 Local Execution Mode
  * uncompressed: OK
  * gzip: not OK (garbled output)
  * bzip2: not OK ( garbled output)
  * All three at once: not OK (expected)
 I'm not sure what the problem is with the miniMRcluster - there is an NPE in 
 PigSplit.getLocations(). I suspect that getFileCacheHints() is returning 
 null, which usually indicates a non-existent file. 
 However, for the local execution mode, I'm fairly confident that this mode 
 has no support for compressed files.
 Craig
 {noformat}
 ==
 Bashs good friend: cat
 ==
 Normal
 A
 B
 C
 bz2
 A
 B
 C
 gzip
 A
 B
 C
 ==
 MiniMRCluster
 ==
 test.all.pig
 2008-03-29 12:07:22,103 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: file:///
 2008-03-29 12:07:22,241 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics 
 - Initializing JVM Metrics with processName=JobTracker, sessionId=
 2008-03-29 12:07:22,555 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - - MapReduce 
 Job -
 2008-03-29 12:07:22,556 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: 
 [/users/grad/craigm/src/pig/FROMApache/trunk4/trunk/test.normal:org.apache.pig.builtin.PigStorage()]
 2008-03-29 12:07:22,556 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]]
 2008-03-29 12:07:22,556 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
 2008-03-29 12:07:22,556 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
 2008-03-29 12:07:22,556 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
 2008-03-29 12:07:22,556 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: 
 /tmp/temp-1403805719/tmp1733057091:org.apache.pig.builtin.BinStorage
 2008-03-29 12:07:22,556 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
 2008-03-29 12:07:22,556 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: 
 -1
 2008-03-29 12:07:22,557 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce 
 parallelism: -1
 2008-03-29 12:07:23,427 [Thread-0] INFO  org.apache.hadoop.mapred.MapTask - 
 numReduceTasks: 1
 2008-03-29 12:07:23,544 [Thread-0] INFO  
 org.apache.hadoop.mapred.LocalJobRunner -
 2008-03-29 12:07:23,545 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
 - Task 'map_' done.
 2008-03-29 12:07:23,581 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
 - Saved output of task 'map_' to file:/tmp/temp-1403805719/tmp1733057091
 2008-03-29 12:07:23,625 [Thread-0] INFO  
 org.apache.hadoop.mapred.LocalJobRunner - reduce  reduce
 2008-03-29 12:07:23,626 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
 - Task 'reduce_cibps7' done.
 2008-03-29 12:07:23,630 [Thread-0] INFO  org.apache.hadoop.mapred.TaskRunner 
 - Saved output of task 'reduce_cibps7' to 
 file:/tmp/temp-1403805719/tmp1733057091
 2008-03-29 12:07:24,383 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher 
 - Pig progress = 100%
 (A)
 (B)
 (C)
 2008-03-29 12:07:24,415 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - - MapReduce 
 Job -
 2008-03-29 12:07:24,415 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: 
 [/user/craigm/test.gz:org.apache.pig.builtin.PigStorage()]
 2008-03-29 12:07:24,416 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]]
 2008-03-29 12:07:24,416 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.POMapreduce - 

[jira] Resolved: (PIG-208) Keeping files internalized

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-208.


Resolution: Incomplete

I don't understand what this means.

 Keeping files internalized
 --

 Key: PIG-208
 URL: https://issues.apache.org/jira/browse/PIG-208
 Project: Pig
  Issue Type: New Feature
  Components: data
Reporter: John DeTreville

 Pig files are kept in externalized form between Pig programs, but (I believe) 
 are held in internalized form while being used. It is expensive to 
 internalize externalized files at the beginning of each program, and to 
 externalize internalized files at the end of each program. Pig needs a way to 
 keep its files internalized across programs. This will require a way to name 
 and manage internalized files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-209) Indexes for accelerating joins

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-209.


Resolution: Won't Fix

At this point Pig is relying on storage formats such as Zebra to do indexing.  
We have no near term plans to provide indexing inside Pig itself.

 Indexes for accelerating joins
 --

 Key: PIG-209
 URL: https://issues.apache.org/jira/browse/PIG-209
 Project: Pig
  Issue Type: New Feature
  Components: data
Reporter: John DeTreville

 Computing the inner join of a very large table (i.e., bag or mapping) with a 
 smaller table can take time proportional to the size of the very large table. 
 The time required can be greatly reduced if the very large table is indexed, 
 so that the join takes time proportional to the size of the smaller table. It 
 should be possible for clients to index tables for use by future joins.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-210) Column store

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-210.


Resolution: Duplicate

 Column store
 

 Key: PIG-210
 URL: https://issues.apache.org/jira/browse/PIG-210
 Project: Pig
  Issue Type: New Feature
  Components: data
Reporter: John DeTreville

 I believe that Pig stores its tables in row order, which is less efficient in 
 space and time than column order in a data-mining system. Column stores can 
 be more highly compressed, and can be read and written faster. It should be 
 possible for clients to store their tables in column order.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-221) Release updated builds on a regular basis

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-221.


Resolution: Won't Fix

We now have regular releases and a continuous integration process. We neither 
have, nor plan to have, nightly builds.

 Release updated builds on a regular basis
 -

 Key: PIG-221
 URL: https://issues.apache.org/jira/browse/PIG-221
 Project: Pig
  Issue Type: Task
Reporter: Amir Youssefi

 Release updated builds on a regular basis. 
 For a start, we can use Hudson to release nightly builds.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-241) Sharding and joins

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-241.


Resolution: Won't Fix

We have chosen a different approach to this. Our merge join does take advantage 
of sort order, but it does not require that the data be partitioned in the same 
way in order to do the join, as the suggested sharding approach does.
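
As a rough sketch (relation names and paths below are made up), a merge join in 
Pig Latin only asks that both inputs already be sorted on the join key; it does 
not require that they be co-partitioned:

-- minimal merge join sketch; both inputs are assumed pre-sorted on 'key'
big   = LOAD '/data/big_sorted'   AS (key:int, val:chararray);
small = LOAD '/data/small_sorted' AS (key:int, other:chararray);
j     = JOIN big BY key, small BY key USING 'merge';
STORE j INTO '/data/joined';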

 Sharding and joins
 --

 Key: PIG-241
 URL: https://issues.apache.org/jira/browse/PIG-241
 Project: Pig
  Issue Type: New Feature
  Components: data
Reporter: John DeTreville

 Many large distributed systems for storage and computing over tables divide 
 these tables into smaller _shards,_ such that all rows with the same 
 (primary) key will appear in the same shard. If two tables are consistently 
 sharded, then they can be joined shard-by-shard. If corresponding shards are 
 stored on the same hosts (or racks), then joins can be performed locally on 
 those hosts without copying the rows of the tables over the network; this can 
 produce significant speedups.
 Pig does not currently provide application-controlled sharding and the 
 associated shard placement and computation placement. The performance of 
 joins therefore suffers in many scenarios; rows are passed over the network 
 multiple times when performing a join. If Pig (and Hadoop) could provide the 
 ability for the application to shard tables consistently, according to an 
 application-controlled policy, joins could be completely local operations and 
 could in many cases perform much better.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-247) Accept globbing when ExecType.LOCAL

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-247.


Resolution: Won't Fix

In Pig 0.6 Pig's local mode has been replaced with Hadoop's LocalJobRunner.

 Accept globbing when ExecType.LOCAL
 ---

 Key: PIG-247
 URL: https://issues.apache.org/jira/browse/PIG-247
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Iván de Prado
Priority: Minor

 Globs are supported when ExecType is MAPREDUCE (Hadoop), but not when 
 ExecType is LOCAL. That is inconsistent. 
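
As a small sketch of the globbing in question (path and schema are made up), a 
LOAD path in mapreduce mode may contain Hadoop-style globs:

-- glob over one month of logs; path and schema are illustrative only
logs = LOAD '/data/logs/2008-03-*' USING PigStorage('\t')
       AS (ts:chararray, url:chararray);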

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-281) Support # for comment besides --

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-281.


Resolution: Won't Fix

# is the map dereference operator in Pig Latin and thus cannot be the comment 
operator too.
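
For illustration (relation and key names are hypothetical), # already 
dereferences map values in Pig Latin:

-- '#' pulls a value out of a map by key, so it cannot also start a comment
logs = LOAD '/data/logs' AS (params:map[]);
q    = FOREACH logs GENERATE params#'query';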

 Support # for comment besides --
 

 Key: PIG-281
 URL: https://issues.apache.org/jira/browse/PIG-281
 Project: Pig
  Issue Type: Improvement
Reporter: Amir Youssefi
Priority: Trivial



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-265) Make all functions in pig case insesitive.

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-265.


Resolution: Won't Fix

Since we map directly from the UDF name to a Java (package and) class, this 
would be difficult.
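
A small sketch of the mapping (data path and alias names are made up): the 
built-in name resolves directly to a Java class, and DEFINE is the way to 
introduce an alternate spelling explicitly:

a   = LOAD '/data/input' AS (x:int);
grp = GROUP a ALL;
b   = FOREACH grp GENERATE COUNT(a);  -- COUNT resolves to org.apache.pig.builtin.COUNT
-- a lower-case name has to be declared explicitly as an alias:
DEFINE count org.apache.pig.builtin.COUNT();
c   = FOREACH grp GENERATE count(a);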

 Make all functions in pig case insesitive.
 --

 Key: PIG-265
 URL: https://issues.apache.org/jira/browse/PIG-265
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich

 I should be able to say COUNT, Count, or count in my script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-371) Show line number in grunt

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-371.


Resolution: Won't Fix

 Show line number in grunt
 -

 Key: PIG-371
 URL: https://issues.apache.org/jira/browse/PIG-371
 Project: Pig
  Issue Type: Improvement
Reporter: Amir Youssefi
Priority: Trivial

 Now that PIG-270 is in, it would be nice to have the line number in the grunt 
 prompt. Something like this: 
 10 grunt 
 grunt (10)  
 grunt:10
 10: grunt 
 etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-417) Local Mode is broken

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-417.


Resolution: Won't Fix

Local mode has been replaced by Hadoop's LocalJobRunner in 0.6.

 Local Mode is broken
 

 Key: PIG-417
 URL: https://issues.apache.org/jira/browse/PIG-417
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Shravan Matthur Narayanamurthy
Priority: Minor

 When we use Pig in local mode and also have config files that point to a 
 cluster (in the form of hadoop-site.xml) on the classpath, local mode errs out 
 saying it can't find the input file. This is because, when the local execution 
 engine is created, a new Configuration object is constructed that picks up 
 properties from hadoop-site.xml while initializing. From then on Pig tries to 
 connect using the settings from hadoop-site.xml and fails to find the local 
 files.
 However, since we are in local mode, we want this new Configuration to contain 
 only the properties from our pigContext. Currently, the Configuration object 
 doesn't support such a thing. We would actually want to initialize the 
 Configuration with the properties in hadoop-default.xml.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-384) regression: execution plan does not show up in the job's output

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-384.


Resolution: Not A Problem

This is by design.  The execution plan can be shown by adding -v to the pig 
command line.
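
Besides the -v flag, the plan for a particular relation can also be inspected 
explicitly with EXPLAIN from grunt or a script (alias and schema below are 
made up):

a = LOAD '/data/input' AS (x:int, y:int);
b = FILTER a BY x > 0;
EXPLAIN b;  -- prints the logical, physical, and map-reduce plans for b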

 regression: execution plan does not show up in the job's output
 ---

 Key: PIG-384
 URL: https://issues.apache.org/jira/browse/PIG-384
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Priority: Minor

 The code in trunk shows the execution plan as part of the job's output; this 
 is missing from the types branch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-386) Pig does not do type checking on a per statement basis

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-386.


Resolution: Won't Fix

 Pig does not do type checking on a per statement basis
 --

 Key: PIG-386
 URL: https://issues.apache.org/jira/browse/PIG-386
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Shravan Matthur Narayanamurthy
Priority: Minor

 Currently, though Pig has a type checker, it is not invoked on every query 
 registration. Instead, the system waits until there is a dump or store. I 
 think this is not in line with the philosophy of catching errors early. 
 Instead of type checking happening in the execute method, it should happen in 
 registerQuery, and the execute method should expect a type-checked plan.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-458) Type branch integration with hadoop 18

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-458.


Resolution: Fixed

Done a long time ago.

 Type branch integration with hadoop 18
 --

 Key: PIG-458
 URL: https://issues.apache.org/jira/browse/PIG-458
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Attachments: hadoop18.jar, PIG-458.patch, un18.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-478) allowing custome partitioner between map and reduce

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-478.


Resolution: Duplicate

 allowing custome partitioner between map and reduce
 ---

 Key: PIG-478
 URL: https://issues.apache.org/jira/browse/PIG-478
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich

 The hope is for a more even distribution; there is no specific use case here.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-491) evaluate function argument expressions before the arguments are constructed as bags of tuples (a la SQL)

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-491.


Resolution: Won't Fix

We're not going to change Pig Latin semantics at such a basic level at this 
point.

 evaluate function argument expressions before the arguments are constructed 
 as bags of tuples (a la SQL)
 

 Key: PIG-491
 URL: https://issues.apache.org/jira/browse/PIG-491
 Project: Pig
  Issue Type: New Feature
 Environment: pig interpreter
Reporter: Mike Potts

 The final section of:
   http://wiki.apache.org/pig/PigTypesFunctionalSpec
 proposes this exact feature.  The crucial excerpt is:
 The proposed solution is to change the semantics of pig, so that expression 
 evaluation on function arguments is done before the arguments are constructed 
 as bags of tuples, rather than afterwards. This means that the semantics 
 would change so that SUM(salary * bonus_multiplier) means that for each tuple 
 in grouped, the fields grouped.employee:salary and 
 grouped.employee:bonus_multiplier will be multiplied and the result formed 
 into tuples that are placed in a bag to be passed to the function SUM().
 This would make my pig scripts significantly shorter and easier to understand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-492) There should be a way for Loader to refer to the output of determineSchema() in the backend

2010-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-492.


Resolution: Fixed

PIG-1085 provides this functionality.

 There should be a way for Loader to refer to the output of determineSchema() 
 in the backend
 ---

 Key: PIG-492
 URL: https://issues.apache.org/jira/browse/PIG-492
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath

 Currently LoadFunc.determineSchema() is only called from LOLoad() at parse 
 time in the front end. If the loader's getNext() needs to know what the output 
 of determineSchema() was, there is no way to get to it in the backend; there 
 should be some way to do so.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: reading/writing HBase in Pig

2010-01-14 Thread Dmitriy Ryaboy
Hi Mike,
It would be great to have a StoreFunc for HBase!
There is  a rewrite underway for the Load/Store stuff that will make
that a lot easier -- see https://issues.apache.org/jira/browse/PIG-966
.  You may want to consider writing it for the load-store redesign
branch.  This is what's probably going to be in 0.7. The first step
would be to open a jira and look at the existing StoreFunc
implementations.

-D

On Thu, Jan 14, 2010 at 9:59 PM, Michael Dalton mwdal...@gmail.com wrote:
 Hi all,

 I was looking at the current Pig code in SVN, and it seems like HBase is
 supported for loading, but not for storing. If this is the case, I'd like to
 add support for writing to HBase to Pig. Is there anyone else working on
 this, and if not is this something that you'd like contributed? Based on a
 cursory evaluation of the StoreFunc interface, it looks like the APIs there
 are pretty file-centric and may need to be modified to accommodate HBase's
 table-based design. For example, you aren't going to be serializing your
 output to an OutputStream object in all likelihood.

 I haven't contributed to Pig before, and I wanted to see if this is
 something that would be beneficial to the rest of the Pig community, and if
 so what next steps I should take (like starting a JIRA) to get the ball
 rolling. Thanks

 Best regards,

 Mike
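
Purely as a hypothetical sketch of what the end result might look like to a 
script author (the storer class name, table URI, and column list below are 
invented for illustration; no such StoreFunc exists yet per this thread):

raw = LOAD '/data/crawl' AS (rowkey:chararray, title:chararray, size:long);
STORE raw INTO 'hbase://crawl_table'
      USING my.org.HBaseStorer('content:title content:size');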



[jira] Created: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-14 Thread Ankur (JIRA)
POCast throws exception for certain sequences of LOAD, FILTER, FORACH
-

 Key: PIG-1191
 URL: https://issues.apache.org/jira/browse/PIG-1191
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur
Priority: Blocker


When using a custom load/store function, one that returns complex data (map of 
maps, list of maps), certain sequences of LOAD, FILTER, and FOREACH cause the 
pig script to throw an exception of the form:
 
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
bytearray from the UDF. Cannot determine how to convert the bytearray to 
actual-type
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
...
Looking through the code of POCast, the operator was apparently unable to find 
the right load function for doing the conversion, and consequently bailed out 
with the exception, failing the entire pig script.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1191:


Status: Patch Available  (was: Open)

 POCast throws exception for certain sequences of LOAD, FILTER, FORACH
 -

 Key: PIG-1191
 URL: https://issues.apache.org/jira/browse/PIG-1191
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur
Priority: Blocker
 Attachments: PIG-1191-1.patch


 When using a custom load/store function, one that returns complex data (map 
 of maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig 
 script throws an exception of the form -
  
 org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
 bytearray from the UDF. Cannot determine how to convert the bytearray to 
 actual-type
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
 ...
 Looking through the code of POCast, apparently the operator was unable to 
 find the right load function for doing the conversion and consequently bailed 
 out with the exception failing the entire pig script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1191:


Attachment: PIG-1191-1.patch

Hi, Ankur,
Can you check if this patch works?

 POCast throws exception for certain sequences of LOAD, FILTER, FORACH
 -

 Key: PIG-1191
 URL: https://issues.apache.org/jira/browse/PIG-1191
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur
Priority: Blocker
 Attachments: PIG-1191-1.patch


 When using a custom load/store function, one that returns complex data (map 
 of maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig 
 script throws an exception of the form -
  
 org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
 bytearray from the UDF. Cannot determine how to convert the bytearray to 
 actual-type
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
 ...
 Looking through the code of POCast, apparently the operator was unable to 
 find the right load function for doing the conversion and consequently bailed 
 out with the exception failing the entire pig script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-14 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12800609#action_12800609
 ] 

Ankur commented on PIG-1191:


Listed below are the identified cases. 

CASE 1: LOAD - FILTER - FOREACH - LIMIT - STORE
===

SCRIPT
---
sds = LOAD '/my/data/location'
  USING my.org.MyMapLoader()
  AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
queries = FILTER sds BY mapFields#'page_params'#'query' is NOT NULL;
queries_rand = FOREACH queries
   GENERATE (CHARARRAY) (mapFields#'page_params'#'query') AS 
query_string;
queries_limit = LIMIT queries_rand 100;
STORE queries_limit INTO 'out'; 

RESULT 

FAILS in reduce stage with the following exception

org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
bytearray from the UDF. Cannot determine
how to convert the bytearray to string.
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)


CASE 2: LOAD - FOREACH - FILTER - LIMIT - STORE
===
Note that FILTER and FOREACH order is reversed

SCRIPT
---
sds = LOAD '/my/data/location'
  USING my.org.MyMapLoader()
  AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
queries_rand = FOREACH sds
   GENERATE (CHARARRAY) (mapFields#'page_params'#'query') AS 
query_string;
queries = FILTER queries_rand BY query_string IS NOT null;
queries_limit = LIMIT queries 100; 
STORE queries_limit INTO 'out';

RESULT
---
SUCCESS - Results are correctly stored. So if the projection is done before the 
FILTER, the POCast operator receives the LoadFunc and everything works.


CASE 3: LOAD - FOREACH - FOREACH - FILTER - LIMIT - STORE
==

SCRIPT
---
sds = LOAD '/my/data/location'
  USING my.org.MyMapLoader()
  AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE 
  (map[]) (mapFields#'page_params') AS params;
queries = FOREACH params
  GENERATE (CHARARRAY) (params#'query') AS query_string;
queries_filtered = FILTER queries
   BY query_string IS NOT null;
queries_limit = LIMIT queries_filtered 100;
STORE queries_limit INTO 'out';

RESULT
---
FAILS in the map stage. It looks like the 2nd FOREACH did not get the LoadFunc 
and bailed out with the following stack trace:

org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
bytearray from the UDF. Cannot determine
how to convert the bytearray to string. at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
 at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:85)
 at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
 at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
 at

CASE 4: LOAD - FOREACH - FOREACH - LIMIT - STORE


SCRIPT
---
sds = LOAD '/my/data/location'
  USING my.org.MyMapLoader()
  AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE
  (map[]) (mapFields#'page_params') AS params;
queries = FOREACH params