[jira] Updated: (PIG-1574) Optimization rule PushUpFilter causes filter to be pushed up out joins

2010-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1574:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

test-patch result:
jira-1574-1.patch

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

This patch does not push the filter before the join if the join is an outer join. 
In fact, we can still push the filter to the outer side of the join; I assume 
that will be addressed in PIG-1575.

Patch jira-1574-1.patch committed. Thanks Xuefu!

 Optimization rule PushUpFilter causes filter to be pushed up out joins
 --

 Key: PIG-1574
 URL: https://issues.apache.org/jira/browse/PIG-1574
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1574-1.patch


 The PushUpFilter optimization rule in the new logical plan moves the filter 
 up to one of the join branches. It does this aggressively by finding an 
 operator that has all the projection UIDs. However, it does not consider that 
 the operator it finds might be another join. If that join is outer, then we 
 cannot simply move the filter to one of its branches.
 As an example, the following script will be erroneously optimized:
 A = load 'myfile' as (d1:int);
 B = load 'anotherfile' as (d2:int);
 C = join A by d1 full outer, B by d2;
 D = load 'xxx' as (d3:int);
 E = join C by d1, D by d3;
 F = filter E by d1 > 5;
 G = store F into 'dummy';
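
 The hazard can be illustrated outside of Pig with a small Python model of the
 outer-join step alone (relation C in the script). This is only a sketch:
 full_outer_join and keep are hypothetical helpers, the sample values are
 invented, and None stands in for Pig's null. It compares filtering after the
 join with filtering relation A before the join.

```python
def full_outer_join(left, right):
    """Full outer join of two lists of ints on equality; None marks a missing side."""
    rows = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r == l]
        if hits:
            for i in hits:
                rows.append((l, right[i]))
                matched_right.add(i)
        else:
            # Left row without a partner is padded with None.
            rows.append((l, None))
    for i, r in enumerate(right):
        if i not in matched_right:
            # Right row without a partner is padded with None.
            rows.append((None, r))
    return rows


def keep(d1):
    """Model of the condition d1 > 5; a null d1 does not pass."""
    return d1 is not None and d1 > 5


A = [3, 7]      # invented sample data
B = [3, 7, 9]   # invented sample data

# Filter applied after the outer join (the semantics of the script).
filter_after = [row for row in full_outer_join(A, B) if keep(row[0])]

# Filter pushed into branch A before the outer join (the unsafe rewrite).
filter_pushed = full_outer_join([d1 for d1 in A if keep(d1)], B)

print(filter_after)
print(filter_pushed)
```

 With these inputs, pushing the filter into A leaves the B values 3 and 9
 without a partner, so they surface as null-padded rows that the filter applied
 after the join would have dropped.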

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1568) Optimization rule FilterAboveForeach is too restrictive and doesn't handle project * correctly

2010-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1568:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

test-patch result:

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.

Patch committed. Thanks Xuefu!

 Optimization rule FilterAboveForeach is too restrictive and doesn't handle 
 project * correctly
 --

 Key: PIG-1568
 URL: https://issues.apache.org/jira/browse/PIG-1568
 Project: Pig
  Issue Type: Bug
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1568-1.patch, jira-1568-1.patch


 The FilterAboveForeach rule optimizes the plan by pushing a filter above the 
 preceding foreach operator. However, during code review, two major problems 
 were found:
 1. The current implementation assumes that if no projection is found in the 
 filter condition, then all columns from the foreach are projected. This 
 prevents the following optimization:
   A = LOAD 'file.txt' AS (a(u,v), b, c);
   B = FOREACH A GENERATE $0, b;
   C = FILTER B BY 8 > 5;
   STORE C INTO 'empty';
 2. The current implementation doesn't handle the * projection, which means 
 project all columns. As a result, it isn't able to optimize the following:
   A = LOAD 'file.txt' AS (a(u,v), b, c);
   B = FOREACH A GENERATE $0, b;
   C = FILTER B BY Identity.class.getName(*) > 5;
   STORE C INTO 'empty';
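
 One way to picture the check such a rule needs is the following Python sketch.
 It is not Pig's implementation; can_push_filter_above_foreach and its
 arguments are hypothetical. It assumes a filter may move above a foreach when
 every column it reads is a plain pass-through of an input column, treats a
 filter with no projections as reading no columns, and expands * into all of
 the foreach's output columns.

```python
def can_push_filter_above_foreach(filter_cols, foreach_outputs):
    """Decide whether a filter can move above a foreach (hypothetical model).

    filter_cols: names of foreach output columns the filter reads;
                 [] for a constant condition, ["*"] for project-all.
    foreach_outputs: maps each output column to the input column it simply
                     projects, or to None when the output is computed.
    """
    if "*" in filter_cols:
        # "*" reads every output column of the foreach.
        filter_cols = list(foreach_outputs)
    # all() over an empty list is True, so a constant condition can always move.
    return all(foreach_outputs.get(col) is not None for col in filter_cols)


# Models B = FOREACH A GENERATE $0, b; both outputs are plain projections.
passthrough = {"a": "a", "b": "b"}
# A foreach that also emits a computed column (invented for contrast).
computed = {"a": "a", "total": None}

print(can_push_filter_above_foreach([], passthrough))
print(can_push_filter_above_foreach(["*"], passthrough))
print(can_push_filter_above_foreach(["*"], computed))
```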




[jira] Created: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-08-30 Thread Daniel Dai (JIRA)
Intermittent unit test failure for 
TestScriptUDF.testPythonScriptUDFNullInputOutput
---

 Key: PIG-1579
 URL: https://issues.apache.org/jira/browse/PIG-1579
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0







[jira] Updated: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1579:


Attachment: PIG-1579-1.patch

Attaching a fix. However, this fix is shallow and may need an in-depth look. 
I will commit the temporary fix and leave the Jira open.

 Intermittent unit test failure for 
 TestScriptUDF.testPythonScriptUDFNullInputOutput
 ---

 Key: PIG-1579
 URL: https://issues.apache.org/jira/browse/PIG-1579
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1579-1.patch







[jira] Updated: (PIG-1579) Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput

2010-08-30 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1579:


Description: 
Error message:
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error executing 
function: Traceback (most recent call last):
  File iostream, line 5, in multStr
TypeError: can't multiply sequence by non-int of type 'NoneType'

at 
org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
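
The TypeError in the traceback is what Python raises when a string is 
multiplied by None. The body of multStr is not shown in this report, so the 
following is only a guess at its shape, with a hypothetical null-safe variant 
for comparison. It is not the attached fix.

```python
def mult_str(text, count):
    # Presumed shape of the UDF: repeat a string `count` times.
    return text * count


def mult_str_null_safe(text, count):
    # Hypothetical variant: pass nulls through instead of failing.
    if text is None or count is None:
        return None
    return text * count


try:
    mult_str("ab", None)
except TypeError as err:
    # A sequence cannot be multiplied by None.
    print("TypeError:", err)

print(mult_str_null_safe("ab", None))
print(mult_str_null_safe("ab", 2))
```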


 Intermittent unit test failure for 
 TestScriptUDF.testPythonScriptUDFNullInputOutput
 ---

 Key: PIG-1579
 URL: https://issues.apache.org/jira/browse/PIG-1579
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1579-1.patch


 Error message:
 org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error 
 executing function: Traceback (most recent call last):
   File iostream, line 5, in multStr
 TypeError: can't multiply sequence by non-int of type 'NoneType'
 at 
 org.apache.pig.scripting.jython.JythonFunction.exec(JythonFunction.java:107)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:295)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:346)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)




[jira] Updated: (PIG-1482) Pig gets confused when more than one loader is involved

2010-08-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1482:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to trunk.
Xuefu, thanks for the fix.


 Pig gets confused when more than one loader is involved
 ---

 Key: PIG-1482
 URL: https://issues.apache.org/jira/browse/PIG-1482
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: jira-1482-final-1.patch, jira-1482-final-2.patch, 
 jira-1482-final.patch, jira-1482-final.patch, jira-1482-final.patch


 When two relations are loaded using different loaders and then joined, grouped 
 and projected, pig gets confused trying to find the appropriate loader for the 
 requested cast. Consider the following script:
 A = LOAD 'data1' USING PigStorage() AS (s, m, l);
 B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
 C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
 :0) as v3:int;
 D = LOAD 'data2' USING TextLoader() AS (a);
 E = JOIN C BY v1, D BY a USING 'replicated';
 F = GROUP E BY (v1, a);
 G = FOREACH F GENERATE (chararray)group.v1, group.a;
 
 dump G;
 This throws an error, the stack trace of which is in the next comment.




[jira] Commented: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs

2010-08-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904258#action_12904258
 ] 

Thejas M Nair commented on PIG-1570:


Regarding 
bq. Another thing to investigate (somewhat related) - there seems to be a 
problem when PigServer is used to execute query having native mr operator - i 
was unable to run the tests in local mode . But i am able to run query in local 
mode from commandline.

The problem was that in the test setup, the MiniCluster hadoop-site.xml 
(~/pigtest/conf/hadoop-site.xml) is on the classpath. The WordCount.jar would 
end up trying to run the MR job using the minicluster and fail if the rest of 
the test is using local mode.


 native mapreduce operator MR job does not follow same failure handling logic 
 as other pig MR jobs
 -

 Key: PIG-1570
 URL: https://issues.apache.org/jira/browse/PIG-1570
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 The code path for handling failure in MR job corresponding to native MR is 
 different and does not have the same behavior.
 For example, even if the MR job for mapreduce operator fails, the number of 
 jobs that failed is being reported as 0 in PigStats log.




[jira] Updated: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs

2010-08-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1570:
---

Attachment: PIG-1570.1.patch

Patch passed test-patch and core tests. Patch is ready for review.
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 5 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec]


 native mapreduce operator MR job does not follow same failure handling logic 
 as other pig MR jobs
 -

 Key: PIG-1570
 URL: https://issues.apache.org/jira/browse/PIG-1570
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1570.1.patch


 The code path for handling failure in MR job corresponding to native MR is 
 different and does not have the same behavior.
 For example, even if the MR job for mapreduce operator fails, the number of 
 jobs that failed is being reported as 0 in PigStats log.




[jira] Commented: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs

2010-08-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904263#action_12904263
 ] 

Thejas M Nair commented on PIG-1570:


The code path followed for the native MR job is still different because the 
jar is a black box: pig just calls its main function and doesn't even know 
whether an MR job is actually being run.
This patch fixes the pig stats reporting (log messages) for a failed native MR 
job and also the feature list in the native MR job.


 native mapreduce operator MR job does not follow same failure handling logic 
 as other pig MR jobs
 -

 Key: PIG-1570
 URL: https://issues.apache.org/jira/browse/PIG-1570
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1570.1.patch


 The code path for handling failure in MR job corresponding to native MR is 
 different and does not have the same behavior.
 For example, even if the MR job for mapreduce operator fails, the number of 
 jobs that failed is being reported as 0 in PigStats log.




[jira] Commented: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails

2010-08-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904267#action_12904267
 ] 

Richard Ding commented on PIG-1343:
---

Patch is committed to the trunk. Thanks Niraj.

 pig_log file missing even though Main tells it is creating one and an M/R job 
 fails 
 

 Key: PIG-1343
 URL: https://issues.apache.org/jira/browse/PIG-1343
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: 1343.patch, PIG-1343-1.patch, PIG-1343_6.patch, 
 pig_1343_2.patch, pig_1343_4.patch, PIG_1343_5.patch


 There is a particular case where I was running with the latest trunk of Pig.
 {code}
 $java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig
 [main] INFO  org.apache.pig.Main - Logging error messages to: 
 /homes/viraj/pig_1263420012601.log
 $ls -l pig_1263420012601.log
 ls: pig_1263420012601.log: No such file or directory
 {code}
 The job failed and the log file did not contain anything; the only way to 
 debug was to look into the Jobtracker logs.
 Here are some reasons which could have caused this behavior:
 1) The underlying filer/NFS had some issues. In that case do we not error on 
 stdout?
 2) There are some errors from the backend which are not being captured
 Viraj




[jira] Updated: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails

2010-08-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1343:
--

Attachment: PIG-1343_6.patch

 pig_log file missing even though Main tells it is creating one and an M/R job 
 fails 
 

 Key: PIG-1343
 URL: https://issues.apache.org/jira/browse/PIG-1343
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: 1343.patch, PIG-1343-1.patch, PIG-1343_6.patch, 
 pig_1343_2.patch, pig_1343_4.patch, PIG_1343_5.patch


 There is a particular case where I was running with the latest trunk of Pig.
 {code}
 $java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig
 [main] INFO  org.apache.pig.Main - Logging error messages to: 
 /homes/viraj/pig_1263420012601.log
 $ls -l pig_1263420012601.log
 ls: pig_1263420012601.log: No such file or directory
 {code}
 The job failed and the log file did not contain anything; the only way to 
 debug was to look into the Jobtracker logs.
 Here are some reasons which could have caused this behavior:
 1) The underlying filer/NFS had some issues. In that case do we not error on 
 stdout?
 2) There are some errors from the backend which are not being captured
 Viraj




[jira] Updated: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails

2010-08-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1343:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

 pig_log file missing even though Main tells it is creating one and an M/R job 
 fails 
 

 Key: PIG-1343
 URL: https://issues.apache.org/jira/browse/PIG-1343
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: 1343.patch, PIG-1343-1.patch, PIG-1343_6.patch, 
 pig_1343_2.patch, pig_1343_4.patch, PIG_1343_5.patch


 There is a particular case where I was running with the latest trunk of Pig.
 {code}
 $java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig
 [main] INFO  org.apache.pig.Main - Logging error messages to: 
 /homes/viraj/pig_1263420012601.log
 $ls -l pig_1263420012601.log
 ls: pig_1263420012601.log: No such file or directory
 {code}
 The job failed and the log file did not contain anything; the only way to 
 debug was to look into the Jobtracker logs.
 Here are some reasons which could have caused this behavior:
 1) The underlying filer/NFS had some issues. In that case do we not error on 
 stdout?
 2) There are some errors from the backend which are not being captured
 Viraj




[jira] Created: (PIG-1580) new syntax for native mapreduce operator

2010-08-30 Thread Thejas M Nair (JIRA)
new syntax for native mapreduce operator


 Key: PIG-1580
 URL: https://issues.apache.org/jira/browse/PIG-1580
 Project: Pig
  Issue Type: Task
Reporter: Thejas M Nair
Assignee: Thejas M Nair


mapreduce operator (PIG-506) and stream operator have some similarities. It 
makes sense to use a similar syntax for both.

Alan has proposed the following syntax for the mapreduce operator, and that we 
also move the stream operator to a similar syntax in a future release.

MAPREDUCE id jar
 INPUT  'path' USING LoadFunc  
OUTPUT  'path' USING StoreFunc
[SHIP 'path' [, 'path' ...]]
[CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]





[jira] Updated: (PIG-1580) new syntax for native mapreduce operator

2010-08-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1580:
---

Fix Version/s: 0.8.0

 new syntax for native mapreduce operator
 

 Key: PIG-1580
 URL: https://issues.apache.org/jira/browse/PIG-1580
 Project: Pig
  Issue Type: Task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 mapreduce operator (PIG-506) and stream operator have some similarities. It 
 makes sense to use a similar syntax for both.
 Alan has proposed the following syntax for the mapreduce operator, and that we 
 also move the stream operator to a similar syntax in a future release.
 MAPREDUCE id jar
  INPUT  'path' USING LoadFunc  
 OUTPUT  'path' USING StoreFunc
 [SHIP 'path' [, 'path' ...]]
 [CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]




[jira] Created: (PIG-1581) Parser fails to recognize semicolons in quoted strings

2010-08-30 Thread Christopher Hackman (JIRA)
Parser fails to recognize semicolons in quoted strings
--

 Key: PIG-1581
 URL: https://issues.apache.org/jira/browse/PIG-1581
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.7.0
 Environment: CentOS 5.5
Reporter: Christopher Hackman
Priority: Minor


Within some contexts, the parser fails to treat semicolons correctly, and sees 
them as an EOL.


Given an input file:

/test1.txt (in the hdfs)
1;a
2;b
3;c
4;d
5;e


And the following Pig script:

REGISTER /tmp/piggybank.jar ;
DEFINE REGEXEXTRACTALL 
org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
lines = LOAD '/test1.txt' AS (line:chararray);
delimited = FOREACH lines GENERATE FLATTEN (
REGEXEXTRACTALL(line, '^(\\d+);(\\w+)$')
) AS (
digit:int,
word:chararray
);
DUMP delimited;


I receive the following error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
Lexical error at line 5, column 40.  Encountered: EOF after : \'^(d+);
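
The behavior the report expects, a semicolon inside a quoted string not ending 
the statement, can be sketched in Python as follows. This is an illustration 
only, not Grunt's parser: split_statements is a hypothetical helper, and it 
assumes single-quoted strings in which a backslash escapes the next character.

```python
def split_statements(script):
    """Split a script on semicolons that are outside single-quoted strings."""
    statements = []
    current = []
    in_quote = False
    i = 0
    while i < len(script):
        ch = script[i]
        if in_quote and ch == "\\" and i + 1 < len(script):
            # Keep an escaped character (such as \\ or \') with its backslash.
            current.append(script[i:i + 2])
            i += 2
            continue
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            # Statement terminator: flush what has been collected so far.
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


# Shortened stand-in for the script in the report (aliases are invented).
script = r"x = FOREACH y GENERATE F(line, '^(\\d+);(\\w+)$'); DUMP x;"
for statement in split_statements(script):
    print(statement)
```

A plain split on every semicolon would instead cut the regular expression in 
two, which matches the position of the lexical error reported above.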




[jira] Commented: (PIG-1580) new syntax for native mapreduce operator

2010-08-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904298#action_12904298
 ] 

Thejas M Nair commented on PIG-1580:


Updating syntax to include support for parameters -

MAPREDUCE id jar 'params'
INPUT 'path' USING LoadFunc
OUTPUT 'path' USING StoreFunc
[SHIP 'path' [, 'path' ...]]
[CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]

 new syntax for native mapreduce operator
 

 Key: PIG-1580
 URL: https://issues.apache.org/jira/browse/PIG-1580
 Project: Pig
  Issue Type: Task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 mapreduce operator (PIG-506) and stream operator have some similarities. It 
 makes sense to use a similar syntax for both.
 Alan has proposed the following syntax for the mapreduce operator, and that we 
 also move the stream operator to a similar syntax in a future release.
 MAPREDUCE id jar
  INPUT  'path' USING LoadFunc  
 OUTPUT  'path' USING StoreFunc
 [SHIP 'path' [, 'path' ...]]
 [CACHE 'dfs_path#dfs_file' [, 'dfs_path#dfs_file' ...]]




[jira] Commented: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs

2010-08-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904321#action_12904321
 ] 

Richard Ding commented on PIG-1570:
---

+1.

 native mapreduce operator MR job does not follow same failure handling logic 
 as other pig MR jobs
 -

 Key: PIG-1570
 URL: https://issues.apache.org/jira/browse/PIG-1570
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1570.1.patch


 The code path for handling failure in MR job corresponding to native MR is 
 different and does not have the same behavior.
 For example, even if the MR job for mapreduce operator fails, the number of 
 jobs that failed is being reported as 0 in PigStats log.




[jira] Updated: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc

2010-08-30 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1205:
---

Attachment: PIG_1205_9.patch

Patch with the StoreCaster changes as suggested by Alan. With +1s from Alan and 
Jeff, committing.

 Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
 --

 Key: PIG-1205
 URL: https://issues.apache.org/jira/browse/PIG-1205
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: hbase-0.20.6-test.jar, hbase-0.20.6.jar, PIG_1205.patch, 
 PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch, PIG_1205_5.path, 
 PIG_1205_6.patch, PIG_1205_7.patch, PIG_1205_8.patch, PIG_1205_9.patch







[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc

2010-08-30 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904325#action_12904325
 ] 

Dmitriy V. Ryaboy commented on PIG-1205:


Re HBASE-1933, they are publishing snapshots of current trunk, not the 0.20 
branch. We'll be able to start using maven to pull down hbase when we upgrade 
to their 0.9 release (which iirc depends on hdfs appends...)

 Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
 --

 Key: PIG-1205
 URL: https://issues.apache.org/jira/browse/PIG-1205
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: hbase-0.20.6-test.jar, hbase-0.20.6.jar, PIG_1205.patch, 
 PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch, PIG_1205_5.path, 
 PIG_1205_6.patch, PIG_1205_7.patch, PIG_1205_8.patch, PIG_1205_9.patch







[jira] Updated: (PIG-1458) aggregate files for replicated join

2010-08-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1458:
--

Attachment: PIG-1458_1.patch

New patch addressing review comments.

 aggregate files for replicated join
 ---

 Key: PIG-1458
 URL: https://issues.apache.org/jira/browse/PIG-1458
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1458.patch, PIG-1458_1.patch


 We have noticed that if the smaller data in a replicated join has many files, 
 this puts an unneeded burden on the name node. Pre-aggregating the files can 
 improve the situation.




[jira] Updated: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc

2010-08-30 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1205:
---

  Status: Resolved  (was: Patch Available)
Release Note: 
HBaseStorage has been significantly reworked with this release.

Usage:
{code}
my_data = LOAD 'hbase://table_name' USING 
org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 
colfamily:col2', '-caching 100') as (col1:int, col2:chararray);

STORE my_data INTO 'hbase://other_table' USING 
org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 
colfamily:col2');
{code}

HBaseStorage can now write data into HBase as well as read it. The first 
argument is a space-delimited list of columns to be loaded (or stored). Columns 
are specified as columnfamily:column_name. The second argument is an optional 
set of key-value pairs used to control HBaseStorage behavior. Available 
arguments are:

* {{-loadKey}} Used to load the row key; false by 
default. If true, the first field in the returned tuple will be the value of 
the row key.
* {{-gt, -gte, -lt, and -lte}} Used to specify bounds 
on row keys to be scanned. The keys are specified as binary data, using the hex 
representation. Any slashes have to be double-escaped (two slashes per single 
real slash) to be parsed correctly.
* {{-caching}} Used to specify the number of rows to be 
cached per HBase RPC call. See 
http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#setScannerCaching%28int%29
 for more information about this HBase feature.
* {{-limit}} Used to control how many rows *per scanned 
region* will be retrieved. This can of course speed up processing if you just 
want a few rows. The total number of rows returned will be up to number of 
regions * limit. The limit is applied after any -gt, -lt, etc filters. Pig's 
LIMIT operator can be used in conjunction with this argument.
* {{-caster}} Used to specify a LoadCaster (or 
LoadStoreCaster, for storage) used to convert the data stored in HBase into Pig 
data. By default, the Utf8StorageConverter is used, which stores all data as 
its string representation. The string HBaseBinaryConverter can be used to 
specify that data is stored in HBase's native binary format. Note that the 
HBaseBinary converter does not work with complex data types such as maps, 
tuples, and bags. You can also specify a full class path such as 
org.apache.pig.backend.hadoop.hbase.HBaseBinaryConverter to use your own 
Caster. The default caster can be changed by setting the pig.hbase.caster 
property in pig,properties

HBaseStorage matches column arguments to tuple fields based on their ordinal 
position. When storing, the first field is expected to be the key value.
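
As a rough illustration of the -limit semantics described above, the following 
Python sketch models each region as a list of row keys, applies the row-key 
bounds first, and then truncates what each region returns. It is a toy model 
under stated assumptions, not HBase or Pig API: the function name 
scan_with_limit, its parameters, and the sample keys are all made up for 
illustration.

```python
def scan_with_limit(regions, limit, gte=None, lt=None):
    """Toy model of a per-region limit applied after row-key bounds.

    regions: list of lists of row keys (one inner list per region).
    Returns at most len(regions) * limit keys.
    """
    result = []
    for region in regions:
        # Row-key bounds are applied first ...
        in_bounds = [k for k in region
                     if (gte is None or k >= gte) and (lt is None or k < lt)]
        # ... then the limit truncates what each region contributes.
        result.extend(in_bounds[:limit])
    return result


# Hypothetical data: three regions, limit of 2 rows per region.
regions = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1"]]
rows = scan_with_limit(regions, limit=2, gte="a2")
print(rows)  # ['a2', 'a3', 'b1', 'b2', 'c1']
```

With three regions and a limit of 2, at most 6 rows can come back; here the 
third region only holds one row, so 5 are returned.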
  Resolution: Fixed

 Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
 --

 Key: PIG-1205
 URL: https://issues.apache.org/jira/browse/PIG-1205
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.7.0
Reporter: Jeff Zhang
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: hbase-0.20.6-test.jar, hbase-0.20.6.jar, PIG_1205.patch, 
 PIG_1205_2.patch, PIG_1205_3.patch, PIG_1205_4.patch, PIG_1205_5.path, 
 PIG_1205_6.patch, PIG_1205_7.patch, PIG_1205_8.patch, PIG_1205_9.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1458) aggregate files for replicated join

2010-08-30 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904346#action_12904346
 ] 

Koji Noguchi commented on PIG-1458:
---

Can we increase the replication to 10 for the aggregated file (if not already 
done)?

 aggregate files for replicated join
 ---

 Key: PIG-1458
 URL: https://issues.apache.org/jira/browse/PIG-1458
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1458.patch, PIG-1458_1.patch


 We have noticed that if the smaller data in replicated join has many files, 
 this puts an unneeded burden on the name node. Pre-aggregating the files can 
 improve the situation.




[jira] Commented: (PIG-1399) Logical Optimizer: Expression optimizor rule

2010-08-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904356#action_12904356
 ] 

Alan Gates commented on PIG-1399:
-


{code}
 [exec]
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] -1 findbugs.  The patch appears to introduce 2 new Findbugs 
warnings.
{code}

I'll attach the results of findbugs separately.

 Logical Optimizer: Expression optimizor rule
 

 Key: PIG-1399
 URL: https://issues.apache.org/jira/browse/PIG-1399
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, 
 PIG-1399.patch, PIG-1399.patch, PIG-1399.patch


 We can optimize expression in several ways:
 1. Constant pre-calculation
 Example:
 B = filter A by a0 > 5+7;
 => B = filter A by a0 > 12;
 2. Boolean expression optimization
 Example:
 B = filter A by not (not(a0>5) or a>10);
 => B = filter A by a0>5 and a<=10;




[jira] Updated: (PIG-1569) java properties not honored in case of properties such as stop.on.failure

2010-08-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1569:
--

Status: Patch Available  (was: Open)

 java properties not honored in case of properties such as stop.on.failure
 -

 Key: PIG-1569
 URL: https://issues.apache.org/jira/browse/PIG-1569
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1569.patch


 In org.apache.pig.Main, properties are being set to their default values without 
 checking whether the Java system properties have been set to something else.
 stop.on.failure, opt.multiquery, and aggregate.warning are some properties that 
 have this problem.
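
 The behavior one would expect can be sketched as follows. This is only a 
 Python model of the precedence rule (a default applies only when the user has 
 not set the property), not the code in org.apache.pig.Main; the function name 
 effective_properties and the default values shown are assumptions made for 
 illustration.

```python
# Assumed defaults, for illustration only.
DEFAULTS = {
    "stop.on.failure": "false",
    "opt.multiquery": "true",
    "aggregate.warning": "true",
}


def effective_properties(system_props, defaults):
    """Apply a default only for keys the user has not already set."""
    props = dict(system_props)
    for key, value in defaults.items():
        # setdefault leaves an existing user-supplied value untouched.
        props.setdefault(key, value)
    return props


# The user passed the equivalent of -Dstop.on.failure=true.
props = effective_properties({"stop.on.failure": "true"}, DEFAULTS)
print(props["stop.on.failure"])  # true
print(props["opt.multiquery"])   # true (taken from the defaults)
```

 The bug described above corresponds to assigning the defaults 
 unconditionally, which would overwrite the user's stop.on.failure setting.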




[jira] Updated: (PIG-1569) java properties not honored in case of properties such as stop.on.failure

2010-08-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1569:
--

Attachment: PIG-1569.patch

 java properties not honored in case of properties such as stop.on.failure
 -

 Key: PIG-1569
 URL: https://issues.apache.org/jira/browse/PIG-1569
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1569.patch


 In org.apache.pig.Main, properties are being set to their default values without 
 checking whether the Java system properties have been set to something else.
 stop.on.failure, opt.multiquery, and aggregate.warning are some properties that 
 have this problem.




[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule

2010-08-30 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1399:


Attachment: newPatchFindbugsWarnings.html

Results of findbugs from manual run of test-patch

 Logical Optimizer: Expression optimizor rule
 

 Key: PIG-1399
 URL: https://issues.apache.org/jira/browse/PIG-1399
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: newPatchFindbugsWarnings.html, PIG-1399.patch, 
 PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch


 We can optimize expression in several ways:
 1. Constant pre-calculation
 Example:
 B = filter A by a0 > 5+7;
 => B = filter A by a0 > 12;
 2. Boolean expression optimization
 Example:
 B = filter A by not (not(a0>5) or a>10);
 => B = filter A by a0>5 and a<=10;




[jira] Commented: (PIG-1458) aggregate files for replicated join

2010-08-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904358#action_12904358
 ] 

Thejas M Nair commented on PIG-1458:


+1

 aggregate files for replicated join
 ---

 Key: PIG-1458
 URL: https://issues.apache.org/jira/browse/PIG-1458
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1458.patch, PIG-1458_1.patch


 We have noticed that if the smaller data in replicated join has many files, 
 this puts an unneeded burden on the name node. Pre-aggregating the files can 
 improve the situation.




[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule

2010-08-30 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1399:
--

Attachment: PIG-1399.patch

I use findbugs 1.3.9, and it finds the patch clean. The attached findbugs 
results were generated using 1.3.8, which might explain the difference. Anyway, 
I made a minor modification that should fix the warnings reported by 1.3.8.

 Logical Optimizer: Expression optimizor rule
 

 Key: PIG-1399
 URL: https://issues.apache.org/jira/browse/PIG-1399
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: newPatchFindbugsWarnings.html, PIG-1399.patch, 
 PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, 
 PIG-1399.patch, PIG-1399.patch


 We can optimize expression in several ways:
 1. Constant pre-calculation
 Example:
 B = filter A by a0 > 5+7;
 => B = filter A by a0 > 12;
 2. Boolean expression optimization
 Example:
 B = filter A by not (not(a0>5) or a>10);
 => B = filter A by a0>5 and a<=10;




[jira] Commented: (PIG-1569) java properties not honored in case of properties such as stop.on.failure

2010-08-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904369#action_12904369
 ] 

Thejas M Nair commented on PIG-1569:


looks good. +1 

 java properties not honored in case of properties such as stop.on.failure
 -

 Key: PIG-1569
 URL: https://issues.apache.org/jira/browse/PIG-1569
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1569.patch


 In org.apache.pig.Main, properties are being set to their default values without 
 checking whether the Java system properties have been set to something else.
 stop.on.failure, opt.multiquery, and aggregate.warning are some properties that 
 have this problem.




[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule

2010-08-30 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1399:
--

  Status: Patch Available  (was: Open)
Release Note: 
This logical simplification contains the following types of simplifications:

1) Constant pre-calculation
Example:
B = filter A by a0 > 5+7;

is simplified to

B = filter A by a0 > 12;


2) Elimination of negations
Example:
B = filter A by not (not(a0>5) or a>10);

is simplified to

B = filter A by a0>5 and a<=10;


3) Elimination of logically implied expressions in AND
Example:
B = filter A by (a0 > 5 and a0 > 7);

is simplified to

B = filter A by a0 > 7;


4) Elimination of logically implied expressions in OR
Example:
B = filter A by ((a0 > 5) or (a0 > 6 and a1 > 15));

is simplified to

B = filter A by a0 > 5;


5) Equivalence elimination
Example:
B = filter A by (a0 > 5 and a0 > 5);

is simplified to

B = filter A by a0 > 5;


6) Elimination of complementary expressions in OR
Example:
B = filter A by (a0 > 5 OR a0 <= 5);

is simplified to non-filtering


7) Elimination of naive TRUE expression
Example:

B = filter A by 1==1;

is simplified to non-filtering
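
Two of the simplifications listed above, constant pre-calculation and 
elimination of negations, can be sketched on a tiny expression tree. This is a 
minimal Python illustration of the idea, not the optimizer rule in the patch: 
expressions are modeled as nested tuples, the names fold and push_not are 
hypothetical, and the comparison operators in the sample inputs are assumed.

```python
# Comparison operators and their logical complements.
NEGATE = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "==": "!=", "!=": "=="}


def fold(expr):
    """Pre-calculate constant arithmetic such as ('+', 5, 7) -> 12."""
    if not isinstance(expr, tuple):
        return expr  # a column name or a literal
    op, *args = expr
    args = [fold(a) for a in args]
    if op == "+" and all(isinstance(a, int) for a in args):
        return sum(args)
    return (op, *args)


def push_not(expr, negated=False):
    """Eliminate 'not' by pushing it down to the comparisons (De Morgan)."""
    op, *args = expr
    if op == "not":
        return push_not(args[0], not negated)
    if op in ("and", "or"):
        if negated:
            op = "or" if op == "and" else "and"
        return (op, *(push_not(a, negated) for a in args))
    # Leaf comparison: flip the operator if an odd number of 'not's applies.
    return (NEGATE[op], *args) if negated else (op, *args)


folded = fold((">", "a0", ("+", 5, 7)))
print(folded)  # ('>', 'a0', 12)

simplified = push_not(("not", ("or", ("not", (">", "a0", 5)), (">", "a", 10))))
print(simplified)  # ('and', ('>', 'a0', 5), ('<=', 'a', 10))
```

The sketch ignores null handling and only folds integer addition; a real rule 
has to consider both.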

 Logical Optimizer: Expression optimizor rule
 

 Key: PIG-1399
 URL: https://issues.apache.org/jira/browse/PIG-1399
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: newPatchFindbugsWarnings.html, PIG-1399.patch, 
 PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, 
 PIG-1399.patch, PIG-1399.patch


 We can optimize expression in several ways:
 1. Constant pre-calculation
 Example:
 B = filter A by a0 > 5+7;
 => B = filter A by a0 > 12;
 2. Boolean expression optimization
 Example:
 B = filter A by not (not(a0>5) or a>10);
 => B = filter A by a0>5 and a<=10;




[jira] Commented: (PIG-1458) aggregate files for replicated join

2010-08-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904385#action_12904385
 ] 

Richard Ding commented on PIG-1458:
---

Koji,

Please open a JIRA for increasing the replication factor of the replicated 
files. Currently they use the default replication factor. 

Thanks,
-Richard 

 aggregate files for replicated join
 ---

 Key: PIG-1458
 URL: https://issues.apache.org/jira/browse/PIG-1458
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1458.patch, PIG-1458_1.patch


 We have noticed that if the smaller data in replicated join has many files, 
 this puts an unneeded burden on the name node. Pre-aggregating the files can 
 improve the situation.




[jira] Updated: (PIG-1569) java properties not honored in case of properties such as stop.on.failure

2010-08-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1569:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

 java properties not honored in case of properties such as stop.on.failure
 -

 Key: PIG-1569
 URL: https://issues.apache.org/jira/browse/PIG-1569
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1569.patch


 In org.apache.pig.Main, properties are being set to their default values without 
 checking whether the Java system properties have been set to something else.
 stop.on.failure, opt.multiquery, and aggregate.warning are some properties that 
 have this problem.




[jira] Updated: (PIG-1572) change default datatype when relations are used as scalar to bytearray

2010-08-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1572:
---

Attachment: PIG-1572.1.patch

Summary of changes
- Changed the default type (i.e. the type used when the input relation to the 
scalar has no type) to bytearray.
- Replaced PigStorage with InterStorage for load/store of scalar data, so typed 
data is stored.
- Changes to track lineage of the ReadScalars udf to the load function(s).
- Removed unnecessary casts on output of ReadScalars.
- describe alias; PigServer code now checks the alias of the leaf logical 
operators.
- Changed test cases: an explicit cast is no longer required when bytearray is 
used in arithmetic operations. Moved some of the tests to local mode to reduce 
test run time.


 change default datatype when relations are used as scalar to bytearray
 --

 Key: PIG-1572
 URL: https://issues.apache.org/jira/browse/PIG-1572
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1572.1.patch


 When relations are cast to scalar, the current default type is chararray. 
 This is inconsistent with the behavior in the rest of Pig Latin.




[jira] Resolved: (PIG-1570) native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs

2010-08-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-1570.


Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to trunk.

 native mapreduce operator MR job does not follow same failure handling logic 
 as other pig MR jobs
 -

 Key: PIG-1570
 URL: https://issues.apache.org/jira/browse/PIG-1570
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1570.1.patch


 The code path for handling failure in MR job corresponding to native MR is 
 different and does not have the same behavior.
 For example, even if the MR job for mapreduce operator fails, the number of 
 jobs that failed is being reported as 0 in PigStats log.




[jira] Resolved: (PIG-1458) aggregate files for replicated join

2010-08-30 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-1458.
---

Hadoop Flags: [Reviewed]
  Resolution: Fixed

 aggregate files for replicated join
 ---

 Key: PIG-1458
 URL: https://issues.apache.org/jira/browse/PIG-1458
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1458.patch, PIG-1458_1.patch


 We have noticed that if the smaller data in replicated join has many files, 
 this puts an unneeded burden on the name node. Pre-aggregating the files can 
 improve the situation.




[jira] Commented: (PIG-1563) SUBSTRING function is broken

2010-08-30 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904450#action_12904450
 ] 

Olga Natkovich commented on PIG-1563:
-

Dmitriy, thanks for the review. I did not discard your function; it was part of 
the patch. I did not change the code to use it only because I had already 
finished testing the changes and did not have time to redo the code.

I am fixing some javadoc and release audit failures and will commit the code 
shortly.

 SUBSTRING function is broken
 

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Commented: (PIG-1458) aggregate files for replicated join

2010-08-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904451#action_12904451
 ] 

Richard Ding commented on PIG-1458:
---

Patch committed to trunk.

 aggregate files for replicated join
 ---

 Key: PIG-1458
 URL: https://issues.apache.org/jira/browse/PIG-1458
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1458.patch, PIG-1458_1.patch


 We have noticed that if the smaller data in replicated join has many files, 
 this puts an unneeded burden on the name node. Pre-aggregating the files can 
 improve the situation.




[jira] Commented: (PIG-1483) [piggybank] Add HadoopJobHistoryLoader to the piggybank

2010-08-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904453#action_12904453
 ] 

Richard Ding commented on PIG-1483:
---

Patch committed to trunk.

 [piggybank] Add HadoopJobHistoryLoader to the piggybank
 ---

 Key: PIG-1483
 URL: https://issues.apache.org/jira/browse/PIG-1483
 Project: Pig
  Issue Type: New Feature
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1483.patch, PIG-1483_1.patch


 PIG-1333 added many script-related entries to the MR job xml file and thus 
 it's now possible to use Pig for querying Hadoop job history/xml files to get 
 script-level usage statistics. What we need is a Pig loader that can parse 
 these files and generate corresponding data objects.
 The goal of this jira is to create a HadoopJobHistoryLoader in piggybank.
 Here is an example that shows the intended usage:
 *Find all the jobs grouped by script and user:*
 {code}
 a = load '/mapred/history/_logs/history/' using HadoopJobHistoryLoader() as 
 (j:map[], m:map[], r:map[]);
 b = foreach a generate (Chararray) j#'PIG_SCRIPT_ID' as id, (Chararray) 
 j#'USER' as user, (Chararray) j#'JOBID' as job; 
 c = filter b by not (id is null);
 d = group c by (id, user);
 e = foreach d generate flatten(group), c.job;
 dump e;
 {code}
 A couple more examples:
 *Find scripts that use only the default parallelism:*
 {code}
 a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], 
 m:map[], r:map[]);
 b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' 
 as script_name, (Long) r#'NUMBER_REDUCES' as reduces;
 c = group b by (id, user, script_name) parallel 10;
 d = foreach c generate group.user, group.script_name, MAX(b.reduces) as 
 max_reduces;
 e = filter d by max_reduces == 1;
 dump e;
 {code}
 *Find the running time of each script (in seconds):*
 {code}
 a = load '/mapred/history/done' using HadoopJobHistoryLoader() as (j:map[], 
 m:map[], r:map[]);
 b = foreach a generate j#'PIG_SCRIPT_ID' as id, j#'USER' as user, j#'JOBNAME' 
 as script_name, (Long) j#'SUBMIT_TIME' as start, (Long) j#'FINISH_TIME' as 
 end;
 c = group b by (id, user, script_name);
 d = foreach c generate group.user, group.script_name, (MAX(b.end) - 
 MIN(b.start))/1000;
 dump d;
 {code}
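
 The arithmetic in the last example (latest finish time minus earliest submit 
 time per script, converted from milliseconds to seconds) can be checked with a 
 small Python sketch. It is only an illustration of the calculation, not part 
 of the loader; the function name script_runtimes and the sample records are 
 made up.

```python
from collections import defaultdict


def script_runtimes(jobs):
    """jobs: iterable of (script_id, submit_ms, finish_ms) tuples.

    Returns {script_id: running time in seconds}, measured from the earliest
    submit time to the latest finish time among the script's jobs.
    """
    spans = defaultdict(lambda: [float("inf"), float("-inf")])
    for script_id, start, end in jobs:
        span = spans[script_id]
        span[0] = min(span[0], start)  # earliest SUBMIT_TIME
        span[1] = max(span[1], end)    # latest FINISH_TIME
    return {sid: (end - start) / 1000 for sid, (start, end) in spans.items()}


# Hypothetical records: script s1 ran two jobs, script s2 ran one.
jobs = [("s1", 1000, 4000), ("s1", 3000, 9000), ("s2", 5000, 6000)]
runtimes = script_runtimes(jobs)
print(runtimes)  # {'s1': 8.0, 's2': 1.0}
```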




[jira] Commented: (PIG-1557) couple of issue mapping aliases to jobs

2010-08-30 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904456#action_12904456
 ] 

Richard Ding commented on PIG-1557:
---

Patch committed to trunk.

 couple of issue mapping aliases to jobs
 ---

 Key: PIG-1557
 URL: https://issues.apache.org/jira/browse/PIG-1557
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1557.patch, PIG-1557_1.patch


 I have a simple script:
 A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
 B = group A by name;
 C = foreach B generate group, COUNT(A);
 D = order C by $1;
 E = limit D 10;
 dump E;
 I noticed a couple of issues with alias to job mapping: neither load(A) nor 
 limit(E) shows in the output




[jira] Commented: (PIG-1563) Some string functions don't work with bytearray arguments

2010-08-30 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904462#action_12904462
 ] 

Olga Natkovich commented on PIG-1563:
-

 +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 13 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec]


 Some string functions don't work with bytearray arguments
 -

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Commented: (PIG-1563) Some string functions don't work with bytearray arguments

2010-08-30 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904467#action_12904467
 ] 

Olga Natkovich commented on PIG-1563:
-

I made one additional change: I renamed SPLIT to STRSPLIT to avoid a conflict 
with the SPLIT operator.

 Some string functions don't work with bytearray arguments
 -

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1563) Some string functions don't work with bytearray arguments

2010-08-30 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1563:


Attachment: PIG_1563_v3.patch

latest patch

 Some string functions don't work with bytearray arguments
 -

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch, PIG_1563_v3.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Updated: (PIG-1563) Some string functions don't work with bytearray arguments

2010-08-30 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1563:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch committed. Thanks, Dmitriy, for the help and review.

 Some string functions don't work with bytearray arguments
 -

 Key: PIG-1563
 URL: https://issues.apache.org/jira/browse/PIG-1563
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: PIG_1563.patch, PIG_1563_v2.patch, PIG_1563_v3.patch


 Script:
 A = load 'studenttab10k' as (name, age, gpa);
 C = foreach A generate SUBSTRING(name, 0,5);
 E = limit C 10;
 dump E;
 Output is always empty:
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()
 ()




[jira] Commented: (PIG-1531) Pig gobbles up error messages

2010-08-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904497#action_12904497
 ] 

Ashutosh Chauhan commented on PIG-1531:
---

Niraj ran all the unit tests. All passed. No complaints from test-patch either. 
Committed to the trunk.
Thanks, Niraj!

 Pig gobbles up error messages
 -

 Key: PIG-1531
 URL: https://issues.apache.org/jira/browse/PIG-1531
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: pig-1531_3.patch, PIG_1531.patch, PIG_1531_2.patch


 Consider the following. I have my own Storer implementing StoreFunc and I am 
 throwing FrontEndException (and other Exceptions derived from PigException) 
 in its various methods. I expect those error messages to be shown in error 
 scenarios. Instead Pig gobbles up my error messages and shows its own generic 
 error message like: 
 {code}
 010-07-31 14:14:25,414 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2116: Unexpected error. Could not validate the output specification for: 
 default.partitoned
 Details at logfile: /Users/ashutosh/workspace/pig/pig_1280610650690.log
 {code}
 Instead I expect it to display my error messages which it stores away in that 
 log file.
