[jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

2010-07-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884551#action_12884551
 ] 

Ashutosh Chauhan commented on PIG-1449:
---

Reran the contrib tests; all passed. Patch committed. Thanks, Christian and 
Justin, for working on this!

 RegExLoader hangs on lines that don't match the regular expression
 --

 Key: PIG-1449
 URL: https://issues.apache.org/jira/browse/PIG-1449
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Sanders
Priority: Minor
 Attachments: PIG-1449-RegExLoaderInfiniteLoopFix.patch, 
 RegExLoader.patch


 In the 0.7.0 changes to RegExLoader there was a bug introduced where the code 
 will stay in the while loop if the line isn't matched.  Before 0.7.0 these 
 lines would be skipped if they didn't match the regular expression.  The 
 result is the mapper will not respond and will time out with Task attempt_X 
 failed to report status for 600 seconds. Killing!.
 Here are the steps to recreate the bug:
 Create a text file in HDFS with the following lines:
 test1
 testA
 test2
 Run the following pig script:
 REGISTER /usr/local/pig/contrib/piggybank/java/piggybank.jar;
 test = LOAD '/path/to/test.txt' using 
 org.apache.pig.piggybank.storage.MyRegExLoader('(test\\d)') AS (line);
 dump test;
 Expected result:
 (test1)
 (test2)
 Actual result:
 Job fails to complete after 600 second timeout waiting on the mapper to 
 complete.  The mapper hangs at 33% since it can process the first line but 
 gets stuck into the while loop on the second line.
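
 The failure mode described above can be illustrated outside Pig with a small 
 Java sketch. This is not the RegExLoader source or the committed patch; the 
 class and method names (SkipNonMatching, nextMatch, loadAll) are hypothetical. 
 It only shows the intended behaviour: the loop reads a fresh line on every 
 iteration, so a line that does not match is skipped rather than retried.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the actual RegExLoader code.
class SkipNonMatching {

    // Returns the first capture group of the next matching line,
    // or null once the input is exhausted.
    static String nextMatch(Iterator<String> lines, Pattern pattern) {
        while (lines.hasNext()) {
            // Advance the input on every iteration. Re-testing the same
            // line after a failed match is what would make a loop spin.
            String line = lines.next();
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                return m.group(1);
            }
            // No match: fall through and read the next line.
        }
        return null;
    }

    // Collects the captured value of every matching line, in input order.
    static List<String> loadAll(List<String> input, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Iterator<String> it = input.iterator();
        List<String> out = new ArrayList<String>();
        String value;
        while ((value = nextMatch(it, pattern)) != null) {
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("test1", "testA", "test2");
        System.out.println(loadAll(input, "(test\\d)"));
    }
}
```

 With the three-line sample file from the steps above, only the lines that 
 match the pattern are emitted and the loop terminates at end of input.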

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

2010-07-02 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1449:
--

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.8.0
   Resolution: Fixed

 RegExLoader hangs on lines that don't match the regular expression
 --

 Key: PIG-1449
 URL: https://issues.apache.org/jira/browse/PIG-1449
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Sanders
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1449-RegExLoaderInfiniteLoopFix.patch, 
 RegExLoader.patch





[jira] Created: (PIG-1480) An object oriented Java API for Pig statements

2010-07-02 Thread Julien Le Dem (JIRA)
An object oriented Java API for Pig statements
--

 Key: PIG-1480
 URL: https://issues.apache.org/jira/browse/PIG-1480
 Project: Pig
  Issue Type: New Feature
Reporter: Julien Le Dem


A Java API for Pig statements would enable third-party libraries to generate 
Pig scripts much more easily than is currently possible.





[jira] Commented: (PIG-1478) Add progress notification listener to PigRunner API

2010-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884554#action_12884554
 ] 

Hadoop QA commented on PIG-1478:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448532/PIG-1478.patch
  against trunk revision 958666.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/336/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/336/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/336/console

This message is automatically generated.

 Add progress notification listener to PigRunner API
 ---

 Key: PIG-1478
 URL: https://issues.apache.org/jira/browse/PIG-1478
 Project: Pig
  Issue Type: Improvement
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1478.patch


 PIG-1333 added PigRunner API to allow Pig users and tools to get a 
 status/stats object back after executing a Pig script. The new API, however, 
 is synchronous (blocking). A Pig script can spawn tens (even hundreds) of MR 
 jobs and take hours to complete, so it would be useful to give callers 
 progress feedback during execution.
 The proposal is to add an optional parameter to the API:
 {code}
 public abstract class PigRunner {
 public static PigStats run(String[] args, PigProgressNotificationListener 
 listener) {...}
 }
 {code} 
 The new listener is defined as follows:
 {code}
 package org.apache.pig.tools.pigstats;
 public interface PigProgressNotificationListener extends java.util.EventListener {
     // called just before the MR jobs for the script are launched
     public void launchStartedNotification(int numJobsToLaunch);
     // called with the number of jobs submitted in a batch
     public void jobsSubmittedNotification(int numJobsSubmitted);
     // called when a job has started
     public void jobStartedNotification(String assignedJobId);
     // called when a job has completed successfully
     public void jobFinishedNotification(JobStats jobStats);
     // called when a job has failed
     public void jobFailedNotification(JobStats jobStats);
     // called when a user output has completed successfully
     public void outputCompletedNotification(OutputStats outputStats);
     // called to report progress as a percentage
     public void progressUpdatedNotification(int progress);
     // called when script execution is done
     public void launchCompletedNotification(int numJobsSucceeded);
 }
 {code}
 Any thoughts?
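
 To make the callback flow concrete, here is a rough sketch of how a caller 
 might consume such notifications. Everything in it is an assumption for 
 illustration: ProgressListener is a trimmed-down stand-in for the proposed 
 interface (job ids as Strings instead of JobStats/OutputStats, so it compiles 
 without Pig), RecordingListener is a hypothetical client, and simulateRun 
 merely imitates what a launcher could do for two successful jobs.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.List;

// Hypothetical, reduced stand-in for the proposed listener interface.
interface ProgressListener extends EventListener {
    void launchStartedNotification(int numJobsToLaunch);
    void jobStartedNotification(String assignedJobId);
    void jobFinishedNotification(String jobId);
    void jobFailedNotification(String jobId);
    void progressUpdatedNotification(int progress);
    void launchCompletedNotification(int numJobsSucceeded);
}

// Records each notification as text; a tool could show these in a UI instead.
class RecordingListener implements ProgressListener {
    final List<String> events = new ArrayList<String>();

    public void launchStartedNotification(int numJobsToLaunch) {
        events.add("launch started: " + numJobsToLaunch + " jobs");
    }
    public void jobStartedNotification(String assignedJobId) {
        events.add("job started: " + assignedJobId);
    }
    public void jobFinishedNotification(String jobId) {
        events.add("job finished: " + jobId);
    }
    public void jobFailedNotification(String jobId) {
        events.add("job failed: " + jobId);
    }
    public void progressUpdatedNotification(int progress) {
        events.add("progress: " + progress + "%");
    }
    public void launchCompletedNotification(int numJobsSucceeded) {
        events.add("launch completed: " + numJobsSucceeded + " succeeded");
    }
}

class ListenerDemo {
    // Stand-in for the blocking run call: fires the callbacks in the order
    // a launcher might for two jobs that both succeed.
    static void simulateRun(ProgressListener listener) {
        listener.launchStartedNotification(2);
        listener.jobStartedNotification("job_1");
        listener.jobFinishedNotification("job_1");
        listener.progressUpdatedNotification(50);
        listener.jobStartedNotification("job_2");
        listener.jobFinishedNotification("job_2");
        listener.progressUpdatedNotification(100);
        listener.launchCompletedNotification(2);
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        simulateRun(listener);
        for (String event : listener.events) {
            System.out.println(event);
        }
    }
}
```

 The point of the design is that the caller still makes one blocking call but 
 receives events while it is in progress, rather than polling for status.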




[jira] Commented: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations

2010-07-02 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884557#action_12884557
 ] 

Russell Jurney commented on PIG-1430:
-

I've been thinking about the feedback at the contributors meeting on Monday. I 
propose that we postpone adding a full datetime type (PIG-1314) in favor of the 
builtins described below. This change is easy; I can do it immediately and get 
it into 0.8. The original proposal is quite hard, and I can't really estimate 
when I could have it completed. I'm not sure we need it, and there are many 
other more important things I would rather do. 

I'd like to remove the piggybank classes 
org.apache.pig.piggybank.evaluation.datetime.* or at least deprecate them. 

I'd like to add the following builtins, which act on both ISO8601 datetime 
strings and long unix times. These could be made into many functions each, but 
I'd prefer to keep them as short as possible. I suggest we mirror the oracle 
date/time functions when possible: http://psoug.org/reference/date_func.html 

* Units 

When listed below, units are defined as one of: 

YEAR 
MONTH 
WEEK 
DAY 
HOUR 
MINUTE 
SECOND 

* Truncations 

TRUNC(date, unit) or TRUNC_DATE(date, unit) 

long/epoch input returns long/epoch output. 
ISO8601 string input returns ISO8601 datetime output. 

* Dates to durations 

DURATION(date, unit) 

long/epoch input returns long output in the unit specified. 
ISO8601 input returns an ISO8601 duration 

* Adding/subtracting durations and dates: use longs. 

* Utilities 

CURRENT_ISOTIME 
CURRENT_UNIXTIME 
ISOTOUNIX 
UNIXTOISO 

The only ugly part to this is that ISO times are 2nd class citizens in that 
they cannot be added/subtracted. I'm prepared to live with that :)
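
For discussion, the truncation and conversion builtins could behave roughly 
like the Java sketch below. It is only an illustration under assumptions: the 
class and method names are made up, all times are treated as UTC, units are 
passed as plain strings, and WEEK is left out because the proposal does not 
say which day a week starts on.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of the proposed builtins; not piggybank or Pig code.
class DateTimeSketch {

    // ISOTOUNIX: ISO8601 datetime string to seconds since the epoch.
    static long isoToUnix(String iso) {
        return ZonedDateTime.parse(iso).toEpochSecond();
    }

    // UNIXTOISO: seconds since the epoch to an ISO8601 string in UTC.
    static String unixToIso(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    // TRUNC on a long: epoch input gives epoch output.
    static long truncUnix(long epochSeconds, String unit) {
        ZonedDateTime t = Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC);
        return truncate(t, unit).toEpochSecond();
    }

    // TRUNC on a string: ISO8601 input gives ISO8601 output.
    static String truncIso(String iso, String unit) {
        return unixToIso(truncUnix(isoToUnix(iso), unit));
    }

    // Moves the timestamp back to the start of the given unit.
    private static ZonedDateTime truncate(ZonedDateTime t, String unit) {
        switch (unit) {
            case "YEAR":   return t.truncatedTo(ChronoUnit.DAYS).withDayOfYear(1);
            case "MONTH":  return t.truncatedTo(ChronoUnit.DAYS).withDayOfMonth(1);
            case "DAY":    return t.truncatedTo(ChronoUnit.DAYS);
            case "HOUR":   return t.truncatedTo(ChronoUnit.HOURS);
            case "MINUTE": return t.truncatedTo(ChronoUnit.MINUTES);
            case "SECOND": return t.truncatedTo(ChronoUnit.SECONDS);
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(truncUnix(1278084645L, "HOUR"));
        System.out.println(truncIso("2010-07-02T15:30:45Z", "DAY"));
    }
}
```

Routing the string variant through the long variant keeps one truncation rule 
for both input types, which matches the idea that the return type follows the 
input type.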

 ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix 
 Times in All Operations
 --

 Key: PIG-1430
 URL: https://issues.apache.org/jira/browse/PIG-1430
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


 All functions in 
 contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime
  should seamlessly accept integer Unix/POSIX times, and return Unix time 
 output when given an int, and ISO output when given a chararray.
 Note: Unix/POSIX times are the number of seconds elapsed since midnight 
 proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting 
 leap seconds.  See http://en.wikipedia.org/wiki/Unix_time




[jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

2010-07-02 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884552#action_12884552
 ] 

Ashutosh Chauhan commented on PIG-1449:
---

@Christian,

It would definitely be useful to get the execution time for running the tests 
down. It takes a while currently to run all Pig tests.

 RegExLoader hangs on lines that don't match the regular expression
 --

 Key: PIG-1449
 URL: https://issues.apache.org/jira/browse/PIG-1449
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Sanders
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1449-RegExLoaderInfiniteLoopFix.patch, 
 RegExLoader.patch





[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-07-02 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884558#action_12884558
 ] 

Russell Jurney commented on PIG-1314:
-

Been thinking about this... I don't think we should add a full datetime type at 
this time.  See comments in PIG-1314 on alternative approach using builtins.

 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?




[jira] Updated: (PIG-1480) An object oriented Java API for Pig statements

2010-07-02 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-1480:
---

Attachment: java-api.zip

Here is an initial prototype of what the API looks like.
To compile and run it, add pig-0.6.0-core.jar to the classpath.
See the usage example in org.apache.pig.api.example.TransitiveClosure.
Generated Pig scripts are dumped to stdout as they are executed.

It is incomplete and runs only in local mode; it is meant only as an 
illustration.
The real implementation would create Pig operator objects instead of Pig Latin 
statements.

 An object oriented Java API for Pig statements
 --

 Key: PIG-1480
 URL: https://issues.apache.org/jira/browse/PIG-1480
 Project: Pig
  Issue Type: New Feature
Reporter: Julien Le Dem
 Attachments: java-api.zip





[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-07-02 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884562#action_12884562
 ] 

Russell Jurney commented on PIG-1314:
-

I suck at JIRA. See proposal in PIG-1430.



 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h




[jira] Commented: (PIG-794) Use Avro serialization in Pig

2010-07-02 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884577#action_12884577
 ] 

Jeff Zhang commented on PIG-794:


We can leverage AvroInputFormat and AvroOutputFormat in Avro trunk (see 
AVRO-493).

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
Assignee: Dmitriy V. Ryaboy
 Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar, PIG-794.patch


 We would like to use Avro serialization in Pig to pass data between MR jobs 
 instead of the current BinStorage. Attached is an implementation of 
 AvroBinStorage which performs significantly better compared to BinStorage on 
 our benchmarks.




[jira] Commented: (PIG-1478) Add progress notification listener to PigRunner API

2010-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884677#action_12884677
 ] 

Hadoop QA commented on PIG-1478:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448532/PIG-1478.patch
  against trunk revision 959865.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/358/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/358/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/358/console

This message is automatically generated.

 Add progress notification listener to PigRunner API
 ---

 Key: PIG-1478
 URL: https://issues.apache.org/jira/browse/PIG-1478
 Project: Pig
  Issue Type: Improvement
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1478.patch





[jira] Commented: (PIG-1478) Add progress notification listener to PigRunner API

2010-07-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884732#action_12884732
 ] 

Hadoop QA commented on PIG-1478:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448532/PIG-1478.patch
  against trunk revision 959865.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/337/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/337/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/337/console

This message is automatically generated.

 Add progress notification listener to PigRunner API
 ---

 Key: PIG-1478
 URL: https://issues.apache.org/jira/browse/PIG-1478
 Project: Pig
  Issue Type: Improvement
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1478.patch





[jira] Closed: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

2010-07-02 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy closed PIG-1449.
--


 RegExLoader hangs on lines that don't match the regular expression
 --

 Key: PIG-1449
 URL: https://issues.apache.org/jira/browse/PIG-1449
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Sanders
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1449-RegExLoaderInfiniteLoopFix.patch, 
 RegExLoader.patch





[jira] Updated: (PIG-1469) DefaultDataBag assumes ArrayList as default List type

2010-07-02 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1469:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I committed this.

 DefaultDataBag assumes ArrayList as default List type
 -

 Key: PIG-1469
 URL: https://issues.apache.org/jira/browse/PIG-1469
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.8.0
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
 Fix For: 0.8.0

 Attachments: PIG-1469.patch


 In org.apache.pig.data.DefaultDataBag, the field mContents is assumed to be 
 of type ArrayList but the user can actually pass a different List to the 
 constructor.
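
 The kind of problem described above can be reproduced in isolation with the 
 Java sketch below. It is a hypothetical stand-in, not the DefaultDataBag 
 source: the class BagSketch is made up, and trimToSize() is used purely as an 
 example of an ArrayList-only call, since the report does not say which 
 operation is involved.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a bag that stores its tuples in a List.
class BagSketch {
    private final List<String> contents;

    // Any List implementation is accepted here.
    BagSketch(List<String> contents) {
        this.contents = contents;
    }

    // Fragile: the cast fails at runtime unless an ArrayList was passed in.
    void compactUnsafe() {
        ((ArrayList<String>) contents).trimToSize();
    }

    // Safer: use the ArrayList-only call when it applies, otherwise skip it.
    boolean compactSafe() {
        if (contents instanceof ArrayList) {
            ((ArrayList<String>) contents).trimToSize();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BagSketch bag = new BagSketch(new LinkedList<String>());
        System.out.println("safe path compacted: " + bag.compactSafe());
        try {
            bag.compactUnsafe();
        } catch (ClassCastException e) {
            System.out.println("unsafe path threw ClassCastException");
        }
    }
}
```

 Either checking the runtime type or declaring the field with the interface 
 type and using only List methods avoids the hidden assumption.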




[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-02 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884759#action_12884759
 ] 

Richard Ding commented on PIG-1434:
---

I agree that we should use the right syntax. What I meant was that it can be 
implemented as a 'replicated' cross which seems to solve the problems of 
implicit dependency and using distributed cache.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884763#action_12884763
 ] 

Dmitriy V. Ryaboy commented on PIG-928:
---

Aniket, the patch does not apply cleanly to trunk, can you rebase it? 

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-928:
--

Attachment: PIG-928.patch

I rebased the patch and made it pull jython down via maven. 2.5.1 doesn't 
appear to be available right now, so this pulls down 2.5.0. Hope that's ok.

Looks like the tabulation is wrong in most of this patch.. someone please hit 
ctrl-a, ctrl-i next time :).

Needless to say, this thing needs tests, desperately.

Also imho in order for it to make it into trunk, it should be a compile-time 
option to support (and pull down) jython or jruby or whatnot, not a default 
option. Otherwise we are well on our way to making people pull down the 
internet in order to compile pig.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip





[jira] Updated: (PIG-1309) Map-side Cogroup

2010-07-02 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1309:
--

Attachment: PIG_1309_7.patch

Backport of merge cogroup for the 0.7 branch. Since Hudson can test only 
trunk, I ran all the tests manually; all passed.

 Map-side Cogroup
 

 Key: PIG-1309
 URL: https://issues.apache.org/jira/browse/PIG-1309
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch, 
 PIG_1309_7.patch


 In the never-ending quest to make Pig go faster, we want to parallelize as many 
 relational operations as possible. It's already possible to do Group-by 
 (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This jira 
 is to add a map-side implementation of Cogroup to Pig. Details to follow.




[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-02 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884791#action_12884791
 ] 

Thejas M Nair commented on PIG-1434:


I think the replicated cross is a good alternative to this feature, though this 
feature is probably friendlier for a beginning Pig user. But if this feature 
makes the Pig code very complicated or hacky (the dependency order and so on), I 
think it might not be a bad idea to encourage the use of a replicated join 
instead.

As a side note, we can actually get a 'replicated cross' working using a 
replicated join, e.g.:
{code}
 j = join l1 by 1, l2 by 1 using 'replicated';
{code}
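
To illustrate why joining on a constant key behaves like a cross, here is a minimal Java sketch of a hash join in which the smaller relation is held in memory. It is not Pig's implementation; the class `ReplicatedJoinSketch` and its methods are hypothetical names used only for illustration, and rows are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ReplicatedJoinSketch {
    /**
     * Hash join: the small relation is loaded into an in-memory table keyed by
     * smallKey, then each row of the large relation probes that table.
     */
    public static List<String> join(List<String> large, Function<String, Object> largeKey,
                                    List<String> small, Function<String, Object> smallKey) {
        Map<Object, List<String>> table = new HashMap<>();
        for (String row : small) {
            table.computeIfAbsent(smallKey.apply(row), k -> new ArrayList<>()).add(row);
        }
        List<String> out = new ArrayList<>();
        for (String row : large) {
            List<String> matches =
                    table.getOrDefault(largeKey.apply(row), Collections.<String>emptyList());
            for (String match : matches) {
                out.add("(" + row + "," + match + ")");
            }
        }
        return out;
    }

    /** Joining both sides on the constant 1 pairs every large row with every small row. */
    public static List<String> cross(List<String> large, List<String> small) {
        return join(large, row -> 1, small, row -> 1);
    }
}
```

Because every row maps to the same key, the probe step returns the whole small relation for each large row, which is exactly a cross product.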

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF




[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-02 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884794#action_12884794
 ] 

Richard Ding commented on PIG-1434:
---


So all one needs to do is internally replace the line: 

{code}
Y = foreach X generate $1/(long) C.count, $2-(long) C.max;
{code}

with

{code}
Z = join X by 1, C by 1 using 'replicated';
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}



 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch





[jira] Commented: (PIG-1321) Logical Optimizer: Merge cascading foreach

2010-07-02 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884811#action_12884811
 ] 

Xuefu Zhang commented on PIG-1321:
--

Add one more pre-condition:
3. The first foreach statement cannot contain flatten due to its complexity.

 Logical Optimizer: Merge cascading foreach
 --

 Key: PIG-1321
 URL: https://issues.apache.org/jira/browse/PIG-1321
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang

 We can merge consecutive foreach statements.
 Eg:
 b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
 c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
 => c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;
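
 The equivalence behind this merge can be sketched in Java, modelling Pig maps as nested `java.util.Map` values. This is only an illustration of why the two forms agree, not optimizer code; `MergeForeachSketch` and its method names are hypothetical.

```java
import java.util.Map;

public class MergeForeachSketch {
    /** Models the Pig map dereference m#'key'. */
    @SuppressWarnings("unchecked")
    public static Object lookup(Object map, String key) {
        return ((Map<String, Object>) map).get(key);
    }

    /** First foreach: b0 = a0#'key1'. */
    public static Object firstForeach(Map<String, Object> a0) {
        return lookup(a0, "key1");
    }

    /** Second foreach: b0#'kk1'. */
    public static Object secondForeach(Object b0) {
        return lookup(b0, "kk1");
    }

    /** Merged foreach: a0#'key1'#'kk1', computed in a single pass. */
    public static Object merged(Map<String, Object> a0) {
        return lookup(lookup(a0, "key1"), "kk1");
    }
}
```

 The merged form substitutes the first projection into the second, so no intermediate relation `b` has to be materialized.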




[jira] Updated: (PIG-1404) PigUnit - Pig script testing simplified.

2010-07-02 Thread Romain Rigaux (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain Rigaux updated PIG-1404:
---

Status: Open  (was: Patch Available)

 PigUnit - Pig script testing simplified. 
 -

 Key: PIG-1404
 URL: https://issues.apache.org/jira/browse/PIG-1404
 Project: Pig
  Issue Type: New Feature
Reporter: Romain Rigaux
Assignee: Romain Rigaux
 Fix For: 0.8.0

 Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, 
 PIG-1404-3-doc.patch, PIG-1404-3.patch, PIG-1404.patch


 The goal is to provide a simple xUnit framework that enables our Pig scripts 
 to be easily:
   - unit tested
   - regression tested
   - quickly prototyped
 No cluster set up is required.
 For example:
 TestCase
 {code}
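   @Test
   public void testTop3Queries() {
     // Parameters substituted into the script ($n).
     String[] args = {
         "n=3",
     };
     test = new PigTest("top_queries.pig", args);
     // Rows fed to the alias 'data' in place of the LOAD.
     String[] input = {
         "yahoo\t10",
         "twitter\t7",
         "facebook\t10",
         "yahoo\t15",
         "facebook\t5",
 
     };
     // Expected contents of the alias 'queries_limit'.
     String[] output = {
         "(yahoo,25L)",
         "(facebook,15L)",
         "(twitter,7L)",
     };
     test.assertOutput("data", input, "queries_limit", output);
   }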
 {code}
 top_queries.pig
 {code}
 data =
 LOAD '$input'
 AS (query:CHARARRAY, count:INT);
  
 ... 
 
 queries_sum = 
 FOREACH queries_group 
 GENERATE 
 group AS query, 
 SUM(queries.count) AS count;
 
 ...
 
 queries_limit = LIMIT queries_ordered $n;
 STORE queries_limit INTO '$output';
 {code}
 There are 3 modes:
 * LOCAL (if the pigunit.exectype.local property is present)
 * MAPREDUCE (uses the cluster specified in the classpath, same as 
 HADOOP_CONF_DIR)
 ** automatic mini cluster (the default; the HADOOP_CONF_DIR to have in 
 the classpath will be ~/pigtest/conf)
 ** pointing to an existing cluster (if the pigunit.exectype.cluster property 
 is present)
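
 A minimal Java sketch of how such a mode could be chosen from properties follows. Only the two property names come from the description above; the class `PigUnitModeSketch`, the `Mode` enum, and the precedence when both properties are set are assumptions made for illustration, not the patch's actual code.

```java
import java.util.Properties;

public class PigUnitModeSketch {
    public enum Mode { LOCAL, MINI_CLUSTER, EXISTING_CLUSTER }

    /**
     * Picks the execution mode from the given properties.
     * Assumption: LOCAL wins if both properties are present.
     */
    public static Mode resolve(Properties props) {
        if (props.containsKey("pigunit.exectype.local")) {
            return Mode.LOCAL;
        }
        if (props.containsKey("pigunit.exectype.cluster")) {
            return Mode.EXISTING_CLUSTER;
        }
        // Neither property set: fall back to the automatic mini cluster.
        return Mode.MINI_CLUSTER;
    }
}
```

 Passing `System.getProperties()` to `resolve` would let a flag such as `-Dpigunit.exectype.local=true` on the JVM command line select local mode.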
 For now, it would be nice to see how this idea could be integrated in 
 Piggybank and if PigParser/PigServer could improve their interfaces in order 
 to make PigUnit simple.
 Other components based on PigUnit could be built later:
   - standalone MiniCluster
   - notion of workspaces for each test
   - standalone utility that reads test configuration and generates a test 
 report...
 It is a first prototype, open to suggestions, and can definitely benefit 
 from feedback.
 How to test, in pig_trunk:
 {code}
 Apply patch
 $pig_trunk ant compile-test
 $pig_trunk ant
 $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=99
 {code}
 (it takes 15 min in MAPREDUCE minicluster, tests will need to be split in the 
 future between 'unit' and 'integration')
 Many examples are in:
 {code}
 contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
 {code}
 When used as a standalone, do not forget commons-lang-2.4.jar and the 
 HADOOP_CONF_DIR to your cluster in your CLASSPATH.




[jira] Updated: (PIG-1404) PigUnit - Pig script testing simplified.

2010-07-02 Thread Romain Rigaux (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Romain Rigaux updated PIG-1404:
---

Status: Patch Available  (was: Open)

 PigUnit - Pig script testing simplified. 
 -

 Key: PIG-1404
 URL: https://issues.apache.org/jira/browse/PIG-1404
 Project: Pig
  Issue Type: New Feature
Reporter: Romain Rigaux
Assignee: Romain Rigaux
 Fix For: 0.8.0

 Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, 
 PIG-1404-3-doc.patch, PIG-1404-3.patch, PIG-1404.patch





[jira] Created: (PIG-1481) PigServer throws exception if it cannot find hadoop-site.xml or core-site.xml

2010-07-02 Thread Sameer M (JIRA)
PigServer throws exception if it cannot find hadoop-site.xml or core-site.xml
-

 Key: PIG-1481
 URL: https://issues.apache.org/jira/browse/PIG-1481
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sameer M


Hi

We've been using the Hadoop MiniCluster to do unit testing of our pig scripts 
in the following way.

MiniCluster minicluster = MiniCluster.buildCluster(2,2);
pigServer = new  PigServer(ExecType.MAPREDUCE, minicluster.getProperties());

This has been working fine for 0.6 and 0.7. 

However, in trunk (0.8) it looks like there is a change that causes an exception 
to be thrown if hadoop-site.xml or core-site.xml is not found in the classpath.

org.apache.pig.backend.executionengine.ExecException: ERROR 4010: Cannot find 
hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml 
was found in the classpath).If you plan to use local mode, please put -x local 
option in command line
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:149)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:114)
at org.apache.pig.impl.PigContext.connect(PigContext.java:177)
at org.apache.pig.PigServer.init(PigServer.java:215)
at org.apache.pig.PigServer.init(PigServer.java:204)
at org.apache.pig.PigServer.init(PigServer.java:200)


The problem seems to be at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine, line 148:
if( hadoop_site == null && core_site == null ) {
    throw new ExecException("Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath)." +
            "If you plan to use local mode, please put -x local option in command line",
            4010);
}

We would like to use mapreduce mode with the minicluster, and we have a lot 
of unit tests that rely on that setup.

Can this check be removed from this level?
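
One possible relaxation, sketched here in self-contained Java, would be to fail only when the configuration files are missing and the caller has not supplied connection properties either. This is a hypothetical sketch, not the HExecutionEngine code: the class `ConfigCheckSketch` is invented for illustration, and it assumes the properties passed to PigServer carry the Hadoop keys `fs.default.name` and `mapred.job.tracker`.

```java
import java.util.Properties;

public class ConfigCheckSketch {
    /** True if either Hadoop configuration file is visible to the class loader. */
    public static boolean hasClasspathConfig(ClassLoader cl) {
        return cl.getResource("hadoop-site.xml") != null
                || cl.getResource("core-site.xml") != null;
    }

    /** True if the caller passed the cluster endpoints explicitly (assumed keys). */
    public static boolean hasExplicitCluster(Properties props) {
        return props.containsKey("fs.default.name")
                && props.containsKey("mapred.job.tracker");
    }

    /** The check should fail only when neither source of configuration exists. */
    public static boolean shouldFail(ClassLoader cl, Properties props) {
        return !hasClasspathConfig(cl) && !hasExplicitCluster(props);
    }
}
```

With a check of this shape, a test that builds its PigServer from explicit cluster properties would pass even without the XML files on the classpath.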

Thanks
Sameer




[jira] Commented: (PIG-1389) Implement Pig counter to track number of rows for each input files

2010-07-02 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884834#action_12884834
 ] 

Richard Ding commented on PIG-1389:
---

We use Hadoop Path as a parser to parse the input/output locations for two use 
cases:

* Determine the scheme of the location (e.g., hdfs, hbase, file, har, ...), and
* Get the short file name of the location

Ashutosh is right that this approach doesn't work with a location string such 
as the one in PIG-1229:

{code}
jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
{code}

A RuntimeException is thrown when trying to parse it.

The proposal is to use Java URI as the parser instead for these use cases (Java 
URI throws a checked exception for invalid syntax). 
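
A minimal sketch of the two use cases with `java.net.URI` follows. The class `LocationParserSketch` and its fallback behaviour (returning null or the whole string when parsing fails) are assumptions made for illustration, not the patch itself.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class LocationParserSketch {
    /** Returns the scheme (hdfs, file, jdbc, ...), or null if absent or unparsable. */
    public static String scheme(String location) {
        try {
            return new URI(location).getScheme();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /**
     * Returns the last path segment of the location. Falls back to the whole
     * string when the URI is opaque (getPath() is null, as for jdbc:...) or
     * cannot be parsed.
     */
    public static String shortName(String location) {
        try {
            String path = new URI(location).getPath();
            if (path == null || path.isEmpty()) {
                return location;
            }
            int slash = path.lastIndexOf('/');
            // Keep the full path when there is no slash or it is the last character.
            return (slash >= 0 && slash < path.length() - 1)
                    ? path.substring(slash + 1)
                    : path;
        } catch (URISyntaxException e) {
            return location;
        }
    }
}
```

With this approach the JDBC string above parses as an opaque URI whose scheme is `jdbc`, and a malformed location surfaces as a checked `URISyntaxException` that the caller has to handle explicitly.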

 Implement Pig counter to track number of rows for each input files 
 ---

 Key: PIG-1389
 URL: https://issues.apache.org/jira/browse/PIG-1389
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1389.patch, PIG-1389.patch, PIG-1389_1.patch


 A MR job generated by Pig not only can have multiple outputs (in the case of 
 multiquery) but also can have multiple inputs (in the case of join or 
 cogroup). In both cases, the existing Hadoop counters (e.g. 
 MAP_INPUT_RECORDS, REDUCE_OUTPUT_RECORDS) can not be used to count the number 
 of records in the given input or output.  PIG-1299 addressed the case of 
 multiple outputs.  We need to add new counters for jobs with multiple inputs.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884841#action_12884841
 ] 

Aniket Mokashi commented on PIG-928:


The fix needed some changes in queryparser to support namespaces; I found this 
through the test cases I added. 
The current EvalFuncSpec logic is convoluted, so I replaced it with a cleaner one.
I have attached the updated patch with the changes mentioned above.

I am not sure what needs to be done for jython.jar; my guess was to check it 
in under /lib. Thoughts?

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip





[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884845#action_12884845
 ] 

Dmitriy V. Ryaboy commented on PIG-928:
---

Aniket, I already made the changes you need to pull down jython -- take a look 
at the patch I attached.

One more general note -- let's say jython instead of python (in the grammar, 
the keywords, everywhere), as there may be slight incompatibilities between the 
two and we want to be clear on what we are using.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDFFinale.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip





[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFFinale.patch

Changes needed for script UDF.
TODO- jython.jar related changes

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDFFinale.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip





[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-02 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884863#action_12884863
 ] 

Julien Le Dem commented on PIG-928:
---

Aniket, this assumes the ScriptEngine requires only one jar.
I would suggest instead having a method ScriptEngine.init(PigContext) that 
would be called after the ScriptEngine instance has been retrieved from the 
factory.
That would let the script engine add whatever is needed to the job.
{code}
if (scriptingLang != null) {
    ScriptEngine se = ScriptEngine.getInstance(scriptingLang);
    // pigContext.scriptJars.add(se.getStandardScriptJarPath());
    se.init(pigContext);
    se.registerFunctions(path, namespace, pigContext);
}
{code}
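
The init hook suggested above could look roughly like the following self-contained Java sketch. `Context` is a hypothetical stand-in for PigContext that only tracks jars, and the engine classes and jar names are invented for illustration; none of this is taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class ScriptEngineSketch {
    /** Hypothetical stand-in for PigContext: only tracks jars shipped with the job. */
    public static class Context {
        public final List<String> scriptJars = new ArrayList<>();
    }

    /** Hypothetical engine base class exposing the proposed init hook. */
    public abstract static class Engine {
        /** Called once after the engine is obtained from the factory. */
        public abstract void init(Context ctx);
    }

    /** An engine that needs a single jar. */
    public static class SingleJarEngine extends Engine {
        @Override
        public void init(Context ctx) {
            ctx.scriptJars.add("jython.jar");
        }
    }

    /** An engine that needs several jars, which a single-path getter could not express. */
    public static class MultiJarEngine extends Engine {
        @Override
        public void init(Context ctx) {
            ctx.scriptJars.add("engine-core.jar");
            ctx.scriptJars.add("engine-stdlib.jar");
        }
    }

    /** Runs the hook on a fresh context and returns the jars the engine registered. */
    public static List<String> jarsFor(Engine engine) {
        Context ctx = new Context();
        engine.init(ctx);
        return ctx.scriptJars;
    }
}
```

The caller no longer needs to know how many jars an engine requires; each engine registers its own dependencies inside `init`.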

Have a good weekend, Julien

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDFFinale.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip

