[jira] Assigned: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swati Jain reassigned PIG-1512:
---

Assignee: Swati Jain

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.
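
 For illustration only, here is a minimal Java sketch of how an operator can silently drop out of printed plan output. It assumes a visitor-style printer in which each operator type needs its own visit method; the report does not confirm that this is the cause, and every name below (Op, Load, Join, Visitor, BrokenPrinter, FixedPrinter, PlanPrintSketch) is hypothetical rather than one of Pig's classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for logical operators; these are not Pig's classes.
interface Op {
    void accept(Visitor v);
}

class Load implements Op {
    public void accept(Visitor v) { v.visit(this); }
}

class Join implements Op {
    public void accept(Visitor v) { v.visit(this); }
}

// Base visitor with no-op defaults for every operator type.
class Visitor {
    void visit(Load op) { }
    void visit(Join op) { }
}

// Printer that never overrides visit(Join): joins produce no output.
class BrokenPrinter extends Visitor {
    final StringBuilder out = new StringBuilder();

    @Override
    void visit(Load op) { out.append("Load\n"); }
}

// Adding the missing override is the whole fix in this sketch.
class FixedPrinter extends BrokenPrinter {
    @Override
    void visit(Join op) { out.append("Join\n"); }
}

class PlanPrintSketch {
    // Walks the plan in order and returns whatever the printer emitted.
    static String render(BrokenPrinter printer, List<Op> plan) {
        for (Op op : plan) {
            op.accept(printer);
        }
        return printer.out.toString();
    }

    public static void main(String[] args) {
        List<Op> plan = Arrays.asList(new Load(), new Load(), new Join());
        System.out.print(render(new BrokenPrinter(), plan));
        System.out.print(render(new FixedPrinter(), plan));
    }
}
```

 With no-op defaults in the base visitor, a missing override raises no error; the operator is simply absent from the output, which matches the symptom described above.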

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swati Jain updated PIG-1512:


Attachment: printJoin.patch

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swati Jain updated PIG-1512:


Attachment: printJoin.patch

Fix tab character

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch, printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swati Jain updated PIG-1512:


Attachment: printJoin.patch

Attach the right file, final upload.

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Work started: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-1512 started by Swati Jain.

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Created: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)
PlanPrinter does not print LOJoin operator in the new logical optimization 
framework


 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
 Fix For: 0.8.0
 Attachments: printJoin.patch

PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
operator would not get printed when we do an explain.




[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swati Jain updated PIG-1512:


Attachment: (was: printJoin.patch)

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swati Jain updated PIG-1512:


Attachment: (was: printJoin.patch)

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swati Jain updated PIG-1512:


Patch Info: [Patch Available]

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Updated: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework

2010-07-22 Thread Swati Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swati Jain updated PIG-1512:


Status: Patch Available  (was: In Progress)

 PlanPrinter does not print LOJoin operator in the new logical optimization 
 framework
 

 Key: PIG-1512
 URL: https://issues.apache.org/jira/browse/PIG-1512
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Swati Jain
Assignee: Swati Jain
 Fix For: 0.8.0

 Attachments: printJoin.patch


 PlanPrinter does not print LOJoin relational operator. As such, the LOJoin 
 operator would not get printed when we do an explain.




[jira] Commented: (PIG-1500) guava.jar should be removed from the lib folder

2010-07-22 Thread niraj rai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891244#action_12891244
 ] 

niraj rai commented on PIG-1500:


I ran the tests with guava-r06.jar and all tests passed. If everyone is fine 
with it, we can move to r06.


 guava.jar should be removed from the lib folder
 ---

 Key: PIG-1500
 URL: https://issues.apache.org/jira/browse/PIG-1500
 Project: Pig
  Issue Type: Bug
  Components: build
Reporter: Giridharan Kesavan
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: removeGuavaJar.patch


 The guava jar is available in the Maven repository, but it is still checked 
 into the lib folder of the Pig trunk.
 I have checked the availability of the guava jar in the Maven repository.
 http://mvnrepository.com/artifact/com.google.guava/guava




[jira] Commented: (PIG-1505) support jars and scripts in dfs

2010-07-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891258#action_12891258
 ] 

Alan Gates commented on PIG-1505:
-

I ran core and contrib tests manually and they both pass.  Richard will be 
reviewing the patch.

 support jars and scripts in dfs
 ---

 Key: PIG-1505
 URL: https://issues.apache.org/jira/browse/PIG-1505
 Project: Pig
  Issue Type: Improvement
Reporter: Andrew Hitchcock
Assignee: Andrew Hitchcock
 Attachments: pig-jars-and-scripts-from-dfs-3.patch, 
 pig-jars-and-scripts-from-dfs-trunk-1.patch, 
 pig-jars-and-scripts-from-dfs-trunk-2.patch, 
 pig-jars-and-scripts-from-dfs-trunk.patch


 Pig can't operate on files stored in Amazon S3.




[jira] Commented: (PIG-1511) Pig removes packages from its own jar when building the JAR to ship to Hadoop

2010-07-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891261#action_12891261
 ] 

Alan Gates commented on PIG-1511:
-

We don't want to do this by default.  In a couple of instances keeping the size 
of this jar down is more important.  One, when the number of tasks being used 
is very large, since that jar is being copied once to each task, and two when 
the job itself is quite small and the setup costs become a concern.

 Pig removes packages from its own jar when building the JAR to ship to Hadoop
 -

 Key: PIG-1511
 URL: https://issues.apache.org/jira/browse/PIG-1511
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Eric Tschetter
 Attachments: pig-1511.diff


 Pig generates a new jar file to ship over to Hadoop.  Pig has a couple of 
 packages whitelisted that it includes from its own jar.  Pig throws away 
 everything else.
 I package all of my dependencies into a single jar file.  Pig is included in 
 this jar file.  I do it this way because my code needs to run reliably and 
 reproducibly in production.  Pig throws away all of my dependencies.
 I don't know what the performance gain is of shaving ~5MB off of a jar that 
 is pushed to a job tracker once and then used to run over 100s of GB of data. 
  The overhead is minimal on my cluster.




[jira] Created: (PIG-1513) Skewed join doesn't handle empty input directory

2010-07-22 Thread Richard Ding (JIRA)
Skewed join doesn't handle empty input directory


 Key: PIG-1513
 URL: https://issues.apache.org/jira/browse/PIG-1513
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0



The following script

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'skewed';
dump C;
{code}

fails with ERROR: java.lang.RuntimeException: Empty samples file

In this case, the sample job has 0 maps. Pig doesn't expect this and fails.




[jira] Commented: (PIG-1513) Skewed join doesn't handle empty input directory

2010-07-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891281#action_12891281
 ] 

Olga Natkovich commented on PIG-1513:
-

Are we sure that the problem only occurs with skewed join? I would like to make 
this JIRA more generic and make sure that Pig returns empty results given 
empty input and short-circuits the processing as early as possible.
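
As a rough illustration of such an early check, the Java sketch below tests whether an input directory holds any data before any work is launched. It uses java.nio.file on the local filesystem purely as a stand-in for the DFS, and the class and method names (EmptyInputSketch, hasData) are hypothetical; it is not how Pig detects empty input.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class EmptyInputSketch {
    // Returns true if dir contains at least one regular file with size > 0.
    static boolean hasData(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.anyMatch(p -> {
                try {
                    return Files.isRegularFile(p) && Files.size(p) > 0;
                } catch (IOException e) {
                    // Unreadable entries are treated as empty in this sketch.
                    return false;
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path empty = Files.createTempDirectory("emptydir");
        Path full = Files.createTempDirectory("input");
        Files.write(full.resolve("part-00000"), "a\t1\n".getBytes());

        // A caller could return an empty result right away when hasData is false.
        System.out.println("emptydir has data: " + hasData(empty));
        System.out.println("input has data: " + hasData(full));
    }
}
```

A check of this kind would run once per input before job submission, so a join against an empty directory could yield an empty result without starting a sample job.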

 Skewed join doesn't handle empty input directory
 

 Key: PIG-1513
 URL: https://issues.apache.org/jira/browse/PIG-1513
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0


 The following script
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join B by $0, A by $0 using 'skewed';
 dump C;
 {code}
 fails with ERROR: java.lang.RuntimeException: Empty samples file
 In this case, the sample job has 0 maps. Pig doesn't expect this and fails.




[jira] Commented: (PIG-1505) support jars and scripts in dfs

2010-07-22 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891347#action_12891347
 ] 

Richard Ding commented on PIG-1505:
---

Thank you for the update. A few more comments:

* According to the Pig Latin manual, users can also register additional files (to 
use with their Pig scripts) via the command line using the 
-Dpig.additional.jars option (in addition to the REGISTER statement inside a 
Pig script). I suggest you call FileLocalizer.fetchFile from the shared method 
PigServer.registerJar so that both cases are covered.

* Can you change the method signature to

{code}
public static FetchFileRet fetchFile(Properties properties, String filePath) 
throws IOException
{code}

The reason is that we have deprecated all other public methods on FileLocalizer 
that take DataStorage as a parameter (so that we can deprecate DataStorage in the 
future). I think this is safe since the condition in the method 

{code}
((fileUri.getScheme() == null) && (dfs == null))
{code}

is not used in the patch.

* You need to add a unit test in the patch (by first copying a Pig script to 
the mini-cluster).

* Finally, since this is a new feature, can you add a release note (in JIRA, 
there is a Release Note field) so that it will be incorporated in the next Pig 
release notes?
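
To make the scheme check discussed above concrete, here is a small Java sketch that classifies a registered path as local or remote by its URI scheme. The class and method names (SchemeCheckSketch, isLocalPath) are hypothetical, and this is not the logic of FileLocalizer.fetchFile; it only shows the kind of test that java.net.URI makes possible.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SchemeCheckSketch {
    // True when the path has no scheme or an explicit file: scheme.
    static boolean isLocalPath(String filePath) {
        try {
            String scheme = new URI(filePath).getScheme();
            return scheme == null || scheme.equalsIgnoreCase("file");
        } catch (URISyntaxException e) {
            // Not a parseable URI: treated as a plain local path in this sketch.
            return true;
        }
    }

    public static void main(String[] args) {
        String[] paths = {
            "myudfs.jar",
            "file:///tmp/myudfs.jar",
            "hdfs://namenode/lib/myudfs.jar",
            "s3://bucket/script.pig"
        };
        for (String p : paths) {
            System.out.println(p + " -> " + (isLocalPath(p) ? "local" : "fetch from remote store"));
        }
    }
}
```

Putting such a check in one shared method would cover paths given through REGISTER and through the command-line option alike, which is the point of routing both through PigServer.registerJar.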



 support jars and scripts in dfs
 ---

 Key: PIG-1505
 URL: https://issues.apache.org/jira/browse/PIG-1505
 Project: Pig
  Issue Type: Improvement
Reporter: Andrew Hitchcock
Assignee: Andrew Hitchcock
 Attachments: pig-jars-and-scripts-from-dfs-3.patch, 
 pig-jars-and-scripts-from-dfs-trunk-1.patch, 
 pig-jars-and-scripts-from-dfs-trunk-2.patch, 
 pig-jars-and-scripts-from-dfs-trunk.patch


 Pig can't operate on files stored in Amazon S3.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891357#action_12891357
 ] 

Aniket Mokashi commented on PIG-928:


bq. I am still not convinced about the changes required in POUserFunc. That 
logic should really be a part of pythonToPig(pyObject). If python UDF is 
returning byte[], it should be turned into DataByteArray before it gets back 
into Pig's pipeline. And if we do that conversion in pythonToPig() (which is a 
right place to do it) we will need no changes in POUserFunc.
I agree that it is better to move the type-checking logic to the JythonFunction 
side (JythonUtils); that should provide more type safety and avoid the 
complexity of user-defined types. But I would still keep the changes in 
POUserFunc for result.result for the case in the example above (minus the 
byte[] scenario).
bq. Instead of instanceof, doing class equality test will be a wee-bit faster. 
Like instead of (pyObject instanceof PyDictionary) do pyobject.getClass() == 
PyDictionary.class. Obviously, it will work when you know exact target class 
and not for the derived ones.
Jython has derived classes for each of the basic Jython types. Although most of 
them are not used as of now, a future implementation may start returning these 
derived objects (e.g. PyTupleDerived), in which case our code would break. 
Also, PyLongDerived is already used inside the code. The __tojava__ function 
just returns the proxy Java object until we ask for a specific type of object. 
I think it's better to use instanceof rather than class equality here.
bq. For register command, we need to test not only for functionality but for 
regressions as well. Look at TestGrunt.java in test package to get an idea how 
to write test for it.
The code path for .jar registration is identical to the old code, except that it 
doesn't use any engine or namespace.
bq. Also what will happen if user returned a nil python object (null equivalent 
of Java) from UDF. It looks to me that will result in NPE. Can you add a test 
for that and similar test case from pigToPython()
A Java null object will be turned into a PyNone object, but the __tojava__ 
function always returns the special object Py.NoConversion if the PyObject 
cannot be converted to the desired Java class.
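
The instanceof versus class-equality point above can be seen in a short Java sketch. The two types below (PyTupleLike, PyTupleLikeDerived) are hypothetical stand-ins for a Jython base type and its derived counterpart, not Jython classes.

```java
// Hypothetical base type and derived type; these are not Jython's classes.
class PyTupleLike { }

class PyTupleLikeDerived extends PyTupleLike { }

class TypeCheckSketch {
    // Accepts the base type and anything derived from it.
    static boolean byInstanceof(Object o) {
        return o instanceof PyTupleLike;
    }

    // Accepts only objects whose runtime class is exactly the base type.
    static boolean byClassEquality(Object o) {
        return o != null && o.getClass() == PyTupleLike.class;
    }

    public static void main(String[] args) {
        Object base = new PyTupleLike();
        Object derived = new PyTupleLikeDerived();

        System.out.println("base, instanceof: " + byInstanceof(base));
        System.out.println("base, class equality: " + byClassEquality(base));
        System.out.println("derived, instanceof: " + byInstanceof(derived));
        System.out.println("derived, class equality: " + byClassEquality(derived));
    }
}
```

Class equality rejects the derived object, so a conversion routine written that way would stop recognizing values as soon as the runtime began handing back derived instances.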

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-1513) Pig doesn't handle empty input directory

2010-07-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1513:
--

Summary: Pig doesn't handle empty input directory  (was: Skewed join 
doesn't handle empty input directory)
Description: 

The following script

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'skewed';
store C into 'output';
{code}

fails with ERROR: java.lang.RuntimeException: Empty samples file

In this case, the sample job has 0 maps. Pig doesn't expect this and fails.

For merge join, consider the script

{code}
A = load 'input';
B = load 'emptydir';
C = join A by $0, B by $0 using 'merge';
store C into 'output';
{code}

Here the sample job again has 0 maps, and the script fails with ERROR 2176: Error 
processing right input during merge join.

But if we change the join order: 

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'merge';
store C into 'output';
{code}

The second job (merge) now has 0 maps and 0 reduces, and it generates an empty 
'output' directory.

Order by on empty directory works fine and generates empty part files.

  was:

The following script

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'skewed';
dump C
{code}

fails with ERROR: java.lang.RuntimeException: Empty samples file';

In this case, the sample job has 0 maps.  Pig doesn't expect this and fails . 


 Pig doesn't handle empty input directory
 

 Key: PIG-1513
 URL: https://issues.apache.org/jira/browse/PIG-1513
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0


 The following script
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join B by $0, A by $0 using 'skewed';
 store C into 'output';
 {code}
 fails with ERROR: java.lang.RuntimeException: Empty samples file
 In this case, the sample job has 0 maps. Pig doesn't expect this and fails.
 For merge join, consider the script
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join A by $0, B by $0 using 'merge';
 store C into 'output';
 {code}
 Here the sample job again has 0 maps, and the script fails with ERROR 2176: 
 Error processing right input during merge join.
 But if we change the join order: 
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join B by $0, A by $0 using 'merge';
 store C into 'output';
 {code}
 The second job (merge) now has 0 maps and 0 reduces, and it generates an 
 empty 'output' directory.
 Order by on empty directory works fine and generates empty part files.




[jira] Commented: (PIG-1513) Pig doesn't handle empty input directory

2010-07-22 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891366#action_12891366
 ] 

Richard Ding commented on PIG-1513:
---

Changed the JIRA title to cover the general problem of handling empty input 
directories.

 Pig doesn't handle empty input directory
 

 Key: PIG-1513
 URL: https://issues.apache.org/jira/browse/PIG-1513
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0


 The following script
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join B by $0, A by $0 using 'skewed';
 store C into 'output';
 {code}
 fails with ERROR: java.lang.RuntimeException: Empty samples file
 In this case, the sample job has 0 maps. Pig doesn't expect this and fails.
 For merge join, consider the script
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join A by $0, B by $0 using 'merge';
 store C into 'output';
 {code}
 Here the sample job again has 0 maps, and the script fails with ERROR 2176: 
 Error processing right input during merge join.
 But if we change the join order: 
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join B by $0, A by $0 using 'merge';
 store C into 'output';
 {code}
 The second job (merge) now has 0 maps and 0 reduces, and it generates an 
 empty 'output' directory.
 Order by on empty directory works fine and generates empty part files.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-07-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Attachment: PIG-1178-4.patch

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, 
 pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-07-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Status: Open  (was: Patch Available)

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, 
 pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-07-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Status: Patch Available  (was: Open)

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, 
 pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Commented: (PIG-1505) support jars and scripts in dfs

2010-07-22 Thread Andrew Hitchcock (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891436#action_12891436
 ] 

Andrew Hitchcock commented on PIG-1505:
---

Thanks Richard. Is there a unit test you recommend that I can model mine after? 
Something that uses the mini-cluster.

 support jars and scripts in dfs
 ---

 Key: PIG-1505
 URL: https://issues.apache.org/jira/browse/PIG-1505
 Project: Pig
  Issue Type: Improvement
Reporter: Andrew Hitchcock
Assignee: Andrew Hitchcock
 Attachments: pig-jars-and-scripts-from-dfs-3.patch, 
 pig-jars-and-scripts-from-dfs-trunk-1.patch, 
 pig-jars-and-scripts-from-dfs-trunk-2.patch, 
 pig-jars-and-scripts-from-dfs-trunk.patch


 Pig can't operate on files stored in Amazon S3.




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFLatest2.patch

Added tests for map-udf, null input/output, and grunt.
Made the required changes as per the suggestions.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-07-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891438#action_12891438
 ] 

Daniel Dai commented on PIG-1178:
-

Attached PIG-1178-4.patch, which includes changes in the following areas:
1. Add all the relational operators
2. Add foreach nested plans
3. Add field schema to expression operators
4. Remove UidStamp; instead, the uid is generated and cached the first time we 
get the fieldschema
5. Fix column pruner and all other new logical plan test cases
6. Add TypeCastInserter

Still polishing and refactoring the code.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, 
 pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, 
 pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those decisions and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Open  (was: Patch Available)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-22 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Patch Available  (was: Open)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterPythonUDFLatest.patch, RegisterPythonUDFLatest2.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Created: (PIG-1514) Migrate logical optimization rule: OpLimitOptimizer

2010-07-22 Thread Daniel Dai (JIRA)
Migrate logical optimization rule: OpLimitOptimizer
---

 Key: PIG-1514
 URL: https://issues.apache.org/jira/browse/PIG-1514
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0







[jira] Created: (PIG-1515) Migrate logical optimization rule: PushDownForeachFlatten

2010-07-22 Thread Daniel Dai (JIRA)
Migrate logical optimization rule: PushDownForeachFlatten
-

 Key: PIG-1515
 URL: https://issues.apache.org/jira/browse/PIG-1515
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0



