[jira] Commented: (PIG-1511) Pig removes packages from its own jar when building the JAR to ship to Hadoop

2010-07-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891648#action_12891648
 ] 

Alan Gates commented on PIG-1511:
-

The issue there is that blacklists are hard to maintain. Every time someone adds
a package to Pig, they have to remember to add it to that blacklist.

If you register your jar, Pig will wrap it up and take it along. Does this not
work for your use case?

> Pig removes packages from its own jar when building the JAR to ship to Hadoop
> -
>
> Key: PIG-1511
> URL: https://issues.apache.org/jira/browse/PIG-1511
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Eric Tschetter
> Attachments: pig-1511.diff
>
>
> Pig generates a new jar file to ship over to Hadoop.  Pig has a couple of 
> packages whitelisted that it includes from its own jar.  Pig throws away 
> everything else.
> I package all of my dependencies into a single jar file.  Pig is included in 
> this jar file.  I do it this way because my code needs to run reliably and 
> reproducibly in production.  Pig throws away all of my dependencies.
> I don't see what performance is gained by shaving ~5MB off a jar that is
> pushed to the job tracker once and then used to process hundreds of GB of data.
> The overhead is minimal on my cluster.
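For reference, the packaging behaviour described above amounts to a whitelist filter over the entries of Pig's own jar. The sketch below is hypothetical: the class name `JarEntryFilter` and the `org/apache/pig/` prefix list are illustrative assumptions, not Pig's actual code. Any entry outside a listed prefix, such as a dependency bundled into the same jar, is dropped.

```java
import java.util.List;

// Hypothetical sketch: decides which entries of Pig's own jar are copied into
// the job jar. The prefix list is illustrative, not Pig's actual whitelist.
class JarEntryFilter {
    private final List<String> allowedPrefixes;

    JarEntryFilter(List<String> allowedPrefixes) {
        this.allowedPrefixes = allowedPrefixes;
    }

    // Ship an entry only if its path starts with a whitelisted package prefix;
    // everything else (e.g. bundled third-party classes) is dropped.
    boolean shouldShip(String entryName) {
        for (String prefix : allowedPrefixes) {
            if (entryName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        JarEntryFilter filter = new JarEntryFilter(List.of("org/apache/pig/"));
        System.out.println(filter.shouldShip("org/apache/pig/Main.class"));      // true
        System.out.println(filter.shouldShip("com/example/MyDependency.class")); // false
    }
}
```

Under this scheme, shipping an additional package means extending the prefix list.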

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-656) Use of eval or any other keyword in the package hierarchy of a UDF causes parse exception

2010-07-23 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-656:
---

Attachment: pigusergroup656.patch

> Use of eval or any other keyword in the package hierarchy of a UDF causes 
> parse exception
> -
>
> Key: PIG-656
> URL: https://issues.apache.org/jira/browse/PIG-656
> Project: Pig
>  Issue Type: Bug
>  Components: documentation, grunt
>Affects Versions: 0.3.0
>Reporter: Viraj Bhat
>Assignee: Milind Bhandarkar
> Fix For: 0.3.0
>
> Attachments: mywordcount.txt, pigusergroup656.patch, reserved.patch, 
> TOKENIZE.jar
>
>
> Consider a Pig script which does something similar to a word count. It uses 
> the built-in TOKENIZE function, but packages it inside a class hierarchy such 
> as "mypackage.eval":
> {code}
> register TOKENIZE.jar
> my_src  = LOAD '/user/viraj/mywordcount.txt' USING PigStorage('\t')  AS 
> (mlist: chararray);
> modules = FOREACH my_src GENERATE FLATTEN(mypackage.eval.TOKENIZE(mlist));
> describe modules;
> grouped = GROUP modules BY $0;
> describe grouped;
> counts  = FOREACH grouped GENERATE COUNT(modules), group;
> ordered = ORDER counts BY $0;
> dump ordered;
> {code}
> The parser complains:
> ===
> 2009-02-05 01:17:29,231 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Invalid alias: mypackage in {mlist: chararray}
> ===
> I looked at the source code in
> src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt, and it seems
> that EVAL is a keyword in Pig. Some questions that need clarification:
> 1) Is there documentation on what the EVAL keyword actually is?
> 2) Is EVAL keyword actually implemented?
> Viraj




[jira] Commented: (PIG-1505) support jars and scripts in dfs

2010-07-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891685#action_12891685
 ] 

Richard Ding commented on PIG-1505:
---


You can take a look at the test cases in TestPigRunner where local Pig scripts 
are passed to the PigRunner.run method. 

You can first copy a local Pig script to the mini-cluster using

{code}
Util.copyFromLocalToCluster(cluster, , 
);
{code}

and then invoke the run method with the arguments

{code}
String[] args = { "-f", "hdfs://" };
PigRunner.run(args, null);
{code}

> support jars and scripts in dfs
> ---
>
> Key: PIG-1505
> URL: https://issues.apache.org/jira/browse/PIG-1505
> Project: Pig
>  Issue Type: Improvement
>Reporter: Andrew Hitchcock
>Assignee: Andrew Hitchcock
> Attachments: pig-jars-and-scripts-from-dfs-3.patch, 
> pig-jars-and-scripts-from-dfs-trunk-1.patch, 
> pig-jars-and-scripts-from-dfs-trunk-2.patch, 
> pig-jars-and-scripts-from-dfs-trunk.patch
>
>
> Pig can't operate on files stored in Amazon S3.




[jira] Updated: (PIG-1435) make sure dependent jobs fail when a jon in multiquery fails

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1435:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to trunk. Thanks Niraj.

> make sure dependent jobs fail when a jon in multiquery fails
> 
>
> Key: PIG-1435
> URL: https://issues.apache.org/jira/browse/PIG-1435
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: depJobs.patch, depJobsFailure.patch, 
> depJobsFailure2.patch, depJobsFailure3.patch
>
>
> Currently, if one of the multiquery (MQ) jobs fails, Pig tries to run all remaining jobs.
> As a result, if data was partially generated by the failed job, you might get
> incorrect results from dependent jobs.
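A minimal sketch of the requested behaviour, assuming the multiquery jobs form a dependency graph: when a job fails, every job that transitively consumes its output is failed as well instead of being run. The class `JobDependencyGraph` and its methods are hypothetical and are not Pig's job-control API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: propagate a job failure to all downstream jobs.
class JobDependencyGraph {
    // dependents.get(j) = jobs that read the output of job j
    private final Map<String, List<String>> dependents = new HashMap<>();

    void addDependency(String upstream, String downstream) {
        dependents.computeIfAbsent(upstream, k -> new ArrayList<>()).add(downstream);
    }

    // Breadth-first walk returning every job that must not run because failedJob failed.
    Set<String> jobsToFail(String failedJob) {
        Set<String> failed = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(failedJob);
        while (!queue.isEmpty()) {
            String job = queue.poll();
            for (String next : dependents.getOrDefault(job, List.of())) {
                if (failed.add(next)) { // visit each dependent only once
                    queue.add(next);
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        JobDependencyGraph graph = new JobDependencyGraph();
        graph.addDependency("A", "B");
        graph.addDependency("B", "C");
        graph.addDependency("X", "Y");
        System.out.println(graph.jobsToFail("A")); // [B, C]
    }
}
```

Independent jobs (here `X` and `Y`) are unaffected and can still run.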




[jira] Assigned: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce

2010-07-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair reassigned PIG-1516:
--

Assignee: Thejas M Nair

> finalize in bag implementations causes pig to run out of memory in reduce 
> --
>
> Key: PIG-1516
> URL: https://issues.apache.org/jira/browse/PIG-1516
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> *Problem:*
> Pig bag implementations that are subclasses of DefaultAbstractBag have
> finalize methods implemented. As a result, the garbage collector moves them
> to a finalization queue, and their memory is freed only after they have
> been finalized.
> If the bags are not finalized fast enough, a lot of memory is consumed by the
> finalization queue, and Pig runs out of memory. This can happen if a large
> number of small bags is being created.
> *Solution:*
> The finalize function exists to delete the spill files that are created
> when a bag is too large. But if the bags are small enough, no spill files
> are created, and the finalize function serves no purpose.
> A new class that holds a list of files will be introduced (FileList). This
> class will have a finalize method that deletes the files. The bags will no
> longer have finalize methods, and the bags will use FileList instead of
> ArrayList.
> *Possible workaround for earlier releases:*
> Since the fix is going into 0.8, here is a workaround: disabling the
> combiner will reduce the number of bags getting created, as there will no
> longer be a stage that combines intermediate merge results. But I would
> recommend disabling it only if you have this problem, as it is likely to
> slow down the query.
> To disable the combiner, set the property: -Dpig.exec.nocombiner=true




[jira] Created: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce

2010-07-23 Thread Thejas M Nair (JIRA)
finalize in bag implementations causes pig to run out of memory in reduce 
--

 Key: PIG-1516
 URL: https://issues.apache.org/jira/browse/PIG-1516
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
 Fix For: 0.8.0


*Problem:*
Pig bag implementations that are subclasses of DefaultAbstractBag have
finalize methods implemented. As a result, the garbage collector moves them to
a finalization queue, and their memory is freed only after they have been
finalized.
If the bags are not finalized fast enough, a lot of memory is consumed by the
finalization queue, and Pig runs out of memory. This can happen if a large number
of small bags is being created.

*Solution:*
The finalize function exists to delete the spill files that are created when
a bag is too large. But if the bags are small enough, no spill files are
created, and the finalize function serves no purpose.
A new class that holds a list of files will be introduced (FileList). This
class will have a finalize method that deletes the files. The bags will no
longer have finalize methods, and the bags will use FileList instead of
ArrayList.

*Possible workaround for earlier releases:*
Since the fix is going into 0.8, here is a workaround: disabling the combiner
will reduce the number of bags getting created, as there will no longer be a
stage that combines intermediate merge results. But I would recommend disabling
it only if you have this problem, as it is likely to slow down the query.
To disable the combiner, set the property: -Dpig.exec.nocombiner=true
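The proposed split can be sketched as follows, assuming the bag creates its FileList lazily on the first spill, so that small bags never allocate a finalizable object. `FileList` follows the name used in the proposal; `SpillableBag`, `spill()` and `deleteAll()` are hypothetical stand-ins, not Pig's DefaultAbstractBag code. `Object.finalize` has been deprecated since Java 9; the sketch uses it only because that is the mechanism the proposal describes.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;

// Holds spill files; in this sketch it is the only class with a finalizer.
class FileList extends ArrayList<File> {
    // Deletes every file in the list and returns how many deletions succeeded.
    int deleteAll() {
        int deleted = 0;
        for (File f : this) {
            if (f.delete()) {
                deleted++;
            }
        }
        clear();
        return deleted;
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"}) // finalize is deprecated since Java 9
    protected void finalize() throws Throwable {
        try {
            deleteAll();
        } finally {
            super.finalize();
        }
    }
}

// Hypothetical bag with no finalize method of its own. The FileList is created
// lazily on the first spill, so bags that never spill hold nothing finalizable.
class SpillableBag {
    private FileList spillFiles; // null until the first spill

    File spill() {
        if (spillFiles == null) {
            spillFiles = new FileList();
        }
        try {
            File f = File.createTempFile("pigbag", ".spill");
            spillFiles.add(f);
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean hasSpilled() {
        return spillFiles != null;
    }
}
```

With this layout the garbage collector can reclaim a small bag immediately; only bags that actually spilled leave a FileList on the finalization queue.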





Pig 0.8.0 branch plan

2010-07-23 Thread Olga Natkovich
Pig Developers,

 

I would like to propose that we branch for Pig 0.8.0 at the end of
August and plan for the release by the end of October. Please let me
know if you see a problem with either of these dates.

 

If you are planning to contribute any patches to Pig 0.8.0, please make
sure that you have a JIRA open and linked to the 0.8.0 release, and also that
you will be able to get the code in before the branch is created. If you
have a JIRA assigned to you that is linked to Pig 0.8.0 and you don't
think you can get it in before the branch, please unlink it from the
release.

 

Thanks,

 

Olga



[jira] Commented: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce

2010-07-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891784#action_12891784
 ] 

Thejas M Nair commented on PIG-1516:


Regarding the workaround - I would recommend disabling the combiner only if 
other steps such as increasing the heap size or increasing the number of 
reducers do not help.

> finalize in bag implementations causes pig to run out of memory in reduce 
> --
>
> Key: PIG-1516
> URL: https://issues.apache.org/jira/browse/PIG-1516
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> *Problem:*
> Pig bag implementations that are subclasses of DefaultAbstractBag have
> finalize methods implemented. As a result, the garbage collector moves them
> to a finalization queue, and their memory is freed only after they have
> been finalized.
> If the bags are not finalized fast enough, a lot of memory is consumed by the
> finalization queue, and Pig runs out of memory. This can happen if a large
> number of small bags is being created.
> *Solution:*
> The finalize function exists to delete the spill files that are created
> when a bag is too large. But if the bags are small enough, no spill files
> are created, and the finalize function serves no purpose.
> A new class that holds a list of files will be introduced (FileList). This
> class will have a finalize method that deletes the files. The bags will no
> longer have finalize methods, and the bags will use FileList instead of
> ArrayList.
> *Possible workaround for earlier releases:*
> Since the fix is going into 0.8, here is a workaround: disabling the
> combiner will reduce the number of bags getting created, as there will no
> longer be a stage that combines intermediate merge results. But I would
> recommend disabling it only if you have this problem, as it is likely to
> slow down the query.
> To disable the combiner, set the property: -Dpig.exec.nocombiner=true




[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword

2010-07-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891788#action_12891788
 ] 

Olga Natkovich commented on PIG-1249:
-

Ashutosh,

First, the changes are not going to be in the framework until Hadoop 0.22, and I
don't think we want to wait that long, as we are seeing quite a few problems on
our cluster. Second, I think we want to take Pig in the direction of setting
things up for users. Of course, we don't have the stats right now to do so
accurately, but I think this is a step in the right direction.

> Safe-guards against misconfigured Pig scripts without PARALLEL keyword
> --
>
> Key: PIG-1249
> URL: https://issues.apache.org/jira/browse/PIG-1249
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Arun C Murthy
>Assignee: Jeff Zhang
>Priority: Critical
> Fix For: 0.8.0
>
> Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, 
> PIG_1249_3.patch
>
>
> It would be *very* useful for Pig to have safe-guards against naive scripts 
> which process a *lot* of data without the use of PARALLEL keyword.
> We've seen a fair number of instances where naive users process huge 
> data-sets (>10TB) with badly mis-configured #reduces e.g. 1 reduce. 
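The issue does not spell out the heuristic used by the attached patches. As an illustration only, one common size-based approach divides the total input size by a target number of bytes per reducer and clamps the result. The class name `ReducerEstimator` and both defaults (1 GB per reducer, at most 999 reducers) are assumptions for this sketch, not values taken from the PIG-1249 patches.

```java
// Hypothetical sketch of a size-based reducer estimate. Both defaults are
// assumptions for illustration, not values taken from the PIG-1249 patches.
final class ReducerEstimator {
    static final long DEFAULT_BYTES_PER_REDUCER = 1_000_000_000L; // ~1 GB per reducer
    static final int DEFAULT_MAX_REDUCERS = 999;                   // hard upper bound

    // ceil(totalInputBytes / bytesPerReducer) for non-negative input,
    // clamped to the range [1, maxReducers].
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers < 1) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        long reducers = totalInputBytes / bytesPerReducer
                + (totalInputBytes % bytesPerReducer == 0 ? 0 : 1);
        reducers = Math.max(1L, reducers);
        return (int) Math.min((long) maxReducers, reducers);
    }

    static int estimate(long totalInputBytes) {
        return estimate(totalInputBytes, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS);
    }

    public static void main(String[] args) {
        System.out.println(estimate(10_000_000_000_000L)); // 10 TB  -> 999 (capped)
        System.out.println(estimate(2_500_000_000L));      // 2.5 GB -> 3
        System.out.println(estimate(0L));                  // empty  -> 1
    }
}
```

Under these assumed defaults, a 10 TB job would get the capped maximum rather than the single reducer described in the report.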




[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword

2010-07-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891789#action_12891789
 ] 

Olga Natkovich commented on PIG-1249:
-

Jeff, sorry this patch has not gotten much attention in a while. Can I ask you to
do the following:

(1) Regenerate the patch against the latest trunk, and make sure that the tests
pass and that we get no additional warnings.
(2) Add a comment for the docs that describes, in one place, the exact
heuristics, when they are applied, and how they can be influenced. I will ask
our doc writer to incorporate this information into the Pig 0.8.0 documentation.
(3) If it is not already done, log the value that will be used so that
the user knows what is happening.

Thanks!

> Safe-guards against misconfigured Pig scripts without PARALLEL keyword
> --
>
> Key: PIG-1249
> URL: https://issues.apache.org/jira/browse/PIG-1249
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0
>Reporter: Arun C Murthy
>Assignee: Jeff Zhang
>Priority: Critical
> Fix For: 0.8.0
>
> Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, 
> PIG_1249_3.patch
>
>
> It would be *very* useful for Pig to have safe-guards against naive scripts 
> which process a *lot* of data without the use of PARALLEL keyword.
> We've seen a fair number of instances where naive users process huge 
> data-sets (>10TB) with badly mis-configured #reduces e.g. 1 reduce. 




[jira] Updated: (PIG-259) allow store to overwrite existing directroy

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-259:
---

Fix Version/s: (was: 0.8.0)

Unlinking, since there has been no activity since early May. Jeff, please feel
free to link it back if you are still planning to work on it for the 0.8 release.

> allow store to overwrite existing directroy
> ---
>
> Key: PIG-259
> URL: https://issues.apache.org/jira/browse/PIG-259
> Project: Pig
>  Issue Type: Sub-task
>Affects Versions: 0.8.0
>Reporter: Olga Natkovich
>Assignee: Jeff Zhang
> Attachments: Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, 
> Pig_259_4.patch
>
>
> We have users who are asking for a flag to overwrite an existing directory.




[jira] Resolved: (PIG-466) PERFORMANCE: dropping the columns as soon as possible

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-466.


Resolution: Fixed

This is already resolved as part of PIG-1178

> PERFORMANCE: dropping the columns as soon as possible
> -
>
> Key: PIG-466
> URL: https://issues.apache.org/jira/browse/PIG-466
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Olga Natkovich
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> Currently, each operator carries all the data until foreach is encountered. 
> This can cause significant performance degradation.
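As an illustration of dropping columns early, the sketch below projects each record down to the columns that later operators reference, instead of carrying every field until a FOREACH. `ColumnPruner` is a hypothetical helper; how Pig determines the required columns (the work referenced in PIG-1178) is not shown.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of early column pruning: keep only the fields that
// downstream operators actually reference.
final class ColumnPruner {
    private final int[] keep; // required column indexes, in ascending order

    ColumnPruner(Collection<Integer> requiredColumns) {
        this.keep = requiredColumns.stream().mapToInt(Integer::intValue).sorted().toArray();
    }

    List<Object> prune(List<Object> tuple) {
        List<Object> out = new ArrayList<>(keep.length);
        for (int i : keep) {
            out.add(tuple.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Suppose only columns 0 and 2 are used after this point in the plan.
        ColumnPruner pruner = new ColumnPruner(Set.of(2, 0));
        List<Object> row = List.of("alice", "a long free-text field", 42, 3.14);
        System.out.println(pruner.prune(row)); // [alice, 42]
    }
}
```

Applying the projection right after the load means the wide text column never travels through later operators or the shuffle.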




[jira] Assigned: (PIG-498) Pig does not error out while trying to use a input file to which the user does not have access permissions

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-498:
--

Assignee: niraj rai

I am guessing this issue might have gone away with Pig 0.7.0. Niraj, could you
verify this and, if it is gone, please close the issue?

> Pig does not error out while trying to use a input file to which the user 
> does not have access permissions
> --
>
> Key: PIG-498
> URL: https://issues.apache.org/jira/browse/PIG-498
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: niraj rai
> Fix For: 0.8.0
>
>
> Session illustrating the issue.
> {code}
> bash-3.00$ hadoop fs -ls /data/statistics.txt
> ls: org.apache.hadoop.fs.permission.AccessControlException: Permission 
> denied: user=, access=READ_EXECUTE, inode=""-
> bash-3.00$ pig -latest 
> 2008-10-16 23:31:25,134 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to HOD...
> ...
> 2008-10-16 23:34:45,810 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: local
> grunt> a = load '/data/statistics.txt';  
> grunt> dump a;
> 2008-10-16 23:39:05,624 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 100% complete
> 2008-10-16 23:39:05,624 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Success!
> grunt> 
> {code}




[jira] Assigned: (PIG-348) -j command line option doesn't work

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-348:


Assignee: Richard Ding  (was: Corinne Chandel)

> -j command line option doesn't work
> ---
>
> Key: PIG-348
> URL: https://issues.apache.org/jira/browse/PIG-348
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Amir Youssefi
>Assignee: Richard Ding
> Fix For: 0.8.0
>
>
> According to:
> $ pig --help 
> ...
> -j, -jar jarfile load jarfile
> ...
> yet 
> $pig -j my.jar
> doesn't work in place of:
> register my.jar 
> in Pig script. 




[jira] Commented: (PIG-348) -j command line option doesn't work

2010-07-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891795#action_12891795
 ] 

Richard Ding commented on PIG-348:
--

I'll first remove the -j option from the source code.

> -j command line option doesn't work
> ---
>
> Key: PIG-348
> URL: https://issues.apache.org/jira/browse/PIG-348
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Amir Youssefi
>Assignee: Corinne Chandel
> Fix For: 0.8.0
>
>
> According to:
> $ pig --help 
> ...
> -j, -jar jarfile load jarfile
> ...
> yet 
> $pig -j my.jar
> doesn't work in place of:
> register my.jar 
> in Pig script. 




[jira] Updated: (PIG-1453) [zebra] Intermittent failure for TestOrderPreserveUnionHDFS

2010-07-23 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1453:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Committed to the trunk.

> [zebra] Intermittent failure for TestOrderPreserveUnionHDFS
> ---
>
> Key: PIG-1453
> URL: https://issues.apache.org/jira/browse/PIG-1453
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.8.0
>Reporter: Daniel Dai
>Assignee: Yan Zhou
> Fix For: 0.8.0
>
> Attachments: PIG-1453.patch, PIG-1453.patch
>
>





[jira] Resolved: (PIG-602) Pass global configurations to UDF

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-602.


Resolution: Fixed

> Pass global configurations to UDF
> -
>
> Key: PIG-602
> URL: https://issues.apache.org/jira/browse/PIG-602
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Yiping Han
> Fix For: 0.8.0
>
>
> We are seeking an easy way to pass a large number of global configurations to 
> UDFs.
> Our application contains many Pig jobs and has a large number of
> configurations. Passing configurations through the command line is not ideal
> (e.g. modifying a single parameter requires changing multiple command lines),
> and putting everything into the Hadoop conf is not ideal either.
> We would like to see if Pig can provide a facility that allows us to
> pass a configuration file in some format (XML?) and then make it available
> throughout all the UDFs.




[jira] Commented: (PIG-602) Pass global configurations to UDF

2010-07-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891800#action_12891800
 ] 

Olga Natkovich commented on PIG-602:


This work is already done. The user can propagate the properties via
"-propertyfile  from the command line and then retrieve the properties
via a call to UDFContext.getJobConf. It just needs to be documented for the
Pig 0.8.0 release.

> Pass global configurations to UDF
> -
>
> Key: PIG-602
> URL: https://issues.apache.org/jira/browse/PIG-602
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Yiping Han
> Fix For: 0.8.0
>
>
> We are seeking an easy way to pass a large number of global configurations to 
> UDFs.
> Our application contains many Pig jobs and has a large number of
> configurations. Passing configurations through the command line is not ideal
> (e.g. modifying a single parameter requires changing multiple command lines),
> and putting everything into the Hadoop conf is not ideal either.
> We would like to see if Pig can provide a facility that allows us to
> pass a configuration file in some format (XML?) and then make it available
> throughout all the UDFs.




[jira] Updated: (PIG-348) -j command line option doesn't work

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-348:
-

Attachment: PIG-348.path

> -j command line option doesn't work
> ---
>
> Key: PIG-348
> URL: https://issues.apache.org/jira/browse/PIG-348
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Amir Youssefi
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-348.path
>
>
> According to:
> $ pig --help 
> ...
> -j, -jar jarfile load jarfile
> ...
> yet 
> $pig -j my.jar
> doesn't work in place of:
> register my.jar 
> in Pig script. 




[jira] Updated: (PIG-348) -j command line option doesn't work

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-348:
-

Status: Patch Available  (was: Open)

> -j command line option doesn't work
> ---
>
> Key: PIG-348
> URL: https://issues.apache.org/jira/browse/PIG-348
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Amir Youssefi
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-348.path
>
>
> According to:
> $ pig --help 
> ...
> -j, -jar jarfile load jarfile
> ...
> yet 
> $pig -j my.jar
> doesn't work in place of:
> register my.jar 
> in Pig script. 




[jira] Updated: (PIG-1379) Jars registered from command line should override the ones present in the script

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1379:
--

Status: Open  (was: Patch Available)

> Jars registered from command line should override the ones present in the 
> script 
> -
>
> Key: PIG-1379
> URL: https://issues.apache.org/jira/browse/PIG-1379
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ankur
>Assignee: Richard Ding
> Fix For: 0.8.0
>
>
> Jars that are registered from the command line when executing the pig script 
> should override the ones that are specified via 'register' in the pig script 
> itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1379) Jars registered from command line should override the ones present in the script

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1379:
--

Attachment: (was: PIG-1379.patch)

> Jars registered from command line should override the ones present in the 
> script 
> -
>
> Key: PIG-1379
> URL: https://issues.apache.org/jira/browse/PIG-1379
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ankur
>Assignee: Richard Ding
> Fix For: 0.8.0
>
>
> Jars that are registered from the command line when executing the pig script 
> should override the ones that are specified via 'register' in the pig script 
> itself.




[jira] Updated: (PIG-1379) Jars registered from command line should override the ones present in the script

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1379:
--

Attachment: (was: PIG-1379.patch)

> Jars registered from command line should override the ones present in the 
> script 
> -
>
> Key: PIG-1379
> URL: https://issues.apache.org/jira/browse/PIG-1379
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ankur
>Assignee: Richard Ding
> Fix For: 0.8.0
>
>
> Jars that are registered from the command line when executing the pig script 
> should override the ones that are specified via 'register' in the pig script 
> itself.




[jira] Commented: (PIG-1379) Jars registered from command line should override the ones present in the script

2010-07-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891815#action_12891815
 ] 

Richard Ding commented on PIG-1379:
---

Alan, I see your point. I now think that we should reconsider this feature
request. It isn't clear to me why it is useful: users can use parameter
substitution if they don't want to change their Pig scripts.

I moved the posted patch to PIG-348. 

> Jars registered from command line should override the ones present in the 
> script 
> -
>
> Key: PIG-1379
> URL: https://issues.apache.org/jira/browse/PIG-1379
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ankur
>Assignee: Richard Ding
> Fix For: 0.8.0
>
>
> Jars that are registered from the command line when executing the pig script 
> should override the ones that are specified via 'register' in the pig script 
> itself.




[jira] Resolved: (PIG-1379) Jars registered from command line should override the ones present in the script

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1379.
-

Resolution: Won't Fix

This would be a non-backward-compatible change, and it is not clear why we need
to make it. Parameter substitution can be used to drive execution from the command line.

> Jars registered from command line should override the ones present in the 
> script 
> -
>
> Key: PIG-1379
> URL: https://issues.apache.org/jira/browse/PIG-1379
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ankur
>Assignee: Richard Ding
> Fix For: 0.8.0
>
>
> Jars that are registered from the command line when executing the pig script 
> should override the ones that are specified via 'register' in the pig script 
> itself.




[jira] Commented: (PIG-348) -j command line option doesn't work

2010-07-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891820#action_12891820
 ] 

Olga Natkovich commented on PIG-348:


+1, changes look good

> -j command line option doesn't work
> ---
>
> Key: PIG-348
> URL: https://issues.apache.org/jira/browse/PIG-348
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Amir Youssefi
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-348.path
>
>
> According to:
> $ pig --help 
> ...
> -j, -jar jarfile load jarfile
> ...
> yet 
> $pig -j my.jar
> doesn't work in place of:
> register my.jar 
> in Pig script. 




[jira] Updated: (PIG-621) Casts swallow exceptions when there are issues with conversion of bytes to Pig types

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-621:
---

Fix Version/s: 0.9.0
   (was: 0.8.0)

0.9 is all about improved error handling

> Casts swallow exceptions when there are issues with conversion of bytes to 
> Pig types
> 
>
> Key: PIG-621
> URL: https://issues.apache.org/jira/browse/PIG-621
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
> Fix For: 0.9.0
>
>
> In the current implementation of casts, exceptions thrown while converting 
> bytes to Pig types are swallowed. Pig needs to either return NULL or rethrow 
> the exception.
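A minimal sketch of the two behaviours the description asks for. It assumes a hypothetical `CastPolicies` helper that parses UTF-8 text into an `Integer` and is not Pig's actual cast code. The point is that a malformed value either becomes an explicit null or surfaces as an exception that names the bad input, instead of disappearing silently.

```java
import java.nio.charset.StandardCharsets;

public class CastPolicies {
    /** Policy 1: a malformed value becomes an explicit null. */
    public static Integer bytesToIntegerOrNull(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        try {
            return Integer.valueOf(new String(bytes, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Policy 2: a malformed value is rethrown with the offending text attached. */
    public static Integer bytesToIntegerOrThrow(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        String text = new String(bytes, StandardCharsets.UTF_8).trim();
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Cannot cast '" + text + "' to int", e);
        }
    }
}
```

Which policy is preferable depends on whether a job should tolerate dirty records (null) or stop on them (rethrow).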




[jira] Resolved: (PIG-729) Use of default parallelism

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-729.


Resolution: Duplicate

We are going with the approach outlined in PIG-1249.

> Use of default parallelism
> --
>
> Key: PIG-729
> URL: https://issues.apache.org/jira/browse/PIG-729
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
> Environment: Hadoop 0.20
>Reporter: Santhosh Srinivasan
> Fix For: 0.8.0
>
>
> Currently, if the user does not specify the number of reduce slots using the 
> parallel keyword, Pig lets Hadoop decide on the default number of reducers. 
> This model worked well with dynamically allocated clusters using HOD and for 
> static clusters where the default number of reduce slots was explicitly set. 
> With Hadoop 0.20, a single static cluster will be shared amongst a number of 
> queues. As a result, a common scenario is to end up with the default number of 
> reducers set to one (1).
> When users migrate to Hadoop 0.20, they might see a dramatic change in the 
> performance of their queries if they had not used the parallel keyword to 
> specify the number of reducers. In order to mitigate such circumstances, Pig 
> can support one of the following:
> 1. Specify a default parallelism for the entire script.
> This option will allow users to use the same parallelism for all operators 
> that do not have the explicit parallel keyword. This will ensure that the 
> scripts utilize more reducers than the default of one reducer. On the 
> downside, due to data transformations, operations performed towards the end 
> of the script usually need fewer reducers than the operators that appear at 
> the beginning of the script.
> 2. Display a warning message for each reduce-side operator that does not use 
> the explicit parallel keyword. Proceed with the execution.
> 3. Display an error message indicating the operator that does not have the 
> explicit use of the parallel keyword. Stop the execution.
> Other suggestions/thoughts/solutions are welcome.
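Option 1 above amounts to a three-level precedence rule: an explicit parallel keyword wins, then a script-wide default, then Hadoop's own configuration. The following is a minimal sketch under stated assumptions: the class `ParallelismResolver` and the `-1` sentinel meaning "let Hadoop decide" are hypothetical, and this is not necessarily how PIG-1249 resolved the issue.

```java
import java.util.OptionalInt;

public class ParallelismResolver {
    /** Hypothetical sentinel: leave the reducer count to Hadoop's configuration. */
    public static final int USE_HADOOP_DEFAULT = -1;

    private final OptionalInt scriptDefault;

    public ParallelismResolver(OptionalInt scriptDefault) {
        this.scriptDefault = scriptDefault;
    }

    /**
     * An explicit parallel keyword on the operator wins; otherwise fall back
     * to the script-wide default (option 1); otherwise defer to Hadoop.
     */
    public int reducersFor(OptionalInt explicitParallel) {
        if (explicitParallel.isPresent()) {
            return explicitParallel.getAsInt();
        }
        if (scriptDefault.isPresent()) {
            return scriptDefault.getAsInt();
        }
        return USE_HADOOP_DEFAULT;
    }
}
```

The drawback noted in option 1 still applies: every operator without its own keyword gets the same count, even late operators that process much less data.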




[jira] Commented: (PIG-348) -j command line option doesn't work

2010-07-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891827#action_12891827
 ] 

Richard Ding commented on PIG-348:
--


test-patch results:

{code}
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
{code}




[jira] Resolved: (PIG-787) Allow UDFs and their dependencies to be distributed via Hadoop's distributed cache

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-787.


Resolution: Won't Fix

Does not look like there is reason to do this

> Allow UDFs and their dependencies to be distributed via Hadoop's distributed 
> cache
> --
>
> Key: PIG-787
> URL: https://issues.apache.org/jira/browse/PIG-787
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Richard Ding
> Fix For: 0.8.0
>
>





[jira] Assigned: (PIG-873) Optimizer should allow search for global patterns

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-873:
--

Assignee: Daniel Dai

Daniel, please review with Santhosh whether additional work is required. If not, 
please close it. If there is more work, let's discuss whether we need to do this 
in Pig 0.8.0. Thanks

> Optimizer should allow search for global patterns
> -
>
> Key: PIG-873
> URL: https://issues.apache.org/jira/browse/PIG-873
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Santhosh Srinivasan
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> Currently, the optimizer works on the following mechanism:
> 1. Specify the pattern to be searched
> 2. For each occurrence of the pattern, check and then apply a transformation
> With this approach, the search for a pattern is localized. An example will 
> illustrate the problem.
> If the pattern to be searched for is foreach (with flatten) connected to any 
> operator and if the graph has more than one foreach (with flatten) connected 
> to an operator (cross, join, union, etc), then each instance of foreach 
> connected to the operator is returned as a match. While this is fine for a 
> localized view (per match), from a global view the pattern to be searched for 
> is any number of foreach operators connected to an operator.
> The implication of not having a globalized view is more rules. There will be 
> one rule for one foreach connected to an operator, one rule for two foreach 
> operators connected to an operator, etc.
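To illustrate the difference between a local and a global view, here is a minimal sketch that collects every flattening foreach feeding the same operator into a single match, so one rule can handle any number of them. The `Op` and `Edge` types are hypothetical stand-ins, not Pig's optimizer classes, and the records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GlobalPatternMatcher {
    /** Hypothetical plan node: a name, an operator type, and whether it flattens. */
    public record Op(String name, String type, boolean flattens) {}

    /** Hypothetical directed edge from a producer to a consumer. */
    public record Edge(Op from, Op to) {}

    /**
     * A local matcher would report one (foreach, successor) pair per edge.
     * Here all flattening foreach operators that feed the same successor are
     * grouped into one match keyed by that successor.
     */
    public static Map<Op, List<Op>> flattenForeachesBySuccessor(List<Edge> edges) {
        Map<Op, List<Op>> matches = new LinkedHashMap<>();
        for (Edge e : edges) {
            if (e.from().type().equals("foreach") && e.from().flattens()) {
                matches.computeIfAbsent(e.to(), k -> new ArrayList<>()).add(e.from());
            }
        }
        return matches;
    }
}
```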




[jira] Updated: (PIG-930) merge join should handle compressed bz2 sorted files

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-930:
---

Fix Version/s: (was: 0.8.0)

Unlinking from the release. We have not really seen users ask for this.

> merge join should handle compressed bz2 sorted files
> 
>
> Key: PIG-930
> URL: https://issues.apache.org/jira/browse/PIG-930
> Project: Pig
>  Issue Type: Bug
>Reporter: Pradeep Kamath
>
> There are two issues. First, POLoad, which is used to read the right-side input, does 
> not handle bz2 files right now. This needs to be fixed.
> Further, in the index map job we bindTo(startOfBlockOffSet) (this will 
> internally discard first tuple if offset > 0). Then we do the following:
> {noformat}
> While(tuple survives pipeline) {
>   Pos =  getPosition()
>   getNext() 
>   run the tuple  through pipeline in the right side which could have filter
> }
> Emit(key, pos, filename).
> {noformat}
>  
> Then in the map job which does the join, we bindTo(pos > 0 ? pos - 1 : pos) 
> (we do pos - 1 because bindTo will discard the first tuple for pos > 0). Then we do 
> getNext().
> Now in bz2 compressed files, getPosition() returns a position which is not 
> really accurate. The problem is it could be a position in the middle of a 
> compressed bz2 block. Then when we use that position to bindTo() in the final 
> map job, the code would first hunt for a bz2 block header thus skipping the 
> whole current bz2 block. 




[jira] Resolved: (PIG-932) Required fields projection in Loader: nested fields in bag/tuple, map key lookup more than two levels

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-932.


Resolution: Duplicate

This is duplicate of https://issues.apache.org/jira/browse/PIG-1324

> Required fields projection in Loader: nested fields in bag/tuple, map key 
> lookup more than two levels
> -
>
> Key: PIG-932
> URL: https://issues.apache.org/jira/browse/PIG-932
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.8.0
>
>
> To leverage the performance features provided by Zebra, Pig should be able to 
> figure out which input fields are actually used in Pig script, and prune 
> unnecessary inputs. This feature is being implemented in 
> [PIG-922|https://issues.apache.org/jira/browse/PIG-922]. However, there are 
> currently two limitations:
> 1. Pruning of nested fields applies only to maps. We do not prune sub-fields 
> inside a bag or tuple.
> 2. For maps, we currently go only one level deep. E.g., if the Pig script 
> uses a#'key0'#'key1', only a#'key0' will be requested.
> These two limitations are in line with current limitation of Zebra loader. 
> Once Zebra loader can handle this, we need to work to lift these limitations.




[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-947:
---

Fix Version/s: (was: 0.8.0)

I don't think anybody is signed up for this issue. Please relink it to the 
release and assign it to yourself if you are interested in working on it.

> Parsing Bags by PigStorage is not handled correctly if whitespace before 
> start of tuple.
> 
>
> Key: PIG-947
> URL: https://issues.apache.org/jira/browse/PIG-947
> Project: Pig
>  Issue Type: Bug
>  Components: data
> Environment: Pig on Hadoop 18
>Reporter: Gandul Azul
>
> The PigStorage parser for bags does not work correctly when a tuple in a bag is 
> preceded by a space. For example, the following is parsed correctly:
> {(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}
> while this is not: (Note the space before the second tuple)
> {(-5.243084,3.142401,0.000138,2.071200,0), 
> (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}
> It seems that when the parser encounters the space, it treats the rest of the 
> line as a string. With a schema, this results in a typecast of string to 
> databag, which throws an exception. 
> |WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field 
> being converted to type bag, caught ParseException  "  
> "" at |line 1, column 43.
> |Was expecting:
> |"(" ...
> |> field discarded
> Below is the parser debug output for the parsing of the above error sequence: 
> "2.071200,0), (" from above...
> ** FOUND A  MATCH (2.071200) **
>   Call:   AtomDatum
> Consumed token: <: "2.071200" at line 1 column 31>
>   Return: AtomDatum
> Return: Datum
>Matched the empty string as  token.
> Current character : , (44) at line 1 column 39
>No more string literal token matches are possible.
>Currently matched the first 1 characters as a "," token.
> ** FOUND A "," MATCH (,) **
> Consumed token: <"," at line 1 column 39>
> Call:   Datum
>Matched the empty string as  token.
> Current character : 0 (48) at line 1 column 40
>No string literal matches possible.
>Starting NFA to match one of : { , ,  
> }
> Current character : 0 (48) at line 1 column 40
>Currently matched the first 1 characters as a  token.
>Possible kinds of longer matches : { , , 
> , , 
>   }
> Current character : ) (41) at line 1 column 41
>Currently matched the first 1 characters as a  token.
>Putting back 1 characters into the input stream.
> ** FOUND A  MATCH (0) **
>   Call:   AtomDatum
> Consumed token: <: "0" at line 1 column 40>
>   Return: AtomDatum
> Return: Datum
>Matched the empty string as  token.
> Current character : ) (41) at line 1 column 41
>No more string literal token matches are possible.
>Currently matched the first 1 characters as a ")" token.
> ** FOUND A ")" MATCH ()) **
>   Return: Tuple
>   Consumed token: <")" at line 1 column 41>
>Matched the empty string as  token.
> Current character : , (44) at line 1 column 42
>No more string literal token matches are possible.
>Currently matched the first 1 characters as a "," token.
> ** FOUND A "," MATCH (,) **
>   Consumed token: <"," at line 1 column 42>
>Matched the empty string as  token.
> Current character :   (32) at line 1 column 43
>No string literal matches possible.
>Starting NFA to match one of : { , ,  
> }
> Current character :   (32) at line 1 column 43
>Currently matched the first 1 characters as a  token.
>Possible kinds of longer matches : { , , 
>  }
> Current character : ( (40) at line 1 column 44
>Currently matched the first 1 characters as a  token.
>Putting back 1 characters into the input stream.
> ** FOUND A  MATCH ( ) **
> Return: Bag
>   Return: Datum
> Return: Parse
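One way to make bag parsing tolerant of the space is to skip whitespace between tuples. The following minimal sketch assumes a hypothetical `BagSplitter` that only splits a bag literal into tuple strings by tracking parenthesis depth. It ignores quoting and escaping and is not PigStorage's actual parser.

```java
import java.util.ArrayList;
import java.util.List;

public class BagSplitter {
    /**
     * Splits a bag literal such as "{(1,2), (3,4)}" into its top-level tuple
     * strings. Commas and whitespace between tuples are skipped; anything else
     * outside a tuple is rejected.
     */
    public static List<String> splitTuples(String bag) {
        String s = bag.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("Not a bag: " + bag);
        }
        List<String> tuples = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 1; i < s.length() - 1; i++) {
            char c = s.charAt(i);
            if (c == '(') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Unbalanced ')' at " + i);
                }
                if (depth == 0) {
                    tuples.add(s.substring(start, i + 1));
                }
            } else if (depth == 0 && c != ',' && !Character.isWhitespace(c)) {
                throw new IllegalArgumentException("Unexpected '" + c + "' at " + i);
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unbalanced '(' in " + bag);
        }
        return tuples;
    }
}
```

With this approach both variants from the report, with and without the space before the second tuple, yield the same tuple list.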




[jira] Updated: (PIG-959) Merge Join fails when there is a blocking operator before it in query.

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-959:
---

Fix Version/s: (was: 0.8.0)

We are not seeing any asks for this at this time

> Merge Join fails when there is a blocking operator before it in query.
> --
>
> Key: PIG-959
> URL: https://issues.apache.org/jira/browse/PIG-959
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: pig-959.patch
>
>
> If there is an order-by, distinct or any other blocking operator in query 
> followed by Merge Join, pig fails to compile it.




[jira] Updated: (PIG-1489) Pig MapReduceLauncher does not use jars in register statement

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1489:
--

Attachment: PIG-1489_1.patch

New patch adding the source code of the test jar.

>  Pig MapReduceLauncher does not use jars in register statement 
> ---
>
> Key: PIG-1489
> URL: https://issues.apache.org/jira/browse/PIG-1489
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Richard Ding
> Fix For: 0.8.0
>
> Attachments: PIG-1489.patch, PIG-1489.patch, PIG-1489_1.patch
>
>
> If my Pig StoreFunc has its own OutputFormat class, then the Pig MapReduceLauncher 
> will try to instantiate it before
> launching the mapreduce job and fail with ClassNotFoundException.
> This happens because the Pig MapReduceLauncher uses its own classloader and 
> ignores the classes in the jars in the
> register statement.
> The effect is that the jars have to be not only in a "register" statement in 
> the script but also on the pig
> classpath via the -classpath option. 
> This can be remedied by making the Pig MapReduceLauncher construct a 
> classloader that includes the registered jars
> and use it to instantiate the OutputFormat class.
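The remedy proposed above can be sketched with the standard `java.net.URLClassLoader`. This is a minimal illustration, assuming a hypothetical `RegisteredJarLoader` helper that receives the registered jars as local file paths; it is not the code in PIG-1489_1.patch.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

public class RegisteredJarLoader {
    /**
     * Builds a class loader that sees the registered jars in addition to
     * everything the parent (e.g. Pig's own) class loader can already see.
     */
    public static URLClassLoader forJars(List<String> jarPaths, ClassLoader parent)
            throws MalformedURLException {
        URL[] urls = new URL[jarPaths.size()];
        for (int i = 0; i < urls.length; i++) {
            urls[i] = new File(jarPaths.get(i)).toURI().toURL();
        }
        return new URLClassLoader(urls, parent);
    }

    /** Loads and instantiates a class (such as an OutputFormat) through the given loader. */
    public static Object instantiate(String className, ClassLoader loader)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

Because the parent is consulted first, classes already on Pig's classpath still resolve normally; only classes found solely in registered jars come from the new URLs.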




[jira] Updated: (PIG-1489) Pig MapReduceLauncher does not use jars in register statement

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1489:
--

Status: Open  (was: Patch Available)




[jira] Updated: (PIG-1489) Pig MapReduceLauncher does not use jars in register statement

2010-07-23 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1489:
--

Status: Patch Available  (was: Open)




[jira] Commented: (PIG-1489) Pig MapReduceLauncher does not use jars in register statement

2010-07-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891856#action_12891856
 ] 

Thejas M Nair commented on PIG-1489:


+1 
You can commit after verifying that tests & checks are passing.





[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-07-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891857#action_12891857
 ] 

Olga Natkovich commented on PIG-1150:
-

Dmitry, is the patch ready to be committed, or are you planning to submit a new one? 
Thanks

> VAR() Variance UDF
> --
>
> Key: PIG-1150
> URL: https://issues.apache.org/jira/browse/PIG-1150
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.5.0
> Environment: UDF, written in Pig 0.5 contrib/
>Reporter: Russell Jurney
>Assignee: Dmitriy V. Ryaboy
> Fix For: 0.8.0
>
> Attachments: var.patch
>
>
> I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
> variance in a distributed manner, based on the AVG() builtin.  It works by 
> calculating the count, sum and sum of squares, as described here: 
> http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
> Is this a worthwhile contribution?  Taking the square root of this value 
> using the contrib SQRT() function gives Standard Deviation, which is missing 
> from Pig.
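The count/sum/sum-of-squares approach maps naturally onto the initial/intermediate/final split of an algebraic UDF: each partial is a triple, partials combine by addition, and the final step derives the variance. A minimal sketch, assuming population variance and a hypothetical `VarianceSketch` class (not the code in var.patch):

```java
public class VarianceSketch {
    /** Hypothetical partial aggregate emitted per map task or combiner. */
    public record Partial(long count, double sum, double sumSq) {

        /** Initial step: summarise one chunk of values. */
        public static Partial of(double[] values) {
            long n = 0;
            double s = 0.0;
            double sq = 0.0;
            for (double v : values) {
                n++;
                s += v;
                sq += v * v;
            }
            return new Partial(n, s, sq);
        }

        /** Intermediate step: partials combine by plain addition. */
        public Partial merge(Partial other) {
            return new Partial(count + other.count, sum + other.sum, sumSq + other.sumSq);
        }

        /** Final step: population variance as E[x^2] - (E[x])^2. */
        public double variance() {
            double mean = sum / count;
            return sumSq / count - mean * mean;
        }
    }
}
```

Taking `Math.sqrt` of the result gives the standard deviation mentioned in the description.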




[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc

2010-07-23 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891858#action_12891858
 ] 

Olga Natkovich commented on PIG-1205:
-

Jeff and Dmitry, are you still planning to finish this for the Pig 0.8.0 release?

> Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc
> --
>
> Key: PIG-1205
> URL: https://issues.apache.org/jira/browse/PIG-1205
> Project: Pig
>  Issue Type: Sub-task
>Affects Versions: 0.7.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.8.0
>
> Attachments: PIG_1205.patch, PIG_1205_2.patch, PIG_1205_3.patch, 
> PIG_1205_4.patch
>
>





[jira] Updated: (PIG-1424) Error logs of streaming should not be placed in output location

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1424:


Fix Version/s: (was: 0.8.0)

This was linked to 0.8 to deal with PIG-1229, but we will take a different route 
there.

> Error logs of streaming should not be placed in output location
> ---
>
> Key: PIG-1424
> URL: https://issues.apache.org/jira/browse/PIG-1424
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.7.0
>Reporter: Ashutosh Chauhan
>
> This becomes a problem when the output location is anything other than a 
> filesystem. Output will be written to a DB, but where should the logs generated 
> by streaming go? Clearly, they can't be written into the DB. This blocks 
> PIG-1229, which introduces writing to a DB from Pig.




[jira] Commented: (PIG-1489) Pig MapReduceLauncher does not use jars in register statement

2010-07-23 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891861#action_12891861
 ] 

Richard Ding commented on PIG-1489:
---


test-patch results:

{code}
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 10 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
{code}




[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-07-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891863#action_12891863
 ] 

Dmitriy V. Ryaboy commented on PIG-1150:


Meh. Go ahead and commit. Don't put it into builtin, since it has math problems 
at scale. Ok for piggybank.
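One plausible reading of the "math problems at scale" is catastrophic cancellation in `sumSq/n - mean^2` when the mean is large relative to the spread. A commonly used alternative, described in the same Wikipedia article, is to carry (count, mean, M2) per partial and combine partials with the pairwise formula of Chan et al. The sketch below assumes population variance and uses a hypothetical `StableVariance` class.

```java
public class StableVariance {
    /** Hypothetical partial state: count, mean, and M2 (sum of squared deviations from the mean). */
    public record Partial(long count, double mean, double m2) {

        /** Summarise one chunk with Welford's single-pass update. */
        public static Partial of(double[] values) {
            long n = 0;
            double mean = 0.0;
            double m2 = 0.0;
            for (double x : values) {
                n++;
                double delta = x - mean;
                mean += delta / n;
                m2 += delta * (x - mean);
            }
            return new Partial(n, mean, m2);
        }

        /** Combine two partials with the pairwise formula of Chan et al. */
        public Partial merge(Partial other) {
            if (count == 0) {
                return other;
            }
            if (other.count == 0) {
                return this;
            }
            long n = count + other.count;
            double delta = other.mean - mean;
            double newMean = mean + delta * other.count / n;
            double newM2 = m2 + other.m2 + delta * delta * ((double) count * other.count / n);
            return new Partial(n, newMean, newM2);
        }

        /** Population variance. */
        public double variance() {
            return m2 / count;
        }
    }
}
```

The merge is still associative, so it fits the same combiner structure as the sum-based version while avoiding the subtraction of two large, nearly equal quantities.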




[jira] Commented: (PIG-1205) Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc

2010-07-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891864#action_12891864
 ] 

Dmitriy V. Ryaboy commented on PIG-1205:


When is the cut-off date for that?




[jira] Commented: (PIG-1034) Pig does not support ORDER ... BY group alias

2010-07-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891877#action_12891877
 ] 

Thejas M Nair commented on PIG-1034:


I am reviewing this patch.

> Pig does not support ORDER ... BY group alias
> -
>
> Key: PIG-1034
> URL: https://issues.apache.org/jira/browse/PIG-1034
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: David Ciemiewicz
>Assignee: Jeff Zhang
> Fix For: 0.8.0
>
> Attachments: PIG_1034.patch
>
>
> GROUP ... ALL and GROUP ... BY produce an alias "group".
> Pig produces a syntax error if you attempt to ORDER ... BY group.
> This does seem like a perfectly reasonable thing to do.
> The workaround is to create an alias for group using an AS clause.  But I 
> think this workaround should be unnecessary.
> Here's sample code which elicits the syntax error:
> {code}
> A = load 'one.txt' using PigStorage as (one: int);
> B = group A all;
> C = foreach B generate
>   group,
>   COUNT(A) as count;
> -- group is one of the aliases in C; why does this throw a syntax error?
> D = order C by group parallel 1;
> dump D;
> {code}




[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder

2010-07-23 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1500:
---

Attachment: guava.jar.r06.patch

Attaching the patch with the guava.jar r06 version, as no one had a problem 
migrating to that version.


> guava.jar should be removed from the lib folder
> ---
>
> Key: PIG-1500
> URL: https://issues.apache.org/jira/browse/PIG-1500
> Project: Pig
>  Issue Type: Bug
>  Components: build
>Reporter: Giridharan Kesavan
>Assignee: niraj rai
> Fix For: 0.8.0
>
> Attachments: guava.jar.r06.patch, removeGuavaJar.patch
>
>
> The guava jar is available in the maven repository, but it is still checked into 
> the pig trunk's lib folder.
> I've checked the availability of the guava jar in the maven repository.
> http://mvnrepository.com/artifact/com.google.guava/guava




[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder

2010-07-23 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1500:
---

Status: Open  (was: Patch Available)




[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder

2010-07-23 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1500:
---

Status: Patch Available  (was: Open)
