[jira] Updated: (PIG-979) Acummulator Interface for UDFs

2009-11-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-979:
---

Fix Version/s: 0.6.0
Affects Version/s: 0.4.0
   Status: Open  (was: Patch Available)

> Acummulator Interface for UDFs
> --
>
> Key: PIG-979
> URL: https://issues.apache.org/jira/browse/PIG-979
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.4.0
>Reporter: Alan Gates
>Assignee: Ying He
> Fix For: 0.6.0
>
> Attachments: PIG-979.patch, PIG-979.patch
>
>
> Add an accumulator interface for UDFs that would allow them to take a set 
> number of records at a time instead of the entire bag.
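
As a rough illustration of the idea (not the interface from the attached patch), a UDF that receives records in batches could look like the sketch below. The names BatchAccumulator, accumulate, getValue and cleanup are assumptions made for this example only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: the UDF sees one batch of records at a time
// instead of the entire bag.
interface BatchAccumulator<T, R> {
    void accumulate(List<T> batch); // called once per batch
    R getValue();                   // called after the last batch
    void cleanup();                 // reset state before the next key
}

// Example implementation: a running sum that never holds the full bag.
final class SumAccumulator implements BatchAccumulator<Long, Long> {
    private long sum = 0;

    @Override
    public void accumulate(List<Long> batch) {
        for (long v : batch) {
            sum += v;
        }
    }

    @Override
    public Long getValue() {
        return sum;
    }

    @Override
    public void cleanup() {
        sum = 0;
    }
}

final class AccumulatorDemo {
    public static void main(String[] args) {
        SumAccumulator acc = new SumAccumulator();
        acc.accumulate(Arrays.asList(1L, 2L, 3L)); // first batch
        acc.accumulate(Arrays.asList(4L, 5L));     // second batch
        System.out.println(acc.getValue());
        acc.cleanup();
    }
}
```

The point of the design is that memory use depends on the batch size rather than on the size of the bag.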

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-979) Acummulator Interface for UDFs

2009-11-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-979:
---

Status: Patch Available  (was: Open)





[jira] Updated: (PIG-1038) Optimize nested distinct/sort to use secondary key

2009-11-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1038:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

All javac warnings are deprecation warnings. One release audit warning is 
fixed; the remaining ones are not related to source code. I also made minor 
changes to address Pradeep's comment. Patch committed. To disable the 
secondary key optimization, set the system property 
pig.exec.nosecondarykey=true

> Optimize nested distinct/sort to use secondary key
> --
>
> Key: PIG-1038
> URL: https://issues.apache.org/jira/browse/PIG-1038
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Olga Natkovich
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1038-1.patch, PIG-1038-2.patch, PIG-1038-3.patch, 
> PIG-1038-4.patch, PIG-1038-5.patch
>
>
> If a nested foreach plan contains sort/distinct, it is possible to use Hadoop 
> secondary sort instead of SortedDataBag and DistinctDataBag to optimize the 
> query. 
> Eg1:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = order A by $1;
>     generate group, D;
> }
> store C into 'myresult';
> We can specify a secondary sort on A.$1, and drop "order A by $1".
> Eg2:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = A.$1;
>     E = distinct D;
>     generate group, E;
> }
> store C into 'myresult';
> We can specify a secondary sort key on A.$1, and simplify "D=A.$1; E=distinct 
> D" to a special version of distinct, which does not do the sorting.
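
To see why a distinct over already-sorted input needs no sorting of its own, here is a minimal sketch. It is not the code from the patch; the class and method names are made up for illustration, and the input is assumed to arrive ordered by the secondary key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class SortedDistinct {
    // Assumes the input is already sorted (for example by a secondary sort
    // key), so equal values are adjacent and a single pass is enough.
    static <T> List<T> distinctSorted(List<T> sorted) {
        List<T> out = new ArrayList<>();
        T previous = null;
        boolean first = true;
        for (T value : sorted) {
            // Emit a value only when it differs from the one just before it.
            if (first || !Objects.equals(previous, value)) {
                out.add(value);
            }
            previous = value;
            first = false;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(distinctSorted(Arrays.asList("a", "a", "b", "c", "c")));
    }
}
```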




[jira] Commented: (PIG-1038) Optimize nested distinct/sort to use secondary key

2009-11-11 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776853#action_12776853
 ] 

Pradeep Kamath commented on PIG-1038:
-

Changes look good. One observation is in SecondaryKeyOptimizer.java:
{code}
 if (r) // if we saw physical operator other than project in sort plan
     return;
{code}
 should we be setting sawInvalidPhysicalOper?

Other than that, +1 - please commit after making any change if required for the 
above.






[jira] Assigned: (PIG-6) Addition of Hbase Storage Option In Load/Store Statement

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-6:


Assignee: Samuel Guo

> Addition of Hbase Storage Option In Load/Store Statement
> 
>
> Key: PIG-6
> URL: https://issues.apache.org/jira/browse/PIG-6
> Project: Pig
>  Issue Type: New Feature
> Environment: all environments
>Reporter: Edward J. Yoon
>Assignee: Samuel Guo
> Fix For: 0.2.0
>
> Attachments: hbase-0.18.1-test.jar, hbase-0.18.1.jar, m34813f5.txt, 
> PIG-6.patch, PIG-6_V01.patch
>
>
> It needs to be able to load a full table from HBase. (Maybe difficult? I'm 
> not sure yet.)
> Also, as described below, it needs to compose an abstract 2D table containing 
> only the data filtered from the HBase array structure by an arbitrary 
> delimited query. 
> {code}
> A = LOAD table('hbase_table');
> or
> B = LOAD table('hbase_table') Using HbaseQuery('Query-delimited by attributes 
> & timestamp') as (f1, f2[, f3]);
> {code}
> Once testing is done on my local machines, I will clarify the grammar and 
> give more examples to help explain more storage options. 
> Any advice is welcome.




[jira] Assigned: (PIG-38) abstract PigScript parser

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-38:
-

Assignee: Christopher Olston

> abstract PigScript parser
> -
>
> Key: PIG-38
> URL: https://issues.apache.org/jira/browse/PIG-38
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
> Environment: grunt and pigpen
>Reporter: Christopher Olston
>Assignee: Christopher Olston
> Fix For: 0.1.0
>
> Attachments: pigScriptParser.patch
>
>
> I am developing Pig Pen, an Eclipse plugin for Pig. Pig Pen needs to parse 
> .pig scripts. The parsing is the same as for grunt, but the actions I take 
> are different (e.g., Pig Pen will ignore "store" commands for the purpose of 
> editing).
> What I'd like to do is create an abstract class PigScriptParser, which is 
> identical to the current GruntParser except no actions are taken. Then I'll 
> add a GruntParser that extends PigScriptParser, and has concrete 
> implementations of actions (e.g., what to do when a "store" command is 
> encountered).
> I'll also add a PigPenParser that also extends PigScriptParser.
> This should not affect the behavior of GruntParser at all -- it just 
> separates the parsing from the actuating.
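
The split described above could be sketched roughly as follows. The class names PigScriptParser, GruntParser and PigPenParser come from the description; everything else (the toy parse loop, the processStore and processOther hooks, the recording lists) is an assumption for illustration and not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Parsing lives in the abstract base class; subclasses supply the actions.
abstract class PigScriptParser {
    // Toy "parser": dispatches on the first word of each statement.
    final void parse(List<String> statements) {
        for (String stmt : statements) {
            String s = stmt.trim();
            if (s.toLowerCase(Locale.ROOT).startsWith("store ")) {
                processStore(s);
            } else if (!s.isEmpty()) {
                processOther(s);
            }
        }
    }

    protected abstract void processStore(String statement);

    protected abstract void processOther(String statement);
}

// Grunt acts on every statement (here it only records what it would run).
class GruntParser extends PigScriptParser {
    final List<String> executed = new ArrayList<>();

    @Override
    protected void processStore(String statement) {
        executed.add(statement);
    }

    @Override
    protected void processOther(String statement) {
        executed.add(statement);
    }
}

// Pig Pen ignores "store" commands for the purpose of editing.
class PigPenParser extends PigScriptParser {
    final List<String> seen = new ArrayList<>();

    @Override
    protected void processStore(String statement) {
        // intentionally ignored
    }

    @Override
    protected void processOther(String statement) {
        seen.add(statement);
    }
}
```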




[jira] Assigned: (PIG-20) Sorting using custom comparison functions

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-20:
-

Assignee: Olga Natkovich

> Sorting  using custom comparison functions
> --
>
> Key: PIG-20
> URL: https://issues.apache.org/jira/browse/PIG-20
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.1.0
>
> Attachments: usercompare.patch
>
>
> Currently, only string-based sorting is supported. Once we have types, 
> numeric sort will be supported as well. However, some users have expressed a 
> need for custom comparison functions for sort.
> Alan put together a design document for this:
> http://wiki.apache.org/pig/UserDefinedOrdering
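
To illustrate the difference between string-based ordering and a user-supplied comparison, here is a small sketch in plain Java. It does not follow the design document linked above; the class CustomOrdering and its members are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class CustomOrdering {
    // Hypothetical user-supplied comparison: order strings by numeric value.
    static final Comparator<String> NUMERIC = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Long.compare(Long.parseLong(a), Long.parseLong(b));
        }
    };

    // Returns a sorted copy, leaving the input untouched.
    static List<String> sorted(List<String> in, Comparator<String> cmp) {
        List<String> copy = new ArrayList<>(in);
        Collections.sort(copy, cmp);
        return copy;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("10", "9", "100");
        // String ordering compares character by character.
        System.out.println(sorted(data, Comparator.<String>naturalOrder()));
        // The custom comparator orders by value instead.
        System.out.println(sorted(data, NUMERIC));
    }
}
```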




[jira] Assigned: (PIG-12) Please add timestamps to pig map/reduce progress messages

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-12:
-

Assignee: Alan Gates

> Please add timestamps to pig map/reduce progress messages
> -
>
> Key: PIG-12
> URL: https://issues.apache.org/jira/browse/PIG-12
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Olga Natkovich
>Assignee: Alan Gates
> Fix For: 0.1.0
>
> Attachments: timestamps.diff
>
>
> From one of the users: 
> --
> I'm spending a lot of time trying to optimize my pig queries for short
> run-times.  This process would be much easier if, in the progress output
> from pig (currently on stdout, but hopefully soon moving to stderr?!), the
> initiation and completion of each map/reduce job could be timestamped.  Pig
> already spits out messages of the form "- MapReduce Job -", "Input:
> ...", "Combine: ...", etc; could you just add a "Timestamp: ..."
> field as well? Or ideally, both "Starting timestamp: ..." and "Finishing
> timestamp ...".
> Additional comments from another user:
> --
> I'm adding my vote for this as well.
> I'd like to know timestamp and "running time" in seconds or D:H:M:S:
> Thu Oct 25 10:06:01 GMT 2007 (0:00:12:56): 56% done
> Starting and stopping timestamps in the log would also be valuable.
> Unfortunately, there's no "workaround" such as putting a date command before 
> and after the pig command in logging --
> queuing times can be seconds to hours and completely mess up any notion of 
> job execution time.
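
A progress line in the shape requested above could be produced along these lines. This is only a sketch; ProgressStamp, formatElapsed and progressLine are hypothetical names, and the attached timestamps.diff may do something different.

```java
import java.util.Date;

final class ProgressStamp {
    // Formats elapsed seconds as D:HH:MM:SS.
    static String formatElapsed(long seconds) {
        long days = seconds / 86400;
        long hours = (seconds % 86400) / 3600;
        long minutes = (seconds % 3600) / 60;
        long secs = seconds % 60;
        return String.format("%d:%02d:%02d:%02d", days, hours, minutes, secs);
    }

    // Builds a line of the form "<timestamp> (<elapsed>): <n>% done".
    static String progressLine(Date now, long elapsedSeconds, int percentDone) {
        return now + " (" + formatElapsed(elapsedSeconds) + "): " + percentDone + "% done";
    }

    public static void main(String[] args) {
        // 776 seconds is 12 minutes and 56 seconds.
        System.out.println(progressLine(new Date(), 776, 56));
    }
}
```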




[jira] Assigned: (PIG-13) need a way to find out what version of pig i'm using

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-13:
-

Assignee: Stefan Groschupf

> need a way to find out what version of pig i'm using
> 
>
> Key: PIG-13
> URL: https://issues.apache.org/jira/browse/PIG-13
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Olga Natkovich
>Assignee: Stefan Groschupf
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: PIG-13-svnOptional_v_1_r633244.patch, PIG-13_v_1.patch
>
>
> would be great if "pig -version" told me what version.
> also, the text prior to "USAGE: ..." could also print the version.




[jira] Assigned: (PIG-51) Combiner gives wrong result in the presence of flattening

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-51:
-

Assignee: Utkarsh Srivastava

> Combiner gives wrong result in the presence of flattening
> -
>
> Key: PIG-51
> URL: https://issues.apache.org/jira/browse/PIG-51
> Project: Pig
>  Issue Type: Bug
>Reporter: Utkarsh Srivastava
>Assignee: Utkarsh Srivastava
>Priority: Critical
> Fix For: 0.1.0
>
> Attachments: combiner-flatten.patch
>
>
> If you do something like
> a = load ... as (f1,f2,f3);
> b = group a by (f1,f2);
> c = foreach b generate flatten(group), SUM(a.f3);
> The reduce side refers to fields by number, expecting that the data has not 
> been flattened yet. But if the combiner kicks in, it has already flattened 
> the group, so the column references are wrong.
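
The positional shift can be shown with plain lists. This is a simplified model written for this note, not Pig's tuple classes; FlattenShift and flattenGroup are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FlattenShift {
    // Expands a nested group key in position 0 into separate columns.
    static List<Object> flattenGroup(List<Object> record) {
        List<Object> out = new ArrayList<>();
        Object group = record.get(0);
        if (group instanceof List) {
            out.addAll((List<?>) group);
        } else {
            out.add(group);
        }
        out.addAll(record.subList(1, record.size()));
        return out;
    }

    public static void main(String[] args) {
        // Unflattened: ((f1, f2), partialSum), so column 1 is the partial sum.
        List<Object> record = Arrays.<Object>asList(Arrays.asList("x", "y"), 42);
        System.out.println(record.get(1));
        // Flattened: (f1, f2, partialSum), so column 1 is now f2.
        System.out.println(flattenGroup(record).get(1));
    }
}
```

Code that still reads column 1 after the flatten picks up f2 instead of the partial sum, which is the kind of mismatch the report describes.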
>  




[jira] Assigned: (PIG-55) Allow user control over split creation

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-55:
-

Assignee: Charlie Groves

> Allow user control over split creation
> --
>
> Key: PIG-55
> URL: https://issues.apache.org/jira/browse/PIG-55
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.0.0
>Reporter: Charlie Groves
>Assignee: Charlie Groves
> Fix For: 0.1.0
>
> Attachments: pig_chunker_split.patch, pig_chunker_split_v2.patch, 
> pig_chunker_split_v3.patch, pig_chunker_split_v4.patch, 
> pig_chunker_split_v5.patch, pig_chunker_split_v6.patch, 
> pig_chunker_split_v7.patch, replaceable_PigSplit.diff, 
> replaceable_PigSplit_v2.diff
>
>
> I have a dataset in HDFS that's stored in a file per column that I'd like to 
> access from pig.  This means I can't use LoadFunc to get at the data as it 
> only allows the loader access to a single input stream at a time.  To handle 
> this usage, I've broken the existing split creation code out into a few 
> classes and interfaces, and allowed user specified load functions to be used 
> in place of the existing code.




[jira] Assigned: (PIG-58) parameterized Pig scripts

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-58:
-

Assignee: Olga Natkovich

> parameterized Pig scripts
> -
>
> Key: PIG-58
> URL: https://issues.apache.org/jira/browse/PIG-58
> Project: Pig
>  Issue Type: New Feature
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.1.0
>
> Attachments: PIG-58_v1.patch, PIG-58_v2, PIG-58_v3.patch
>
>
> This feature has been requested by several users and would be very useful in 
> conjunction with streaming. The feature would allow a pig script to include 
> parameters that are replaced at run time. For instance, if your script needs 
> to run on a daily basis over the data of the previous day, you would be able 
> to reuse the same script by providing the date as a run-time parameter.
> Example:
> ===
> Pig script myscript.pig:
> A = load '/data/mydata/%date%';
> B = filter A by $0>'5';
> .
> Pig command line:
> pig -param date='20080110' myscript.pig
> Proposed interface and implementation:
> Interface:
> ===
> (0) Substitution will only be supported with pig script files.
> (1) Parameters are specified on the command line via the -param <name>=<value> 
> construct. Multiple parameters can be specified. They are applied to the 
> script in the order they are specified on the command line.
> (2) Default values for the parameters can be specified within the script via 
> a declare statement:
> declare <name>=<value>
> (3) Within the script the parameter will be enclosed in %, as in %<name>%. \% 
> can be used to escape.
> Implementation:
> 
> Use a preprocessor to do the substitution. The preprocessor would be invoked by 
> Main before grunt is instantiated and do the following:
> - create a new file in a temp location
> - build a hash of parameters from the command line and declare statements
> - for each line in the original script
>   if this is a declare line, skip it
>   else for each unescaped pattern %<name>% look for a match in the hash. 
> Replace, if found.  Write the line to the temp file.
> - pass the temp file to grunt.
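
The preprocessing steps listed above could be sketched as below. This is an illustration under assumptions, not the attached patch: it works on in-memory lines instead of a temp file, it assumes defaults are written as "declare name=value", it lets command-line values win over declared defaults, and it leaves an escaped \% untouched because the description does not say whether it should be unescaped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParamPreprocessor {
    // %name% whose opening % is not preceded by a backslash.
    private static final Pattern PARAM = Pattern.compile("(?<!\\\\)%(\\w+)%");
    // Assumed form of a default value line: "declare name=value".
    private static final Pattern DECLARE =
            Pattern.compile("^\\s*declare\\s+(\\w+)\\s*=\\s*(.*?)\\s*$");

    static List<String> preprocess(List<String> script, Map<String, String> cmdLine) {
        // Build the hash: command-line values first, then declared defaults.
        Map<String, String> params = new HashMap<>(cmdLine);
        for (String line : script) {
            Matcher d = DECLARE.matcher(line);
            if (d.matches() && !params.containsKey(d.group(1))) {
                params.put(d.group(1), d.group(2));
            }
        }
        List<String> out = new ArrayList<>();
        for (String line : script) {
            if (DECLARE.matcher(line).matches()) {
                continue; // declare lines are not passed on
            }
            Matcher m = PARAM.matcher(line);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                String value = params.get(m.group(1));
                // Unknown parameters are left exactly as written.
                String replacement = (value != null) ? value : m.group();
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            out.add(sb.toString());
        }
        return out;
    }
}
```

With the script from the example and date=20080110 on the command line, the load statement becomes A = load '/data/mydata/20080110';.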




[jira] Assigned: (PIG-56) implement Iterable in DataBag

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-56:
-

Assignee: Charlie Groves

> implement Iterable in DataBag
> 
>
> Key: PIG-56
> URL: https://issues.apache.org/jira/browse/PIG-56
> Project: Pig
>  Issue Type: Improvement
>Reporter: Charlie Groves
>Assignee: Charlie Groves
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: iterable_databag.patch
>
>
> Now that DataBag has an iterator method, it can implement Iterable with no 
> other changes.  This would allow bags to be used in a foreach loop like 
> for(Tuple t : bag) {
>   // do something with t
> }
> The attached patch has DataBag implement Iterable and converts all bag 
> iterator usages in pig to use foreach loops.
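
For readers unfamiliar with the mechanism: any class with an iterator() method can declare that it implements Iterable and then be used in an enhanced for loop. The sketch below uses a stand-in class, SimpleBag, which is not Pig's DataBag.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in for a bag type; not Pig's actual DataBag.
final class SimpleBag<T> implements Iterable<T> {
    private final List<T> content = new ArrayList<>();

    void add(T item) {
        content.add(item);
    }

    // This method is all that Iterable requires.
    @Override
    public Iterator<T> iterator() {
        return content.iterator();
    }
}

final class IterableDemo {
    // Joins the items with ';' using the enhanced for loop.
    static String join(SimpleBag<String> bag) {
        StringBuilder sb = new StringBuilder();
        for (String t : bag) {
            sb.append(t).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleBag<String> bag = new SimpleBag<>();
        bag.add("a");
        bag.add("b");
        System.out.println(join(bag));
    }
}
```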




[jira] Assigned: (PIG-59) A new "ILLUSTRATE" command which will help people debug their pig programs

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-59:
-

Assignee: Shubham Chopra

> A new "ILLUSTRATE" command which will help people debug their pig programs
> --
>
> Key: PIG-59
> URL: https://issues.apache.org/jira/browse/PIG-59
> Project: Pig
>  Issue Type: New Feature
>  Components: grunt
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
> Fix For: 0.1.0
>
> Attachments: displayAlternate.patch, ExampleGenerator.patch, 
> ExampleGenerator.patch, ExampleGenerator.patch
>
>
> I propose to add a new "ILLUSTRATE" command to Pig, which will help people 
> debug their Pig programs.
> The idea is to select a few example data items, and illustrate how they are 
> transformed by the sequence of Pig commands in the user's program. I have an 
> algorithm that can select an appropriate and concise set of example data 
> items automatically. It does a better job than random sampling would do; for 
> example, random sampling suffers from the drawback that selective operations 
> such as filters or joins can eliminate *all* the sampled data items, giving 
> you empty results which is of no help in debugging.
> This "ILLUSTRATE" functionality will avoid people having to test their Pig 
> programs on large data sets, which has a long turnaround time and wastes 
> system resources.
> Proposed Implementation:
> I will create a new package called org.apache.pig.exgen, which will contain 
> the aforementioned algorithm. The algorithm uses the "Local" execution 
> operators (it does not run on hadoop), so as to generate illustrative example 
> data in near-real-time for the user. 
> For my algorithm to work properly, it needs to trace the "lineage" (sometimes 
> called "provenance") of data items as they flow through the local operator 
> tree corresponding to the user's Pig program. So I will have to add a 
> "lineage tracer" to the Local operators, which maintains a side data 
> structure to represent the lineage, or derivation sequence, among data items. 
> The lineage tracer will be DISABLED BY DEFAULT, so it will not affect normal 
> Pig operation.
> I will add a new method to PigServer called 
> "PigServer.showExamples(LogicalPlan)", which will cause my exgen algorithm to 
> be invoked.
> I will also add a new command to Grunt, called ILLUSTRATE. Syntactically it 
> will work the same way as the STORE command. For example, a user might type:
> grunt> visits = load 'visits.txt' as (user, url, timestamp);
> grunt> recent_visits = filter visits by timestamp >= '20071201';
> grunt> user_visits = group recent_visits by user;
> grunt> num_user_visits = foreach user_visits generate group, 
> COUNT(recent_visits);
> grunt> illustrate num_user_visits
> This would trigger my exgen algorithm, which will display something like:
> visits:
> (Amy, www.cnn.com, 20070218)
> (Fred, www.harvard.edu, 20071204)
> (Amy, www.bbc.com, 20071205)
> (Fred, www.stanford.edu, 20071206)
> recent_visits:
> (Fred, www.harvard.edu, 20071204)
> (Amy, www.bbc.com, 20071205)
> (Fred, www.stanford.edu, 20071206)
> user_visits:
> (Fred, { (Fred, www.harvard.edu, 20071204), (Fred, www.stanford.edu, 
> 20071206) } )
> (Amy, { (Amy, www.bbc.com, 20071205) } )
> num_user_visits:
> (Fred, 2)
> (Amy, 1)
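
The example pipeline above can be mimicked in a few lines of plain Java, which also shows the weakness of naive sampling mentioned in the description: a sample containing only old visits produces an empty result after the filter. This sketch only runs the pipeline; it does not implement the example-selection algorithm, and all names in it are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IllustrateSketch {
    // Each visit is {user, url, timestamp}; keeps visits at or after cutoff.
    static List<String[]> recentVisits(List<String[]> visits, String cutoff) {
        List<String[]> out = new ArrayList<>();
        for (String[] v : visits) {
            if (v[2].compareTo(cutoff) >= 0) {
                out.add(v);
            }
        }
        return out;
    }

    // Groups by user and counts, keeping users in order of first appearance.
    static Map<String, Integer> countByUser(List<String[]> visits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] v : visits) {
            Integer c = counts.get(v[0]);
            counts.put(v[0], c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> visits = Arrays.asList(
                new String[] {"Amy", "www.cnn.com", "20070218"},
                new String[] {"Fred", "www.harvard.edu", "20071204"},
                new String[] {"Amy", "www.bbc.com", "20071205"},
                new String[] {"Fred", "www.stanford.edu", "20071206"});
        List<String[]> recent = recentVisits(visits, "20071201");
        System.out.println(countByUser(recent));
    }
}
```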




[jira] Assigned: (PIG-65) convert tabs to spaces

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-65:
-

Assignee: Charlie Groves

> convert tabs to spaces
> --
>
> Key: PIG-65
> URL: https://issues.apache.org/jira/browse/PIG-65
> Project: Pig
>  Issue Type: Bug
>Reporter: Charlie Groves
>Assignee: Charlie Groves
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: tabs_to_spaces.diff, tabs_to_spaces_post_PIG-32.diff
>
>
> Many of the pig source files mix tabs and 4 spaces for indentation.  This is 
> particularly painful for me when reading the code as I've set up my editor to 
> indent tabs 8 spaces so I can catch if I actually use them anywhere, and the 
> source jumps back and forth in indentation level, sometimes from line to line.
> The patch replaces all tabs with 4 spaces in java code since that's what's 
> mentioned as the standard in the wiki.




[jira] Assigned: (PIG-57) Occasional NullPointerException in PigContext.fixUpDomain method

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-57:
-

Assignee: Benjamin Francisoud

> Occasional NullPointerException in PigContext.fixUpDomain method
> 
>
> Key: PIG-57
> URL: https://issues.apache.org/jira/browse/PIG-57
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Xu Zhang
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: PIG-57-v01.patch
>
>
> I occasionally see the following NPE when running a Pig job with HOD:
> 2008-01-08 06:14:24,558 [main] INFO  org.apache.pig - Connecting to HOD...
> 2008-01-08 06:14:29,732 [main] INFO  org.apache.pig - HDFS Web UI: 
> nn-host:50070
> 2008-01-08 06:14:29,732 [main] INFO  org.apache.pig - JobTracker Web UI: 
> jt-host:54597
> 2008-01-08 06:14:29,846 [main] FATAL org.apache.pig - Could not connect to HOD
> java.lang.NullPointerException
>   at org.apache.pig.impl.PigContext.fixUpDomain(PigContext.java:350)
>   at org.apache.pig.impl.PigContext.doHod(PigContext.java:324)
>   at org.apache.pig.impl.PigContext.connect(PigContext.java:175)
>   at org.apache.pig.PigServer.<init>(PigServer.java:128)
>   at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:37)
>   at org.apache.pig.Main.main(Main.java:212)




[jira] Assigned: (PIG-68) Improving build.xml in many ways :)

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-68:
-

Assignee: Stefan Groschupf

> Improving build.xml in many ways :)
> ---
>
> Key: PIG-68
> URL: https://issues.apache.org/jira/browse/PIG-68
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Stefan Groschupf
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: build.xml, build.xml-PIG-68-v01.patch, 
> build.xml-PIG-68-v02.patch, build.xml-PIG-68-v03.patch, 
> build.xml-PIG-68-v04.patch, build.xml-PIG-68-v05.patch, 
> build.xml-PIG-68-v06-SG.patch, build.xml-PIG-68-v07-SG.patch, 
> build.xml-PIG-68-v08-SG.patch, build.xml-PIG-68-v09-SG.patch, out
>
>
> The build file can be improved in many ways:
> * add revision number to pig.jar name (like: pig-r1234.jar)
> * put pig.jar in the dist dir
> * "clean" target leaves a "depend" folder undeleted
> * use a regexp to delete files in "org\apache\pig\impl\logicalLayer\parser" 
> folder instead of listing all files one by one that you want to delete
> * put all artifacts (classes, jar, etc...) in the dist folder so that when 
> doing clean you just need to specify dist
> * provide a description for targets (for "ant -projecthelp" command)
> * use spaces or tabs but not both (spaces are better for patch and diff in my 
> opinion)




[jira] Assigned: (PIG-69) NullPointerException in setJobtrackerLocation() in PigContext.java:68

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-69:
-

Assignee: Benjamin Francisoud

> NullPointerException in setJobtrackerLocation() in PigContext.java:68
> -
>
> Key: PIG-69
> URL: https://issues.apache.org/jira/browse/PIG-69
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: PigContext-PIG-69-v01.patch, PigContext-PIG-69-v02.patch
>
>
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.pig.impl.PigContext.setJobtrackerLocation(PigContext.java:425)
> ... (the rest of the stacktrace is my own servlet code)
> {noformat}
> The code:
> {code:java}
> final PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.setJobtrackerLocation(configuration.get("mapred.job.tracker"));
> pigContext.setFilesystemLocation(configuration.get("fs.default.name"));
> 
> final PigServer pigServer = new PigServer(pigContext);
> {code}
> Where configuration is a org.apache.hadoop.conf.Configuration object 
> initialized with spring framework.




[jira] Assigned: (PIG-80) Stacktrace information is lost at MapReduceLauncher.java:289

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-80:
-

Assignee: Benjamin Francisoud

> Stacktrace information is lost at MapReduceLauncher.java:289
> 
>
> Key: PIG-80
> URL: https://issues.apache.org/jira/browse/PIG-80
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: PIG-80-generics.patch, PIG-80-v01.patch, 
> PIG-80-v02.patch, PIG-80-v03.patch, PIG-80-v04.patch, PIG-80-v05.patch, 
> PIG-80-v06-unit-test-only.patch
>
>
> {code:java}
> ...
> }catch (Exception e) {
> // Do we need different handling for different exceptions
> e.printStackTrace();
> throw new IOException(e.getMessage());
> }finally{ ...
> {code}
> In my case the standard output is redirected to /dev/null, so the output of 
> "e.printStackTrace();" is lost.
> It should be:
> {code:java}throw new IOException(e);{code} 
> without getMessage(), because otherwise we lose the rest of the stacktrace.
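
The difference between the two ways of wrapping can be checked directly. The sketch below uses made-up names (WrapException, messageOnly, withCause) and relies on the IOException(Throwable) constructor, which is available from Java 6 onwards.

```java
import java.io.IOException;

final class WrapException {
    // Copies only the message text; the original stack trace is dropped.
    static IOException messageOnly(Exception e) {
        return new IOException(e.getMessage());
    }

    // Keeps the original exception as the cause, so its stack trace is
    // reported under "Caused by:" wherever the IOException is logged.
    static IOException withCause(Exception e) {
        return new IOException(e);
    }

    public static void main(String[] args) {
        Exception original = new IllegalStateException("job failed");
        System.out.println(messageOnly(original).getCause()); // no cause kept
        System.out.println(withCause(original).getCause());   // original kept
    }
}
```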




[jira] Assigned: (PIG-83) logging abstraction

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-83:
-

Assignee: Benjamin Francisoud

> logging abstraction
> ---
>
> Key: PIG-83
> URL: https://issues.apache.org/jira/browse/PIG-83
> Project: Pig
>  Issue Type: Wish
>Reporter: Stefan Groschupf
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: log4j.properties, logging.properties, PIG-83-v01.patch, 
> PIG-83-v02.patch, PIG-83-v03.patch
>
>
> Pig is logging quite a lot into System.out or System.err. Using an embedded 
> pig in a production environment requires a logging abstraction like log4j, 
> commons logging, slf4j or something like that. 
> I would be happy to work on a patch if we decide what would be the best 
> choice. Hadoop uses log4j.
> Thanks.
> Stefan
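
As a small illustration of what replacing a System.out call with a logger looks like, here is a sketch that uses java.util.logging, chosen only because it ships with the JDK and keeps the example self-contained. It is not one of the libraries named in the request, and the class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggingSketch {
    private static final Logger LOG = Logger.getLogger(LoggingSketch.class.getName());

    static void connect(String host) {
        // Instead of: System.out.println("Connecting to " + host);
        LOG.log(Level.INFO, "Connecting to {0}", host);
    }

    public static void main(String[] args) {
        connect("nn-host");
    }
}
```

The embedding application can then attach its own handlers or change levels without touching the code that emits the messages.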




[jira] Assigned: (PIG-78) src/org/apache/pig/builtin/PigStorage.java doesn't compile

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-78:
-

Assignee: Arun C Murthy

> src/org/apache/pig/builtin/PigStorage.java doesn't compile
> --
>
> Key: PIG-78
> URL: https://issues.apache.org/jira/browse/PIG-78
> Project: Pig
>  Issue Type: Bug
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.1.0
>
> Attachments: PIG-78_0_20080125.patch
>
>
> {noformat}
> compile:
>  [echo] *** Building Main Sources ***
> [javac] Compiling 6 source files to /Users/arunc/dev/java/pig/trunk/dist
> [javac] 
> /Users/arunc/dev/java/pig/trunk/src/org/apache/pig/builtin/PigStorage.java:85:
>  cannot find symbol
> [javac] symbol  : method getBytes(java.nio.charset.Charset)
> [javac] location: class java.lang.String
> [javac] os.write((f.toDelimitedString(this.fieldDel) + 
> (char)this.recordDel).getBytes(utf8));
> [javac]  ^
> {noformat}
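
The error above is what a compiler reports when String.getBytes(Charset) is not available; that overload was added in Java 6. One portable alternative, shown here as a sketch and not necessarily what the attached patch does, is the overload that takes a charset name. The class Utf8Bytes is a made-up name.

```java
import java.io.UnsupportedEncodingException;

final class Utf8Bytes {
    // String.getBytes(String charsetName) also exists on Java 5;
    // the Charset overload was only added in Java 6.
    static byte[] utf8(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(utf8("abc").length);
    }
}
```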




[jira] Assigned: (PIG-77) Add eclipse file to ignore list

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-77:
-

Assignee: Benjamin Francisoud

> Add eclipse file to ignore list
> ---
>
> Key: PIG-77
> URL: https://issues.apache.org/jira/browse/PIG-77
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: PIG-77-v01.patch
>
>
> I don't know if I'm the only one to use eclipse here but the .project, 
> .classpath and the folder .settings could be added to the svn:ignore list.




[jira] Assigned: (PIG-87) Improvements to pig.pl: make pigclient.conf optional; check JAVA_HOME

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-87:
-

Assignee: Craig Macdonald

> Improvements to pig.pl: make pigclient.conf  optional; check JAVA_HOME
> --
>
> Key: PIG-87
> URL: https://issues.apache.org/jira/browse/PIG-87
> Project: Pig
>  Issue Type: Improvement
>Reporter: Craig Macdonald
>Assignee: Craig Macdonald
> Fix For: 0.1.0
>
> Attachments: pig.pl.patch
>
>
> Brief notes about the pig.pl, and a patch to resolve some of these
> 1. Is conf/pigclient.conf really required?
> pig.pl dies straight away if $ROOT/conf/pigclient.conf does not exist. 
> This is a shame, for a couple of reasons:
>  * the only really necessary detail in pigclient.conf is $pigJarRoot.
>  * $pigJarRoot, $hodRoot and $defaultCluster can be set using 
> pigclient.conf - why can't they also be:
> (a) worked out from defaults ? - eg $pigJarRoot
>  my $JAR = $0;#/scripts/pig.pl
>   $JAR =~ s/pig\.pl/..\/pig.jar/
> (b) $hodRoot - seems an obvious candidate to be configurable using the 
> command line arguments?
> (c) $defaultCluster - ditto?
>  * if conf/pigclient.conf doesn't exist, pig.pl dies before the --help options 
> can be displayed (big shame)
> -> this means that scripts/pig.pl -h doesn't work out of the box, nor does most of
> http://wiki.apache.org/pig/GettingStarted
>  * As far as I can see minimum setup for a new Pig user:
> cd pig
> (ant)
> mkdir conf
> echo "\$pigJarRoot = \"$PWD\"" > conf/pigclient.conf
> mkdir -p libexec/pig//released/
> cp pig.jar libexec/pig//released/
> ROOT=$PWD scripts/pig.pl
> or specify the class path manually.
> 2. Java binary is looked for in a special Yahoo place or in PATH, but 
> JAVA_HOME is not checked, as per other common startup scripts (eg Tomcat).
> 3. looking for java in the path
> `which java 2>&1 > /dev/null`;
> if ($? != 0) {
> I can't help thinking that this would be better/quicker in Perl:
> sub inpath
> {
> my $bin = shift;
> foreach my $dir (split /:/, $ENV{PATH})
> {
>return 1 if -e "$dir/$bin";
> }
> return 0;
> }
> If this is deemed desirable, I can update the patch for this sub-issue too.
> Please find attached a patch for pig.pl that resolves issues 1 & 2. This 
> will allow the GettingStarted documentation to perform as expected 
> without all the rigmarole associated with pigclient.conf
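
The lookup order suggested in points 2 and 3 (JAVA_HOME first, then a scan of PATH) can be sketched as follows. This is only an illustration in Java rather than Perl, under the assumption that a plain file named "java" is what is being searched for; JavaLocator and its methods are hypothetical names, not part of pig.pl or the attached patch:

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the lookup order described above.
class JavaLocator {
    // Returns the first regular file named bin found in a PATH-style string, or null.
    static File findInPath(String bin, String path) {
        if (path == null) {
            return null;
        }
        for (String dir : path.split(Pattern.quote(File.pathSeparator))) {
            if (dir.length() == 0) {
                continue; // skip empty PATH entries
            }
            File candidate = new File(dir, bin);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    // Prefers JAVA_HOME/bin/java and falls back to scanning PATH.
    static File locateJava(String javaHome, String path) {
        if (javaHome != null) {
            File fromHome = new File(new File(javaHome, "bin"), "java");
            if (fromHome.isFile()) {
                return fromHome;
            }
        }
        return findInPath("java", path);
    }
}
```

A caller would typically pass System.getenv("JAVA_HOME") and System.getenv("PATH"); taking them as parameters keeps the lookup easy to test.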




[jira] Assigned: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-85:
-

Assignee: Pi Song

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> ---
>
> Key: PIG-85
> URL: https://issues.apache.org/jira/browse/PIG-85
> Project: Pig
>  Issue Type: Bug
>Reporter: Anand Murugappan
>Assignee: Pi Song
> Fix For: 0.1.0
>
> Attachments: PIG-85_v4.patch, PIG_85_escaping_parameters.patch, 
> PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize that the user is requesting the data to be ^A separated. 
> Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other 
> strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid 
> XML character and bails out. 
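
One way to read the request above is that the delimiter argument should be unescaped before use. The following is a rough sketch of such a step, assuming only single-character delimiters and the escape forms mentioned in the report; DelimiterParser is a hypothetical name and this is not the logic of the attached patches:

```java
// Hypothetical helper, not Pig's actual parameter handling.
class DelimiterParser {
    // Turns a delimiter specification into the single character it denotes.
    // Accepts a literal character, a tab escape, or a hex escape introduced by
    // backslash-x or backslash-u.
    static char parse(String spec) {
        if (spec.length() == 1) {
            return spec.charAt(0);
        }
        if (spec.equals("\\t")) {
            return '\t';
        }
        if (spec.length() > 2 && (spec.startsWith("\\x") || spec.startsWith("\\u"))) {
            return (char) Integer.parseInt(spec.substring(2), 16);
        }
        throw new IllegalArgumentException("unsupported delimiter: " + spec);
    }
}
```

Rejecting anything unrecognized, instead of falling back to the literal string, avoids the silent behaviour described in the report.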




[jira] Assigned: (PIG-84) Remove e.printStacktrace() from code

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-84:
-

Assignee: Benjamin Francisoud

> Remove e.printStacktrace() from code
> 
>
> Key: PIG-84
> URL: https://issues.apache.org/jira/browse/PIG-84
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: PIG-84-v01.patch
>
>
> From [Benjamin Reed  in 
> PIG-80|https://issues.apache.org/jira/browse/PIG-80?focusedCommentId=12564097#action_12564097]:
> "At the same time we should also remove all e.printStackTrace() calls."
> I'll try to provide a patch for this...




[jira] Assigned: (PIG-89) Too many spills to files causes ArrayIndexOutOfBoundsException if new temp file cant be created

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-89:
-

Assignee: Benjamin Francisoud

> Too many spills to files causes ArrayIndexOutOfBoundsException if new temp 
> file cant be created
> ---
>
> Key: PIG-89
> URL: https://issues.apache.org/jira/browse/PIG-89
> Project: Pig
>  Issue Type: Bug
>  Components: data
> Environment: Linux, Local execution Mode, JDK 1.6
>Reporter: Craig Macdonald
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: databag-89-v3.patch, patch-v2.defaultdatabag, 
> patch.defaultdatabag
>
>
> Hello,
> I am experimenting, trying to perform a DISTINCT on a medium sized set of 
> URLs - about 3million (same set as I discussed previously - Utkarsh has a 
> copy), this time in local execution mode.
> Pig script:
> {{
> A = LOAD 'all_13122007.txt';
> B = DISTINCT A;
> store B into 'bla';
> }}
> This produces the errors below (I swapped two lines in DefaultDataBag to find the real error).
> {{
> 2008-02-04 18:09:44,756 [Low Memory Detector] INFO  org.apache.pig - low 
> memory handler called init = 29491200(28800K) used = 269834064(263509K) 
> committed = 307036160(299840K) max = 471662592(460608K)
> 2008-02-04 18:09:45,355 [Low Memory Detector] ERROR org.apache.pig - Unable 
> to spill contents to disk
> java.io.IOException: Too many open files
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.checkAndCreate(File.java:1704)
> at java.io.File.createTempFile(File.java:1793)
> at java.io.File.createTempFile(File.java:1830)
> at org.apache.pig.data.DataBag.getSpillFile(DataBag.java:367)
> at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:69)
> at 
> org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
> at 
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
> at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
> at 
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
> at sun.management.Sensor.trigger(Sensor.java:120)
> java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.ArrayList.remove(ArrayList.java:390)
> at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:84)
> at 
> org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
> at 
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
> at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
> at 
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
> at sun.management.Sensor.trigger(Sensor.java:120)
> Exception in thread "Low Memory Detector" java.lang.InternalError: Error in 
> invoking listener
> at 
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:141)
> at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
> at 
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
> at sun.management.Sensor.trigger(Sensor.java:120)
> }}
> There are two sub-issues here: 
> 1. Pig spills too much using a default JVM (64MB) size - expected?
> Perhaps pig.pl should set a default JVM size of more than 64MB?
> 2. the line DefaultDataBag.java:84
> {{{
> mSpillFiles.remove(mSpillFiles.size() - 1);
> }}}
> line should check that mSpillFiles.size() > 0, because if 
> File.createTempFile() in DataBag.getSpillFile() fails, mSpillFiles will 
> not yet have been updated. My preference would be to split the try { } catch 
> (IOException ioe) { } within DefaultDataBag.spill() into two exception 
> handlers - one for getSpillFile() errors, and one for actual writing errors 
> (when we know mSpillFiles has been added to).
> If this latter point isn't clear, I can create a patch.
> Ta muchly.
> C
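
The two-handler split proposed in sub-issue 2 might look roughly like this. It is a simplified stand-in written for illustration: SpillSketch, TempFileSource and the empty write() are assumptions, and only the field name echoes the report:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a spillable bag; this is not Pig's DefaultDataBag.
class SpillSketch {
    // Hypothetical hook standing in for the temp-file creation step.
    interface TempFileSource {
        File create() throws IOException;
    }

    private final List<File> mSpillFiles = new ArrayList<File>();

    // Returns true if the spill succeeded.
    boolean spill(TempFileSource source) {
        File target;
        try {
            target = source.create();
        } catch (IOException e) {
            // Nothing has been registered yet, so there is nothing to roll back.
            return false;
        }
        mSpillFiles.add(target);
        try {
            write(target);
            return true;
        } catch (IOException e) {
            // The file was registered just above, so the list cannot be empty here.
            mSpillFiles.remove(mSpillFiles.size() - 1);
            return false;
        }
    }

    // Placeholder for writing the bag contents; does nothing in this sketch.
    void write(File target) throws IOException {
    }

    int spillFileCount() {
        return mSpillFiles.size();
    }
}
```

With the handlers separated, a failure such as "Too many open files" during temp-file creation can no longer reach the remove() call on an empty list.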




[jira] Assigned: (PIG-91) outdated @Override tags

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-91:
-

Assignee: Johannes Zillmann

> outdated @Override tags
> ---
>
> Key: PIG-91
> URL: https://issues.apache.org/jira/browse/PIG-91
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Johannes Zillmann
>Assignee: Johannes Zillmann
> Fix For: 0.1.0
>
> Attachments: pig-overirde.patch
>
>
> There are a bunch of @Override tags which are not correct anymore (I guess 
> since PIG-32).
> In my IDE (Eclipse) this results in compile errors.
> See for example
> HDataStorage.java




[jira] Assigned: (PIG-90) PigServer#store does swallow exception

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-90:
-

Assignee: Benjamin Francisoud

> PigServer#store does swallow exception
> --
>
> Key: PIG-90
> URL: https://issues.apache.org/jira/browse/PIG-90
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.0.0
>Reporter: Stefan Groschupf
>Assignee: Benjamin Francisoud
>Priority: Critical
> Fix For: 0.1.0
>
> Attachments: PIG-90-v01.patch
>
>
> My custom DatabaseStoreFunction throws an exception (a runtime exception or an 
> IOException) in putNext.
> Instead of this exception propagating all the way up (it contains a 
> nice error message), at PigServer line 326 a 
> java.lang.NoSuchMethodError: java.io.IOException: method 
> <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found is 
> thrown. 
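
The missing constructor in that error, IOException(String, Throwable), only exists from Java 6 onward, so the symptom fits code compiled against Java 6 running on an older JVM. A minimal sketch of a wrapper that keeps the cause without that constructor; IoExceptions.wrap is a hypothetical helper, not the content of the attached patch:

```java
import java.io.IOException;

// Hypothetical helper, not taken from PigServer.
class IoExceptions {
    // Builds an IOException that carries a cause using only pre-Java-6 API.
    static IOException wrap(String message, Throwable cause) {
        IOException wrapped = new IOException(message);
        wrapped.initCause(cause); // Throwable.initCause is available since Java 1.4
        return wrapped;
    }
}
```

A call site would then use something like throw IoExceptions.wrap("store failed", e) so the original message reaches the user.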




[jira] Assigned: (PIG-88) the project does not compile because of reference to HadoopExe class in Main.java

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-88:
-

Assignee: Pi Song

> the project does not compile because of reference to HadoopExe class in 
> Main.java
> -
>
> Key: PIG-88
> URL: https://issues.apache.org/jira/browse/PIG-88
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.1.0
> Environment: Win XP
>Reporter: Pi Song
>Assignee: Pi Song
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: PIG-88.HadoopExe.patch
>
>   Original Estimate: 0.08h
>  Remaining Estimate: 0.08h
>
> The project does not compile because of this line in Main.java
> import org.apache.hadoop.util.HadoopExe;
> From HADOOP-435, the patch to introduce this class has been canceled plus the 
> class itself is not being used at all in Main.java.
> The simple patch removes that particular line and now the project compiles 
> successfully.




[jira] Assigned: (PIG-92) PigContext NullPointerException because of uninitialize conf

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-92:
-

Assignee: Benjamin Francisoud

> PigContext NullPointerException because of uninitialize conf
> 
>
> Key: PIG-92
> URL: https://issues.apache.org/jira/browse/PIG-92
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: PIG-92-v01.patch, PIG-92-v02.patch
>
>
> This simple code throws an NPE
> {code:java}
> final PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.getConf().putAll(properties);
> {code}
> Because in PigContext.java:
> {code:java}
> transient private Properties conf = null;
> public void connect() throws ExecException {
> ... 
> conf = new Properties();
> 
> }
> {code}
> Simple patch:
> {code:java}
> transient private Properties conf = new Properties();
> public void connect() throws ExecException {
> ... 
> }
> {code}
> This is a regression of something already fixed in PIG-69.
> Introduced with PIG-32.




[jira] Assigned: (PIG-95) pig should not use System.exit() since this would crash the application pig is embedded in.

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-95:
-

Assignee: Stefan Groschupf

> pig should not use System.exit() since this would crash the application pig 
> is embedded in.
> ---
>
> Key: PIG-95
> URL: https://issues.apache.org/jira/browse/PIG-95
> Project: Pig
>  Issue Type: Improvement
>Reporter: Stefan Groschupf
>Assignee: Stefan Groschupf
>Priority: Critical
> Fix For: 0.1.0
>
> Attachments: 20080205-sg-noexit.patch
>
>
> As discussed, remove all System.exit statements and throw an exception instead.
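
A rough sketch of the pattern being asked for: library code signals failure with an exception, and only the command-line wrapper turns it into a process exit code. PigExitException and EmbeddableMain are hypothetical names used for illustration, not classes from the attached patch:

```java
// Hypothetical exception type carrying the exit code a CLI would have used.
class PigExitException extends RuntimeException {
    final int exitCode;

    PigExitException(int exitCode, String message) {
        super(message);
        this.exitCode = exitCode;
    }
}

// Hypothetical entry points; not Pig's Main class.
class EmbeddableMain {
    // Library entry point: reports failure by throwing, never by exiting the JVM.
    static void run(String[] args) {
        if (args.length == 0) {
            throw new PigExitException(2, "no script given");
        }
        // ... run the script ...
    }

    // Only the command-line wrapper maps the exception to a process exit code.
    public static void main(String[] args) {
        try {
            run(args);
        } catch (PigExitException e) {
            System.err.println(e.getMessage());
            System.exit(e.exitCode);
        }
    }
}
```

An embedding application calls run() and decides for itself how to handle the exception.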




[jira] Assigned: (PIG-100) Tests: NullPointerException parser.QueryParser.Alias(QueryParser.java:471)

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-100:
--

Assignee: Benjamin Francisoud

> Tests: NullPointerException parser.QueryParser.Alias(QueryParser.java:471)
> --
>
> Key: PIG-100
> URL: https://issues.apache.org/jira/browse/PIG-100
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: PIG-100-tests.log, PIG-100-v01.patch, PIG-100-v02.patch, 
> PIG-100-v03-spaces.patch, PIG-100-v03-tabs.patch
>
>
> I think the root problem was that I forgot to specify the configuration using 
> -Djunit.hadoop.conf=hadoop-site.xml while running the tests.
> But the error could be clearer...
> The logs are big so I will provide them in a separate file...
> But the core problem is:
> {noformat}
> [junit] java.lang.NullPointerException
> [junit]   at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Alias(QueryParser.java:471)
> [junit]   at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedExpr(QueryParser.java:411)
> [junit]   at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedExpr(QueryParser.java:417)
> [junit]   at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.GroupItem(QueryParser.java:1027)
> ...
> [junit] org.apache.pig.impl.logicalLayer.parser.ParseException: 
> Encountered "group" at line 1, column 9.
> [junit] Was expecting one of:
> [junit]  ...
> [junit] "(" ...
> [junit] 
> [junit]   at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.generateParseException(QueryParser.java:4142)
> ...
> [junit] org.apache.pig.impl.logicalLayer.parser.ParseException: 
> Encountered "generate" at line 1, column 1.
> [junit] Was expecting one of:
> [junit] "load" ...
> [junit] "filter" ...
> {noformat}




[jira] Assigned: (PIG-98) grunt should show full exception stack

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-98?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-98:
-

Assignee: Stefan Groschupf

> grunt should show full exception stack
> --
>
> Key: PIG-98
> URL: https://issues.apache.org/jira/browse/PIG-98
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Stefan Groschupf
>Assignee: Stefan Groschupf
>Priority: Minor
> Attachments: showStackTrace-20080207.patch
>
>
> I suggest grunt should be more helpful with user errors. I just made one (a 
> stupid one) and it took me too long to figure out the problem, since grunt's 
> error message did not give me a good hint:
> grunt> A = LOAD '/pigtestData.tsv' USING PigStorage(',') AS (user,age,cat);
> grunt> B = FILTER A BY cat == 'book';
> grunt> dump B;
> For input string: "book"
> Experts will see that I tried to use == instead of eq; however, new users 
> especially could get a little confused. 
> I see two options: add error numbers and descriptive texts (Oracle style), 
> which is quite a lot of work, or, for now, simply dump the full 
> exception text.
> At least at this early stage it would help developers and users find problems 
> faster.




[jira] Assigned: (PIG-93) Impossible to set jobconf parameters

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-93:
-

Assignee: Benjamin Francisoud

> Impossible to set jobconf parameters
> 
>
> Key: PIG-93
> URL: https://issues.apache.org/jira/browse/PIG-93
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
>Priority: Critical
> Fix For: 0.1.0
>
> Attachments: PIG93Main.java
>
>
> I'm trying to set jobconf parameter before launching a pig job using pig api.
> I tried 2 different ways but with no success:
> {code:java}
> PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.getExecutionEngine().getConfiguration().putAll(properties);
> PigServer pigServer = new PigServer(pigContext);
> 
> {code}
> This throws an NPE because the internal executionEngine variable is initialized 
> only when connect() is called.
> So I tried:
> {code:java}
> PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.connect();
> pigContext.getExecutionEngine().getConfiguration().putAll(properties);
> PigServer pigServer = new PigServer(pigContext);
> ...
> {code}
> My properties have been replaced by a "new JobConf()"
> {noformat}
> java.lang.RuntimeException: Bad mapred.job.tracker: local
> at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:711)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:149)
> at org.apache.pig.impl.PigContext.connect(PigContext.java:180)
> {noformat}
> "properties" contains "mapred.job.tracker" and "hadoop.tmp.dir" values
> Before PIG-32 I used to do (and it was working): 
> {code:java}
> PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.setConf(myJobConf);
> PigServer pigServer = new PigServer(pigContext);
> ...
> {code}
> Any idea before I start to work on a patch ?




[jira] Assigned: (PIG-101) Use ExecType.MAPREDUCE instead of duplicate string to initialize PigServer in tests

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-101:
--

Assignee: Benjamin Francisoud

> Use ExecType.MAPREDUCE instead of duplicate string to initialize PigServer in 
> tests
> ---
>
> Key: PIG-101
> URL: https://issues.apache.org/jira/browse/PIG-101
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
>Priority: Trivial
> Fix For: 0.1.0
>
> Attachments: PIG-101-v01.patch, PIG-101-v02.patch
>
>
> In the test code, there are lots of:
> {code:java}
> private String initString = "mapreduce";
> @Test
> public void testSomething() {
> 
> PigServer pig = new PigServer(initString);
> 
> }
> {code}
> It could be replaced with 
> {code:java}
> PigServer pig = new PigServer(ExecType.MAPREDUCE);
> {code}
> It would remove duplication in test.
> Using a string makes the tests aware of the internal PigServer behavior.
> It's really not a big deal hence the "trivial" :)




[jira] Assigned: (PIG-115) start script for pig

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-115:
--

Assignee: Stefan Groschupf

> start script for pig
> 
>
> Key: PIG-115
> URL: https://issues.apache.org/jira/browse/PIG-115
> Project: Pig
>  Issue Type: Improvement
>Reporter: Stefan Groschupf
>Assignee: Stefan Groschupf
> Fix For: 0.1.0
>
> Attachments: PIG-115_v_1.patch, PIG-115_v_2.patch, PIG-115_v_3.patch, 
> PIG-115_v_4_r634426.patch
>
>
> The current pig.pl is very y! specific, a generic start script is required 
> that works for all users.
> The goal of this issue is to collect a list of requirements a new script has to 
> fulfill.




[jira] Assigned: (PIG-109) improve exception handling for function instantiation

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-109:
--

Assignee: Johannes Zillmann

> improve exception handling for function instantiation
> -
>
> Key: PIG-109
> URL: https://issues.apache.org/jira/browse/PIG-109
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Johannes Zillmann
>Assignee: Johannes Zillmann
> Fix For: 0.1.0
>
> Attachments: PIG-109_620665.patch, pigExceptionPatch-627601.diff
>
>
> Running Pig on a cluster, I got an instantiation exception for my custom 
> StoreFunc:
> {noformat}
> 08/02/13 22:58:42 ERROR mapreduceExec.MapReduceLauncher: Error message from 
> task (map) tip_200802110401_0072_m_00 java.lang.RuntimeException: 
> java.io.IOException: null
> at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:427)
> at 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:435)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:58)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:47)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.setupMapPipe(PigMapReduce.java:205)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:103)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> {noformat} 
> It is easy to figure out that there is a problem with my StoreFunc, but hard to 
> figure out what exactly.
> Looking into the Pig code upward from PigContext#instantiateFunc(), there is a 
> kind of exception handling which seems unnecessarily complicated.
> Any exception which can happen while instantiating the store func (like 
> InstantiationException or InvocationTargetException) is caught and wrapped 
> in an IOException. 
> Later on the cause of the IOException is inspected (LOLoad, around line 60) 
> or wrapped into a RuntimeException without handing the causes over (PigSplit, 
> around line 101).
> Since every exception which can be raised in PigContext#instantiateFunc() is 
> a user error rather than a temporary environment problem, I think this 
> method can just throw an unchecked exception and doesn't have to declare 
> IOException anymore. This should save a lot of trouble in calling methods.
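
The proposal above, an unchecked exception that keeps the original cause attached, could be sketched as follows. FuncInstantiator is a hypothetical class that assumes a no-argument constructor; it is not the code in PigContext or in the attached patches:

```java
// Hypothetical helper, not the PigContext implementation.
class FuncInstantiator {
    // Instantiates a class by name via its no-argument constructor.
    // Any failure is rethrown unchecked with the original exception as its
    // cause, so the real reason shows up in the stack trace.
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new RuntimeException("could not instantiate " + className, e);
        }
    }
}
```

Callers no longer need to declare or unwrap an IOException, and the "Caused by:" section of the trace shows what actually went wrong.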




[jira] Assigned: (PIG-107) Some test methods are not run because there is no @Test annotation

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-107:
--

Assignee: Benjamin Francisoud

> Some test methods are not run because there is no @Test annotation
> --
>
> Key: PIG-107
> URL: https://issues.apache.org/jira/browse/PIG-107
> Project: Pig
>  Issue Type: Test
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: PIG-107-v01.patch
>
>
> I don't know if that's on purpose but in TestLogicalPlanBuilder.java, those 
> methods don't have the @Test annotation and therefore are not run with latest 
> junit (in my case in eclipse):
> {code:java}
> public void testQuery41() {}
> public void testQuery42() {}
> public void testQuery43() {}
> public void testQuery44() {}
> public void testQueryFail44() throws Throwable {}
> {code}




[jira] Assigned: (PIG-113) Make Grunt's explain output more understandable

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-113:
--

Assignee: Pi Song

> Make Grunt's explain output more understandable
> ---
>
> Key: PIG-113
> URL: https://issues.apache.org/jira/browse/PIG-113
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Affects Versions: 0.1.0
>Reporter: Pi Song
>Assignee: Pi Song
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: pig_printtree_1.patch, pig_printtree_2.patch
>
>
> I think it would be better if we can display the execution plan in a more 
> understandable way. One intuitive way to do this is to show output as a tree 
> like in SQL Server.
> Possibly we can  have 'AS ' as optional argument for explain command
> For example
> {noformat}
> Grunt> explain bag1 AS tree ;
> Grunt> explain bag1 AS xml ;
> {noformat}
> and 
> {noformat}
> Grunt> explain bag1   
> {noformat}
> will display the default format
> I have included a patch that does generate tree output.
> Here is a sample of the existing output format
> {noformat}
> Logical Plan:
> Group root-Sun Feb 17 19:37:07 GMT+10:00 2008-5
> Object id: 9814147
> Inputs: 26335425 
> Schema: (group, (sum, (), (), ()))
> EvalSpecs:
> Generate: has 2 children
> Project: (0)
> Star
> Split root-Sun Feb 17 19:37:07 GMT+10:00 2008-2
> Object id: 25199001
> Inputs: 29132923 
> Schema: (sum, (), (), ())
> EvalSpecs:
> Eval root-Sun Feb 17 19:37:07 GMT+10:00 2008-1
> Object id: 29132923
> Inputs: 10774273 
> Schema: (sum, (), (), ())
> EvalSpecs:
> Generate: has 4 children
> FuncEval: name: org.apache.pig.impl.builtin.ADD args:
> Generate: has 2 children
> Project: (0)
> Project: (1)
> Project: (0)
> Project: (1)
> Project: (2)
> Load root-Sun Feb 17 19:37:07 GMT+10:00 2008-0
> Object id: 10774273
> Inputs: 
> Schema: ()
> EvalSpecs:
> ---
> Physical Plan:
> MAPREDUCE
> Object id: 17671659
> Inputs: 682933706
> Map: 
> Star
> Grouping Funcs: 
> Generate: has 2 children
> Project: (0)
> Star
> Input Files: /tmp/temp678140026/tmp1867058340
> MAPREDUCE
> Object id: 17308974
> Inputs: 
> Map: 
> Composite: has 2 children
> Star
> Generate: has 4 children
> FuncEval: name: org.apache.pig.impl.builtin.ADD args:
> Generate: has 2 children
> Project: (0)
> Project: (1)
> Project: (0)
> Project: (1)
> Project: (2)
> Input Files: /tmp/data1.txt
> Output File: /tmp/temp678140026/tmp1613817084
> {noformat}
> Here is a sample of my tree output which is more compact and more 
> understandable :-
> {noformat}
> grunt> explain c1 as tree ;
> Logical Plan:
> |---LOCogroup ( GENERATE {[PROJECT $0],[*]} ) 
>   |---LOSplitOutput (  ) 
> |---LOSplit ( ([PROJECT $0] < ['5']),([PROJECT $0] >= ['5']) ) 
>   |---LOEval ( GENERATE 
> {[org.apache.pig.impl.builtin.ADD(GENERATE {[PROJECT $0],[PROJECT 
> $1]})],[PROJECT $0],[PROJECT $1],[PROJECT $2]} ) 
> |---LOLoad ( file = /tmp/data1.txt )
> ---
> Physical Plan:
> |---POMapreduce
> Map : *
> Grouping : Generate(Project(0),*)
> Input File(s) : /tmp/temp678140026/tmp1867058340
>   |---POMapreduce
>   Map : 
> Composite(*,Generate(FuncEval(org.apache.pig.impl.builtin.ADD(Generate(Project(0),Project(1,Project(0),Project(1),Project(2)))
>   Input File(s) : /tmp/data1.txt
> {noformat}
> I'm also thinking about doing output as xml as it might benefit people who 
> are working on displaying execution plan on GUI.
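
The indented "|---" layout shown in the sample tree output can be produced by a short recursive walk. The sketch below is illustrative only: PlanNode is a hypothetical class and the two-space indent per level is an assumption read off the sample, not the logic of the attached patches:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan node; not one of Pig's logical or physical operators.
class PlanNode {
    final String label;
    final List<PlanNode> inputs = new ArrayList<PlanNode>();

    PlanNode(String label) {
        this.label = label;
    }

    // Adds an input operator and returns this node to allow chaining.
    PlanNode add(PlanNode input) {
        inputs.add(input);
        return this;
    }

    // Renders this node and, one level deeper, each of its inputs.
    String toTree() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  "); // two spaces per level
        }
        sb.append("|---").append(label).append('\n');
        for (PlanNode input : inputs) {
            input.render(sb, depth + 1);
        }
    }
}
```

An XML rendering, as mentioned at the end of the description, would be a second walk over the same structure.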




[jira] Assigned: (PIG-108) PigCombine does not use configure method and therefore de-serialize and instantiate objects with every reduce call

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-108:
--

Assignee: Stefan Groschupf

> PigCombine does not use configure method and therefore de-serialize and 
> instantiate objects with every reduce call
> --
>
> Key: PIG-108
> URL: https://issues.apache.org/jira/browse/PIG-108
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Stefan Groschupf
>Assignee: Stefan Groschupf
>Priority: Critical
> Fix For: 0.1.0
>
> Attachments: PIG-108-r639015-v1.patch
>
>
> There is significant room for improvement in PigCombine. 
> In each reduce call some objects are deserialized from the jobConf and 
> the object graph is generated again and again. 
> Hadoop guarantees to call the configure method before a run, so things 
> like inputCount can then be cached as fields. 
> During reduce calls the jobConf will not change, so repeated deserialization and 
> instantiation of all these objects 
> (pigContext, evalPipe, inputCount, oc, finalout, esp and so on) 
> makes no sense from my point of view.
> I am not sure how often PigCombine is used, but fixing this will significantly 
> improve performance.
> Was there any reason to do things like this or is that just historical? 
> As soon as the test suite is running again, I would be happy to work on a patch 
> if there are no other opinions about that. 
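
The caching idea described above can be sketched as follows, assuming a configure step that runs once before any reduce call. CachingCombiner, the key "sketch.input.count" and the plain Map standing in for the jobConf are all hypothetical and do not reflect the PigCombine source:

```java
import java.util.Map;

// Hypothetical stand-in for a combiner; not the PigCombine source.
class CachingCombiner {
    int parseCalls = 0;          // counts how often the costly setup runs
    private int inputCount;      // cached during configure()
    private boolean configured = false;

    // Called once before any reduce call, so costly setup belongs here.
    void configure(Map<String, String> jobConf) {
        parseCalls++;
        inputCount = Integer.parseInt(jobConf.get("sketch.input.count"));
        configured = true;
    }

    int getInputCount() {
        return inputCount;
    }

    // Called once per key; relies on the cached state and re-parses nothing.
    int reduce(int[] values) {
        if (!configured) {
            throw new IllegalStateException("configure() was not called");
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```

However many keys are reduced, the setup cost is paid once per task.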




[jira] Assigned: (PIG-118) UNION/CROSS/JOIN operations should not allow 1 operand

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-118:
--

Assignee: Pi Song

> UNION/CROSS/JOIN operations should not allow 1 operand
> --
>
> Key: PIG-118
> URL: https://issues.apache.org/jira/browse/PIG-118
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.0.0
>Reporter: Pi Song
>Assignee: Pi Song
> Fix For: 0.1.0
>
> Attachments: pig_1operand.patch
>
>
> At the moment UNION/CROSS/JOIN allow 1 operand.
> You can write:-
> {noformat}
> b = UNION a ;
> c = CROSS b ;
> d = JOIN c BY $0 ;
> {noformat}
> Possibly UNION with 1 operand might be needed for implementing Sigma-styled 
> union (Ui=1..n An), but for CROSS/JOIN I think nobody would do such an operation.
> Simply replacing "*" with "+" in the parser grammar should fix this problem. 
> Should this be fixed?
> {noformat}
> LogicalOperator CrossClause() : {LogicalOperator op; ArrayList 
> inputs = new ArrayList();}
> {
>   (
>   op = NestedExpr() { inputs.add(op.getOperatorKey()); }
>   ("," op = NestedExpr() { inputs.add(op.getOperatorKey()); })*
>   )
>   {return rewriteCross(inputs);}
> }
> LogicalOperator JoinClause() : {CogroupInput gi; ArrayList gis 
> = new ArrayList();}
> {
>   (gi = GroupItem() { gis.add(gi); }
>   ("," gi = GroupItem() { gis.add(gi); })*)
>   {return rewriteJoin(gis);}
> }
> LogicalOperator UnionClause() : {LogicalOperator op; ArrayList 
> inputs = new ArrayList();}
> {
>   (op = NestedExpr() { inputs.add(op.getOperatorKey()); }
>   ("," op = NestedExpr() { inputs.add(op.getOperatorKey()); })*)
>   {return new LOUnion(opTable, scope, getNextId(), inputs);}
> }
> {noformat}
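
The description proposes fixing this in the grammar. As an alternative illustration only, the same rule could be enforced with a plain Java check on the collected operand list. OperandCheck and requireAtLeast are hypothetical names, not part of Pig.

```java
import java.util.List;

class OperandCheck {
    // Rejects operand lists that are too short for a multi-input operator.
    static <T> List<T> requireAtLeast(int min, String opName, List<T> inputs) {
        if (inputs.size() < min) {
            throw new IllegalArgumentException(
                opName + " requires at least " + min + " operands, got " + inputs.size());
        }
        return inputs;
    }
}
```

A rewrite step for CROSS or JOIN could call requireAtLeast(2, ...) before building the operator, while UNION could keep a minimum of 1 if the Sigma-style use is wanted.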




[jira] Assigned: (PIG-124) only run one test (ant runtest -Dtest=TestMapReduce) not the complete test suite

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-124:
--

Assignee: Stefan Groschupf

> only run one test (ant runtest -Dtest=TestMapReduce) not the complete test 
> suite
> -
>
> Key: PIG-124
> URL: https://issues.apache.org/jira/browse/PIG-124
> Project: Pig
>  Issue Type: Improvement
>Reporter: Stefan Groschupf
>Assignee: Stefan Groschupf
> Fix For: 0.1.0
>
> Attachments: PIG-124_v_1.patch, RunIndividualTestCase.patch
>
>
> +1 to what Xu is saying.




[jira] Assigned: (PIG-127) Add descriptions to ant target

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-127:
--

Assignee: Benjamin Francisoud

> Add descriptions to ant target
> --
>
> Key: PIG-127
> URL: https://issues.apache.org/jira/browse/PIG-127
> Project: Pig
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: PIG-127-v01.patch
>
>
> In PIG-68, I used the "description" attribute to provide help when doing "ant 
> -projecthelp"
> It seems the last patch committed lost that information.




[jira] Assigned: (PIG-125) improve exception handling and expressiveness around tuple field access

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-125:
--

Assignee: Johannes Zillmann

> improve exception handling and expressiveness around tuple field access
> --
>
> Key: PIG-125
> URL: https://issues.apache.org/jira/browse/PIG-125
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Johannes Zillmann
>Assignee: Johannes Zillmann
> Fix For: 0.1.0
>
> Attachments: PIG-125.patch
>
>
> Stumbled over a case where I'm accessing fields in a tuple whose types are 
> not what I expected. The stack trace in one case looked as follows:
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: execution failed
>   at com.my.Executor.run(Executor.java:284)
> Caused by: java.io.IOException: Unable to store alias C
>   at 
> org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
>   at org.apache.pig.PigServer.store(PigServer.java:335)
>   at org.apache.pig.PigServer.store(PigServer.java:317)
>   at com.my.Executor.run(Executor.java:280)
>   ... 2 more
> Caused by: org.apache.pig.backend.executionengine.ExecException
>   at 
> org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:137)
>   at 
> org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:32)
>   at org.apache.pig.PigServer.store(PigServer.java:332)
>   ... 4 more
> Caused by: java.io.IOException: Incompatible type for request getAtomField().
>   at org.apache.pig.data.Tuple.getAtomField(Tuple.java:177)
>   at com.my.DatabaseStoreFunc.putNext(DatabaseStoreFunc.java:83)
>   at org.apache.pig.impl.io.PigFile.store(PigFile.java:64)
>   at 
> org.apache.pig.backend.local.executionengine.POStore.getNext(POStore.java:105)
>   at 
> org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:130)
>   ... 6 more
> {noformat}
> The exception message and the stack trace gave me a clue what kind of problem 
> I was facing. But to know what exactly happened I needed to debug (or 
> temporarily add some System.out prints).
> Looking at the code (of the Tuple class) I think the exception information can be 
> improved easily (add the index and the actual field type).
> Also it seems that there is some room for simplifying the exception handling.
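
A sketch of what a more informative message could look like, assuming a tuple is modelled as a plain list of objects. FieldAccess and getStringField are hypothetical and do not reproduce the real Tuple.getAtomField; the point is only that the index and the actual runtime type go into the message.

```java
import java.io.IOException;
import java.util.List;

class FieldAccess {
    // Returns field i as a String, or fails with the index and the actual type.
    static String getStringField(List<Object> fields, int i) throws IOException {
        Object field = fields.get(i);
        if (!(field instanceof String)) {
            String actual = (field == null) ? "null" : field.getClass().getName();
            throw new IOException("Incompatible type for field " + i
                + ": expected java.lang.String but found " + actual);
        }
        return (String) field;
    }
}
```

With a message like this, the failing field and its type can be read directly from the stack trace without attaching a debugger.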




[jira] Assigned: (PIG-122) remove TokenMgrError and Co from svn properties in src/org/apache/pig/tools/pigscript/parser

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-122:
--

Assignee: Benjamin Francisoud

> remove TokenMgrError and Co from svn properties in 
> src/org/apache/pig/tools/pigscript/parser
> 
>
> Key: PIG-122
> URL: https://issues.apache.org/jira/browse/PIG-122
> Project: Pig
>  Issue Type: Improvement
>Reporter: Stefan Groschupf
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: PIG-122-v02.patch, PIG-122-v03.patch, PIG-122_v_1.patch
>
>
> This is obsolete now, and removing it will also help people using the new build.xml to 
> recognize they need to delete these files.
> Also we should add src-gen to svn:ignore.




[jira] Assigned: (PIG-120) support hadoop map reduce in local mode

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-120:
--

Assignee: Stefan Groschupf

> support hadoop map reduce in local mode
> --
>
> Key: PIG-120
> URL: https://issues.apache.org/jira/browse/PIG-120
> Project: Pig
>  Issue Type: Bug
>Reporter: Stefan Groschupf
>Assignee: Stefan Groschupf
> Fix For: 0.1.0
>
> Attachments: PIG-120_v_1.patch
>
>
> Currently Pig supports mapreduce and local as execution modes. 
> LocalExecutionEngine is used for local and HExecutionEngine for map reduce. 
> HExecutionEngine always expects that Hadoop runs as a cluster with a name node 
> and jobtracker listening on a port. 
> However, Hadoop can also run in a local mode (LocalJobRunner), which would give 
> several advantages. 
> First, it would speed up the test suite significantly. Second, it would be 
> possible to debug map reduce plans easily.
> For example, we were able to debug and reproduce PIG-110 with this method.




[jira] Assigned: (PIG-139) Command line editing, history and more for Grunt

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-139:
--

Assignee: Daniel Dai

> Command line editing, history and more for Grunt
> 
>
> Key: PIG-139
> URL: https://issues.apache.org/jira/browse/PIG-139
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
> Environment: Grunt
>Reporter: Amir Youssefi
>Assignee: Daniel Dai
> Fix For: 0.2.0
>
> Attachments: jline-0.9.94.jar, jline.patch, jline2.patch, 
> jline3.patch, jline4.patch
>
>
> We need to add support of command line editing, history and more for Grunt. 




[jira] Assigned: (PIG-137) test instantiation of StoreFunc in LOStore swallows (cause) exceptions

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-137:
--

Assignee: Johannes Zillmann

> test instantiation of StoreFunc in LOStore swallows (cause) exceptions
> --
>
> Key: PIG-137
> URL: https://issues.apache.org/jira/browse/PIG-137
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Johannes Zillmann
>Assignee: Johannes Zillmann
> Attachments: PIG-137-633746.patch
>
>
> The current handling
> {noformat}
> IOException ioe = new IOException(e.getMessage());
> ioe.setStackTrace(e.getStackTrace());
> throw ioe;
> {noformat}
> passes the exception message and the stack trace of the exception, but not the 
> stack traces of the exceptions which caused the exception.
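
One way to keep the cause chain, sketched under the assumption that an IOException still has to be thrown: attach the original exception with Throwable.initCause instead of copying only its stack trace. ExceptionWrapping and wrap are hypothetical names and this is not necessarily what the attached patch does.

```java
import java.io.IOException;

class ExceptionWrapping {
    // Wraps any throwable in an IOException while keeping the full cause chain.
    static IOException wrap(Throwable e) {
        IOException ioe = new IOException(e.getMessage());
        ioe.initCause(e); // preserves e and everything that caused e
        return ioe;
    }
}
```

Printing the resulting exception then shows the usual "Caused by:" sections for every nested exception.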




[jira] Assigned: (PIG-155) logo improvement

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-155:
--

Assignee: Stefan Groschupf

> logo improvement
> 
>
> Key: PIG-155
> URL: https://issues.apache.org/jira/browse/PIG-155
> Project: Pig
>  Issue Type: Improvement
>Reporter: Stefan Groschupf
>Assignee: Stefan Groschupf
>Priority: Trivial
> Attachments: 080224_logo_pig_01_rgb.jpg, pig_logo_improvement.zip
>
>





[jira] Assigned: (PIG-134) Update Java version requirement on deployment page

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-134:
--

Assignee: Benjamin Francisoud

> Update Java version requirement on deployment page
> --
>
> Key: PIG-134
> URL: https://issues.apache.org/jira/browse/PIG-134
> Project: Pig
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.1.0
>Reporter: Benjamin Francisoud
>Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: PIG-134-v01.patch
>
>
> In http://incubator.apache.org/pig/deployment.html, this line is outdated:
> {quote}
> Requirements
>1. Java *1.6.x.* preferably from Sun. Set JAVA_HOME to the root of your 
> Java installation.
> {quote}
> It is *1.5.x*
> I will provide the patch for deployment.xml, but I think you need to 
> regenerate the forrest documentation (html and pdf).




[jira] Assigned: (PIG-178) Use of schema on a secondary output of SPLIT throws IndexOutOfBoundsException

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-178:
--

Assignee: Mathieu Poumeyrol

> Use of schema on a secondary output of SPLIT throws IndexOutOfBoundsException
> -
>
> Key: PIG-178
> URL: https://issues.apache.org/jira/browse/PIG-178
> Project: Pig
>  Issue Type: Bug
>  Components: impl
> Environment: not relevant
>Reporter: Mathieu Poumeyrol
>Assignee: Mathieu Poumeyrol
> Fix For: 0.1.0
>
> Attachments: PigSplit.patch, TestPigSplit.patch
>
>
> outputSchema for LOSplitOutput is trivially broken. A patch including a test case 
> and a fix is coming.




[jira] Assigned: (PIG-171) Top K

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-171:
--

Assignee: Daniel Dai

> Top K
> -
>
> Key: PIG-171
> URL: https://issues.apache.org/jira/browse/PIG-171
> Project: Pig
>  Issue Type: Sub-task
>Affects Versions: 0.2.0
>Reporter: Amir Youssefi
>Assignee: Daniel Dai
> Fix For: 0.2.0
>
> Attachments: limit1.patch, limit2.patch, limit3.patch
>
>
> Frequently, users are interested in top results (especially the top K rows). 
> This can be implemented efficiently in Pig/MapReduce settings to deliver 
> rapid results and low network bandwidth/memory usage.
>  
> The key point is to prune all data on the map side and keep only a small set of 
> rows matching the top criteria. We can do it in an Algebraic function (combiner) with 
> multiple value output. Only a small data set leaves the mapper node.
> The same idea is applicable to solve variants of this problem:
>   - An Algebraic Function for 'Top K Rows'
>   - An Algebraic Function for 'Top K' values ('Top Rank K' and 'Top Dense 
> Rank K')
>   - TOP K ORDER BY.
> In other words, the implementation is similar to combiners for aggregate functions, 
> but instead of one value we get multiple ones. 
> I will add a sample implementation for Top K Rows and possibly TOP K ORDER BY 
> to clarify details.
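
The pruning idea above can be sketched in plain Java, assuming integer values and "largest first" as the top criterion. TopK, topK and merge are hypothetical names; this is not Pig's Algebraic interface, only the combiner-style logic: each partition keeps at most K values, and the partial results are merged with the same function.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopK {
    // Keeps the k largest values of one input partition using a min-heap of size k.
    static List<Integer> topK(Iterable<Integer> values, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : values) {
            heap.add(v);
            if (heap.size() > k) {
                heap.poll(); // drop the current smallest
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }

    // Merges partial results: the top K of the union is the top K of the partial top Ks.
    static List<Integer> merge(List<List<Integer>> partials, int k) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> p : partials) {
            all.addAll(p);
        }
        return topK(all, k);
    }
}
```

Because every partition emits at most K values, the amount of data leaving each mapper is bounded by K regardless of input size.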




[jira] Assigned: (PIG-203) pig parser hangs on input script bigger ~1kb

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-203:
--

Assignee: Mathieu Poumeyrol

> pig parser hangs on input script bigger ~1kb
> 
>
> Key: PIG-203
> URL: https://issues.apache.org/jira/browse/PIG-203
> Project: Pig
>  Issue Type: Bug
>Reporter: Mathieu Poumeyrol
>Assignee: Mathieu Poumeyrol
> Fix For: 0.1.0
>
> Attachments: Main.patch
>
>
> When the command line interpreter is run on a file bigger than 1kb or so, it 
> overflows the PipeReader/PipeWriter internal buffers and freezes.
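
A sketch of the general failure mode and one way around it, assuming the description refers to java.io.PipedReader and java.io.PipedWriter. A pipe has a fixed-size internal buffer (1024 characters by default in the JDK, as far as I know), so writing a larger script into it from the same thread that later reads it blocks forever. The hypothetical PipedCopy below writes from a separate thread so the reader can drain the buffer; the attached patch may solve the problem differently.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

class PipedCopy {
    // Sends text through a pipe; the writer runs on its own thread so the
    // reader can drain the pipe's fixed-size buffer while writing continues.
    static String roundTrip(final String text) throws IOException, InterruptedException {
        final PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    writer.write(text);
                } catch (IOException e) {
                    // Ignored in this sketch; the reader would just see a short result.
                } finally {
                    try { writer.close(); } catch (IOException ignored) { }
                }
            }
        });
        producer.start();

        StringBuilder out = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.append(buf, 0, n);
        }
        producer.join();
        reader.close();
        return out.toString();
    }
}
```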




[jira] Assigned: (PIG-201) BufferedPositionedInputStream is not buffered

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-201:
--

Assignee: Mathieu Poumeyrol

> BufferedPositionedInputStream is not buffered
> -
>
> Key: PIG-201
> URL: https://issues.apache.org/jira/browse/PIG-201
> Project: Pig
>  Issue Type: Bug
>Reporter: Mathieu Poumeyrol
>Assignee: Mathieu Poumeyrol
> Attachments: BufferedPositionedInputStream.patch
>
>
> BufferedPositionedInputStream is actually not buffered, leading (I guess) to 
> constant round trips to DFS as bytes are read one by one. I just wrapped the 
> provided input stream in the constructor in a good old BufferedInputStream.
> I measured a 40% performance boost on a script that reads and writes 3.7GB in 
> DFS through PigStorage on one node. I guess the impact may be greater on a 
> real HDFS cluster with actual network round trips.
> FYI, the issue was found while profiling with the YourKit Java profiler. Useful 
> toy...
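
To illustrate why the wrapping helps, here is a small self-contained sketch. CountingInputStream and BufferingDemo are hypothetical helpers, not Pig classes; they count how many read calls reach the underlying stream when it is consumed byte by byte, with and without a java.io.BufferedInputStream in front.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that counts how many read calls reach the underlying source.
class CountingInputStream extends InputStream {
    private final InputStream in;
    int calls = 0;

    CountingInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        calls++;
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return in.read(b, off, len);
    }
}

class BufferingDemo {
    // Reads the stream one byte at a time and returns the number of bytes seen.
    static int drain(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }
}
```

Draining a CountingInputStream directly costs one underlying call per byte, whereas draining new BufferedInputStream(counting) costs roughly one call per buffer fill.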




[jira] Assigned: (PIG-200) Pig Performance Benchmarks

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-200:
--

Assignee: Alan Gates

> Pig Performance Benchmarks
> --
>
> Key: PIG-200
> URL: https://issues.apache.org/jira/browse/PIG-200
> Project: Pig
>  Issue Type: Task
>Reporter: Amir Youssefi
>Assignee: Alan Gates
> Attachments: generate_data.pl, perf.hadoop.patch, perf.patch
>
>
> To benchmark Pig performance, we need to have a TPC-H like Large Data Set 
> plus Script Collection. This is used in comparison of different Pig releases, 
> Pig vs. other systems (e.g. Pig + Hadoop vs. Hadoop Only).
> Here is Wiki for small tests: http://wiki.apache.org/pig/PigPerformance
> I am currently running long-running Pig scripts over data-sets in the order 
> of tens of TBs. Next step is hundreds of TBs.
> We need to have an open large-data set (open source scripts which generate 
> data-set) and detailed scripts for important operations such as ORDER, 
> AGGREGATION etc.
> We can call those the Pig Workouts: Cardio (short processing), Marathon (long 
> running scripts) and Triathlon (Mix). 
> I will update this JIRA with more details of current activities soon.




[jira] Assigned: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-202:
--

Assignee: Mathieu Poumeyrol

> ComparatorFunc provided to ORDER clause is not always honoured
> --
>
> Key: PIG-202
> URL: https://issues.apache.org/jira/browse/PIG-202
> Project: Pig
>  Issue Type: Bug
>Reporter: Mathieu Poumeyrol
>Assignee: Mathieu Poumeyrol
> Fix For: 0.1.0
>
> Attachments: EvalSpec.patch, InstantiateFunc.patch, 
> MapreducePlanCompiler.patch, quantiles.in, quantiles.pig, Sort.patch, 
> Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledged neither by the local 
> implementation nor by the quartile lookup job.
> Patch coming soon.




[jira] Assigned: (PIG-207) New illustrate command does not work in mapreduce mode.

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-207:
--

Assignee: Shubham Chopra

> New illustrate command does not work in mapreduce mode.
> ---
>
> Key: PIG-207
> URL: https://issues.apache.org/jira/browse/PIG-207
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Alan Gates
>Assignee: Shubham Chopra
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: exgen.patch
>
>
> In local mode, illustrate will work.  But if exectype is set to mapreduce, 
> then:
> {noformat}
> grunt> a = load 'data/test.txt';
> grunt> b = filter a by $0 eq 'f2';
> grunt> illustrate b;
> 2008-04-16 00:03:06,512 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> java.lang.ClassCastException: 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine cannot be cast 
> to org.apache.pig.backend.local.executionengine.LocalExecutionEngine
> at org.apache.pig.pen.ExGen.GenerateExamples(ExGen.java:61)
> at org.apache.pig.PigServer.showExamples(PigServer.java:573)
> at org.apache.pig.PigServer.showExamples(PigServer.java:569)
> at 
> org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:131)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:172)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:72)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
> at org.apache.pig.Main.main(Main.java:272)
> {noformat}
> dump a and dump b work.




[jira] Assigned: (PIG-213) Non-static Log objects in org.apache.pig.data.* classes are inefficient

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-213:
--

Assignee: Vadim Geshel

> Non-static Log objects in org.apache.pig.data.* classes are inefficient
> ---
>
> Key: PIG-213
> URL: https://issues.apache.org/jira/browse/PIG-213
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.1.0
>Reporter: Vadim Geshel
>Assignee: Vadim Geshel
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: logging.patch
>
>
> LogFactory.getLog called from the constructor of Tuple accounts for a 
> significant percentage of my job's running time. The proposed fix is to make 
> the Log fields static (which is generally standard practice).
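
A sketch of the static-field pattern. To stay self-contained it uses java.util.logging.Logger as a stand-in for the commons-logging Log mentioned above, and TupleLike is a hypothetical class, not Pig's Tuple.

```java
import java.util.logging.Logger;

class TupleLike {
    // One logger per class, looked up once when the class is initialized.
    private static final Logger LOG = Logger.getLogger(TupleLike.class.getName());

    private final int size;

    TupleLike(int size) {
        this.size = size; // the constructor no longer performs a logger lookup
    }

    int size() {
        LOG.fine("size requested");
        return size;
    }
}
```

Creating millions of instances then costs no logger lookups at all, since the lookup happens once per class rather than once per object.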




[jira] Assigned: (PIG-215) Miscellaneous cleanups after PIG-111 Configuration

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-215:
--

Assignee: Pi Song

> Miscellaneous cleanups after PIG-111 Configuration
> --
>
> Key: PIG-215
> URL: https://issues.apache.org/jira/browse/PIG-215
> Project: Pig
>  Issue Type: Bug
>Reporter: Pi Song
>Assignee: Pi Song
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: CleanPig111.patch, CleanPig111_2.patch
>
>
> - Set default execution mode to MapReduce (This is a surprise as it should 
> have been fixed even before PIG-111 got checked-in)
> - When using local hadoop, changed message "Connecting to HDFS at null" to 
> "Connecting to HDFS at local"
> - Added the missing conf/log4j.properties
> - Removed some dead code.




[jira] Assigned: (PIG-234) fix synchronization around staleCount in DataCollector

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-234:
--

Assignee: Chad Whipkey

> fix synchronization around staleCount in DataCollector
> --
>
> Key: PIG-234
> URL: https://issues.apache.org/jira/browse/PIG-234
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Chad Whipkey
>Assignee: Chad Whipkey
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: Change_synchronization_on_DataCollector.patch
>
>
> DataCollector uses synchronized statements on staleCount, but the staleCount 
> reference changes!  I'm proposing it switch to the java.util.concurrent 
> Lock and Condition to manage staleness.
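
A sketch of the proposed approach, assuming staleCount is a counter whose boxed reference is replaced on every update, so that threads synchronizing on it may end up locking different objects. StaleTracker and its methods are hypothetical; the sketch only shows a fixed Lock with a Condition from java.util.concurrent.locks guarding a plain int.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class StaleTracker {
    private final Lock lock = new ReentrantLock(); // fixed lock object, never reassigned
    private final Condition noneStale = lock.newCondition();
    private int staleCount = 0;

    void markStale() {
        lock.lock();
        try {
            staleCount++;
        } finally {
            lock.unlock();
        }
    }

    void markDone() {
        lock.lock();
        try {
            staleCount--;
            if (staleCount == 0) {
                noneStale.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    // Blocks until no stale entries remain.
    void awaitNoneStale() throws InterruptedException {
        lock.lock();
        try {
            while (staleCount > 0) {
                noneStale.await();
            }
        } finally {
            lock.unlock();
        }
    }

    int current() {
        lock.lock();
        try {
            return staleCount;
        } finally {
            lock.unlock();
        }
    }
}
```

Because the lock object is final, every thread contends on the same monitor no matter how often the counter changes.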




[jira] Assigned: (PIG-219) Pig tests must cover local and mapreduce execution types

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-219:
--

Assignee: Mathieu Poumeyrol

> Pig tests must cover local and mapreduce execution types
> 
>
> Key: PIG-219
> URL: https://issues.apache.org/jira/browse/PIG-219
> Project: Pig
>  Issue Type: Bug
>Reporter: Mathieu Poumeyrol
>Assignee: Mathieu Poumeyrol
> Fix For: 0.1.0
>
> Attachments: Test.all.v1.patch, Test.v1.patch
>
>
> Followup of Local and MapReduce Test modes in pig-dev.




[jira] Assigned: (PIG-243) Make pig work on Windows

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-243:
--

Assignee: Daniel Dai

> Make pig work on Windows
> 
>
> Key: PIG-243
> URL: https://issues.apache.org/jira/browse/PIG-243
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Daniel Dai
> Fix For: 0.1.0
>
> Attachments: cygpath.patch, PIG_243.patch
>
>
> Currently a large number of unit tests is failing on Windows. We need to fix 
> that.




[jira] Commented: (PIG-1064) Behaviour of COGROUP with and without schema when using "*" operator

2009-11-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776832#action_12776832
 ] 

Hadoop QA commented on PIG-1064:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424676/PIG-1064.patch
  against trunk revision 835005.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/149/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/149/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/149/console

This message is automatically generated.

> Behaviour of COGROUP with and without schema when using "*" operator
> 
>
> Key: PIG-1064
> URL: https://issues.apache.org/jira/browse/PIG-1064
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
>Assignee: Pradeep Kamath
> Fix For: 0.6.0
>
> Attachments: PIG-1064.patch
>
>
> I have 2 tab separated files, "1.txt" and "2.txt"
> $ cat 1.txt 
> 
> 1   2
> 2   3
> 
> $ cat 2.txt 
> 1   2
> 2   3
> I use COGROUP feature of Pig in the following way:
> $java -cp pig.jar:$HADOOP_HOME org.apache.pig.Main
> {code}
> grunt> A = load '1.txt';
> grunt> B = load '2.txt' as (b0, b1);
> grunt> C = cogroup A by *, B by *;  
> {code}
> 2009-10-29 12:46:04,150 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1012: Each COGroup input has to have the same number of inner plans
> Details at logfile: pig_1256845224752.log
> ==
> If I reverse the order of the schemas
> {code}
> grunt> A = load '1.txt' as (a0, a1);
> grunt> B = load '2.txt';
> grunt> C = cogroup A by *, B by *;  
> {code}
> 2009-10-29 12:49:27,869 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1013: Grouping attributes can either be star (*) or a list of expressions, 
> but not both.
> Details at logfile: pig_1256845224752.log
> ==
> Now running without schema??
> {code}
> grunt> A = load '1.txt';
> grunt> B = load '2.txt';
> grunt> C = cogroup A by *, B by *;
> grunt> dump C; 
> {code}
> 2009-10-29 12:55:37,202 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
> stored result in: "file:/tmp/temp-319926700/tmp-1990275961"
> 2009-10-29 12:55:37,202 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
> written : 2
> 2009-10-29 12:55:37,202 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
> : 154
> 2009-10-29 12:55:37,202 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-10-29 12:55:37,202 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> ((1,2),{(1,2)},{(1,2)})
> ((2,3),{(2,3)},{(2,3)})
> ==
> Is this a bug or a feature?
> Viraj




[jira] Assigned: (PIG-250) Pig is broken with speculative execution

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-250:
--

Assignee: Olga Natkovich

> Pig is broken with speculative execution
> 
>
> Key: PIG-250
> URL: https://issues.apache.org/jira/browse/PIG-250
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.1.0
>
> Attachments: PIG-250.patch, PIG-250_v2.patch
>
>
> If I have speculative execution turned on, the following script fails:
> a = load 'studenttab20m' as (name, age, gpa);
> b = load 'votertab10k' as (name, age, registration, contributions);
> c = filter a by age < '50';
> d = filter b by age < '50';
> e = cogroup c by (name, age), d by (name, age) parallel 10;
> f = foreach e generate flatten(c), flatten(d) parallel 10;
> g = group f by registration parallel 10;
> h = foreach g generate group, SUM(f.d::contributions) parallel 10;
> i = order h by ($1, $0);
> store i into 'out';
> I traced this to the fact that the first MR job produces one or more empty 
> outputs from the reducer. This happened on the reducers that happened to have 
> a second task running.
> I am not sure what the issue is and I am working with the Hadoop guys to 
> investigate. Until this issue is resolved, I would like to turn speculative 
> execution off.




[jira] Assigned: (PIG-256) support non default constructor with variable number of arguments

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-256:
--

Assignee: Pi Song

> support non default constructor with variable number of arguments
> -
>
> Key: PIG-256
> URL: https://issues.apache.org/jira/browse/PIG-256
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ajay Garg
>Assignee: Pi Song
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: PIG_256_vararg_instantiation.patch
>
>
> Pig does not support non-default constructors with a variable number of 
> arguments. In our case we need this because the number of variables 
> specified by the user varies. The fix is simple. Pig calls 
> getConstr("arg1","arg2",...,"argn") and if it doesn't find it throws a 
> NoSuchMethodException. In the catch block we just need to add code to 
> check if we can wrap the arg1..n in a String[] and check if a constructor can 
> be found with this signature getConstr(args[]). This would resolve the 
> variable num args issue.
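
The fallback described above can be sketched with standard reflection. getConstr is the description's shorthand; the sketch uses Class.getDeclaredConstructor instead. VarArgInstantiator and ColumnPicker are hypothetical names, and this is not the code from the attached patch.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

class VarArgInstantiator {
    // Tries a constructor taking one String per argument, then falls back to
    // a single String[] constructor if no exact match exists.
    static Object instantiate(Class<?> cls, String... args) throws Exception {
        Class<?>[] paramTypes = new Class<?>[args.length];
        Arrays.fill(paramTypes, String.class);
        try {
            Constructor<?> c = cls.getDeclaredConstructor(paramTypes);
            return c.newInstance((Object[]) args); // one parameter per string
        } catch (NoSuchMethodException e) {
            Constructor<?> c = cls.getDeclaredConstructor(String[].class);
            return c.newInstance((Object) args); // the whole array as one parameter
        }
    }
}

// Hypothetical UDF-like class that only offers a String[] constructor.
class ColumnPicker {
    final String[] columns;

    public ColumnPicker(String[] columns) {
        this.columns = columns;
    }
}
```

The two casts matter: (Object[]) spreads the strings over separate constructor parameters, while (Object) passes the array itself as the single String[] parameter.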




[jira] Assigned: (PIG-255) Calling non default constructor of Final class from Main class in UDF

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-255:
--

Assignee: Ajay Garg

> Calling non default constructor of Final class from Main class in UDF
> -
>
> Key: PIG-255
> URL: https://issues.apache.org/jira/browse/PIG-255
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ajay Garg
>Assignee: Ajay Garg
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: cons.patch, new.patch, test.patch
>
>
> Pig supports the use of define to call a non-default constructor. Making it 
> work across Algebraic functions is not possible with the current code. The 
> problem is that once the func is defined to use a non-default constructor 
> that takes the names of the variables, we have no way of transmitting this 
> information from the main class to the final class. We tried passing the func 
> spec through the call to getFinal(). That is, whatever names we get in the 
> main class we store, and when the getFinal method is called, instead of 
> just passing the name of the Final class we attach the string args received 
> by the main class to the name to construct a func spec. For example, given 
> define COV = Covariance('Population', 'Height'); we would have 'Population' 
> and 'Height' stored in the main class. A call to getFinal would return 
> Covariance$Final("Population", "Height") instead of just Covariance$Final. I 
> guess this is the right way to go. However, Pig has a problem with this. The 
> resolveClassName method doesn't treat its args as specs and assumes them 
> to be just names. So in createJar, when the func spec 
> Covariance$Final("Population", "Height") is being resolved, it fails. I think 
> this is an issue with Pig and we need to resolve it by clipping the args 
> before doing a resolveClassName. 
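Clipping the args from a func spec, as proposed above, might look like the following minimal sketch. The class and method names are hypothetical; the real change would live wherever Pig calls resolveClassName.

```java
class FuncSpecSketch {
    // Return only the class-name part of a func spec such as
    // Covariance$Final("Population", "Height").
    static String classNameOf(String funcSpec) {
        int paren = funcSpec.indexOf('(');
        String name = (paren < 0) ? funcSpec : funcSpec.substring(0, paren);
        return name.trim();
    }
}
```

A spec without an argument list is returned unchanged, so callers that already pass bare class names would not be affected.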




[jira] Assigned: (PIG-258) Pig should cleanup output directory of a failed query

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-258:
--

Assignee: Daniel Dai

> Pig should cleanup output directory of a failed query
> -
>
> Key: PIG-258
> URL: https://issues.apache.org/jira/browse/PIG-258
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Daniel Dai
>Priority: Minor
> Attachments: clearoutput.patch, clearoutput2.patch
>
>
> Currently, after a failed store, the output directory is left behind and 
> can't be re-used without manual cleanup.




[jira] Assigned: (PIG-270) Show Line Number in Pig Error Messages

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-270:
--

Assignee: Daniel Dai

> Show Line Number in Pig Error Messages
> --
>
> Key: PIG-270
> URL: https://issues.apache.org/jira/browse/PIG-270
> Project: Pig
>  Issue Type: Improvement
>Reporter: Amir Youssefi
>Assignee: Daniel Dai
> Attachments: linenum.patch
>
>
> It would be a great help to users to show A) the line number and B) the 
> actual line in Pig error messages. Currently the user has to copy/paste the 
> script line by line into Grunt to find the line that ran into a problem. For 
> Grunt we can skip the line number. Alternatively, we can assign line numbers 
> in Grunt and show them in the command prompt alongside "grunt>". This could 
> be a separate issue.




[jira] Assigned: (PIG-277) UDF for computing correlation and covariance between data sets

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-277:
--

Assignee: Ajay Garg

> UDF for computing correlation and covariance between data sets
> --
>
> Key: PIG-277
> URL: https://issues.apache.org/jira/browse/PIG-277
> Project: Pig
>  Issue Type: New Feature
>Reporter: Ajay Garg
>Assignee: Ajay Garg
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: newStats.patch, stat.patch
>
>
> UDFs for computing correlation and covariance between data sets. Use the 
> following commands to compute covariance:
> A = load 'input.xml' using PigStorage(':');
> B = group A all;
> define c COV('a','b','c');
> D = foreach B generate group,c(A.$0,A.$1,A.$2);




[jira] Assigned: (PIG-284) target for building source jar

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-284:
--

Assignee: Johannes Zillmann

> target for building source jar
> --
>
> Key: PIG-284
> URL: https://issues.apache.org/jira/browse/PIG-284
> Project: Pig
>  Issue Type: Wish
>  Components: tools
>Reporter: Johannes Zillmann
>Assignee: Johannes Zillmann
>Priority: Minor
> Fix For: 0.1.0
>
> Attachments: Pig-284-v1.patch
>
>
> It would be a great help if Pig's build.xml were capable of building a 
> source jar.
> The source jar could, for example, be used by Eclipse and thus provide better 
> debugging support, original parameter names, etc.




[jira] Assigned: (PIG-288) Null pointer exception with load as schema - Optimizer

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-288:
--

Assignee: Pi Song

> Null pointer exception with load as schema - Optimizer
> --
>
> Key: PIG-288
> URL: https://issues.apache.org/jira/browse/PIG-288
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Assignee: Pi Song
> Attachments: PIG_288_OptimizerNPE.patch, PIG_288_OptimizerNPE_2.patch
>
>
> A new test case (testNestedPlan) added to TestEvalPipeline has the following 
> query:
> pig.registerQuery("A = LOAD 'file:" + tmpFile + "'as (a:int, 
> b:int);");
> pig.registerQuery("B = group A by $0;");
> + "C1 = filter A by $0 > -1;"
> + "C2 = distinct C1;"
> + "C3 = distinct A;"
> + "generate (int)group;"
> + "};";
> Testcase: testNestedPlan took 0.913 sec
> Caused an ERROR
> Unable to open iterator for alias: C
> java.io.IOException: Unable to open iterator for alias: C
> at 
> org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
> at org.apache.pig.PigServer.openIterator(PigServer.java:268)
> at 
> org.apache.pig.test.TestEvalPipeline.testNestedPlan(TestEvalPipeline.java:376)
> Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: Unable to 
> insert type casts into plan
> at 
> org.apache.pig.impl.logicalLayer.optimizer.TypeCastInserter.transform(TypeCastInserter.java:144)
> at 
> org.apache.pig.impl.plan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:63)
> at org.apache.pig.PigServer.compileLp(PigServer.java:551)
> at org.apache.pig.PigServer.execute(PigServer.java:477)
> at org.apache.pig.PigServer.openIterator(PigServer.java:259)
> ... 16 more
> Caused by: java.lang.NullPointerException
> at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:121)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:65)
> at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:273)
> at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:37)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:57)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.TypeCastInserter.transform(TypeCastInserter.java:141)
> ... 20 more




[jira] Assigned: (PIG-278) Allow no alias in Dot schema definition in Dot LogicalPlanLoader

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-278:
--

Assignee: Pi Song

> Allow no alias in Dot schema definition in Dot LogicalPlanLoader
> 
>
> Key: PIG-278
> URL: https://issues.apache.org/jira/browse/PIG-278
> Project: Pig
>  Issue Type: Bug
>Reporter: Pi Song
>Assignee: Pi Song
> Attachments: AllowNoAliasSchemaInDot.patch, 
> AllowNoAliasSchemaInDot2.patch
>
>
> Our schema parser doesn't allow a "null" alias, but we have to be able to do 
> that in Dot test files.
> This is a workaround that introduces a "[NoAlias]" keyword in the schema 
> definition just for Dot LogicalPlanLoader.
> Sample:
> {noformat}
> foreach [  key="20", type="LOForEach" , schema="[NoAlias] : long, [NoAlias] : 
> byteArray"] ;
> {noformat}
> At runtime, [NoAlias] will be substituted by dummy column names before being 
> sent to the parser. Subsequently those names will be replaced by "null". 
> There are no changes in the actual query parser.
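The substitution step might be sketched like this. It is an assumption-laden illustration rather than the patch itself: the placeholder prefix __noalias_ is made up here, and the later step that maps those names back to null is not shown.

```java
class NoAliasSketch {
    static final String MARKER = "[NoAlias]";

    // Replace each [NoAlias] marker with a unique placeholder column name
    // so that the unchanged schema parser can accept the string.
    static String substitute(String schema) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        int counter = 0;
        int at;
        while ((at = schema.indexOf(MARKER, from)) >= 0) {
            out.append(schema, from, at);               // text before the marker
            out.append("__noalias_").append(counter++); // hypothetical dummy name
            from = at + MARKER.length();
        }
        out.append(schema.substring(from));             // remaining tail
        return out.toString();
    }
}
```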




[jira] Assigned: (PIG-298) Wrong sort logic in POSort

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-298:
--

Assignee: Pi Song

> Wrong sort logic in POSort
> --
>
> Key: PIG-298
> URL: https://issues.apache.org/jira/browse/PIG-298
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pi Song
>Assignee: Pi Song
> Attachments: Wrong_Sort_logic_in_POSort.patch
>
>
> This might relate to PIG-292.
> The current logic is clearly wrong, as it returns only the result of the 
> last comparison.
> Patch + tests attached.
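To illustrate the class of bug described (this is not the POSort code, just a self-contained sketch with hypothetical names), compare a multi-column comparison that stops at the first differing column with one that keeps overwriting its result:

```java
import java.util.List;

class MultiKeySortSketch {
    // Correct: the first column that differs decides the ordering.
    static <T extends Comparable<T>> int compareRows(List<T> a, List<T> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    // Buggy variant for contrast: every column overwrites the result,
    // so only the last comparison is ever returned.
    static <T extends Comparable<T>> int compareRowsLastOnly(List<T> a, List<T> b) {
        int result = 0;
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            result = a.get(i).compareTo(b.get(i));
        }
        return result;
    }
}
```

For the rows (1, 9) and (2, 0), the first method orders (1, 9) first, while the buggy variant orders it last because only the second column is considered.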




[jira] Assigned: (PIG-291) hod.param parameters not passed properly

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-291:
--

Assignee: Ian Atha

> hod.param parameters not passed properly
> 
>
> Key: PIG-291
> URL: https://issues.apache.org/jira/browse/PIG-291
> Project: Pig
>  Issue Type: Bug
>  Components: impl
> Environment: Linux hostname 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 
> EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
> Apache Pig version 0.1.0-dev (r8087) 
> Hadoop 0.17.1 Subversion 
> http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 669344
> hod --version: 0.17.1
>Reporter: Ian Atha
>Assignee: Ian Atha
> Attachments: pig-291.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> pig -Dhod.param='-N hodclustername' script.pig
> fails with the following error:
> 2008-07-03 17:53:18,236 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to HOD...
> org.apache.pig.backend.executionengine.ExecException: Could not connect to HOD
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.doHod(HExecutionEngine.java:428)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:121)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:108)
> at org.apache.pig.impl.PigContext.connect(PigContext.java:177)
> at org.apache.pig.PigServer.<init>(PigServer.java:149)
> at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:43)
> at org.apache.pig.Main.main(Main.java:293)
> Caused by: org.apache.pig.backend.executionengine.ExecException: 
> org.apache.pig.backend.executionengine.ExecException: Failed to run command 
> hod allocate -d /tmp/PigHod.hostname.thatha.304309240344558 -n 15 -N 
> hodclustername   on server local; return code: 4; error: CRITICAL - qsub 
> Failure : qsub: illegal -N value 
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.runCommand(HExecutionEngine.java:541)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.doHod(HExecutionEngine.java:373)
> ... 6 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: Failed to 
> run command hod allocate -d /tmp/PigHod.hostname.thatha.304309240344558 -n 15 
> -N hodclustername   on server local; return code: 4; error: CRITICAL - qsub 
> Failure : qsub: illegal -N value 
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.runCommand(HExecutionEngine.java:538)
> ... 7 more
> It appears that the problem is in the parsing of hod.param, located in 
> org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java, in 
> doHod(...).




[jira] Assigned: (PIG-293) order by * goes into infinite loop

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-293:
--

Assignee: Santhosh Srinivasan

> order by * goes into infinite loop
> --
>
> Key: PIG-293
> URL: https://issues.apache.org/jira/browse/PIG-293
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Alan Gates
>Assignee: Santhosh Srinivasan
> Fix For: 0.2.0
>
> Attachments: sort_star_with_project.patch
>
>
> Scripts with order by * go into an infinite loop.  Worse yet, they appear to 
> be reporting progress to hadoop in this loop, and thus are never terminated.




[jira] Assigned: (PIG-297) RM a non-existing file should not fail the script

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-297:
--

Assignee: Yiping Han

> RM a non-existing file should not fail the script
> -
>
> Key: PIG-297
> URL: https://issues.apache.org/jira/browse/PIG-297
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Reporter: Yiping Han
>Assignee: Yiping Han
>Priority: Minor
> Attachments: PIG-297.patch, PIG-297v2.patch
>
>
> rm is commonly used to remove existing output before re-executing a script. 
> However, when the output does not exist, rm fails and Grunt terminates the 
> execution. This behavior is very inconvenient. The expected Grunt behavior 
> would be to print an error message and continue to execute the script.




[jira] Assigned: (PIG-315) Issue with cast in foreach

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-315:
--

Assignee: Pi Song

> Issue with cast in foreach
> --
>
> Key: PIG-315
> URL: https://issues.apache.org/jira/browse/PIG-315
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pi Song
> Fix For: 0.2.0
>
> Attachments: PIG315.patch
>
>
> Query which causes error:
> {code}
> a = load ':INPATH:/singlefile/studenttab10k' as (name:chararray, age:int, 
> gpa:double);
> b = foreach a generate (long)age as age, (int)gpa as gpa;
> c = foreach b generate SUM(age), SUM(gpa); 
> store c into ':OUTPATH:';
> {code}
> Error:
> {quote}
> 2008-07-14 16:34:42,130 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: mytesthost:8020
> 2008-07-14 16:34:42,187 [main] WARN  org.apache.hadoop.fs.FileSystem - 
> "mytesthost:8020" is a deprecated filesystem name. Use 
> "hdfs://mytesthost:8020/" instead.
> 2008-07-14 16:34:42,441 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: mytesthost:50020
> 2008-07-14 16:34:42,696 [main] WARN  org.apache.hadoop.fs.FileSystem - 
> "mytesthost:8020" is a deprecated filesystem name. Use 
> "hdfs://mytesthost:8020/" instead.
> 2008-07-14 16:34:43,006 [main] ERROR org.apache.pig.PigServer - Problem 
> resolving LOForEach schema
> 2008-07-14 16:34:43,006 [main] ERROR org.apache.pig.PigServer - Severe 
> problem found during validation 
> org.apache.pig.impl.plan.PlanValidationException: An unexpected exception 
> caused the validation to stop
> 2008-07-14 16:34:43,007 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
> java.io.IOException: Unable to store for alias: c
> {quote}




[jira] Assigned: (PIG-300) Minor Changes to SliceWrapper for Generic Hadoop InputFormat

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-300:
--

Assignee: Christian Kunz

> Minor Changes to SliceWrapper for Generic Hadoop InputFormat
> 
>
> Key: PIG-300
> URL: https://issues.apache.org/jira/browse/PIG-300
> Project: Pig
>  Issue Type: Improvement
> Environment: trunk
>Reporter: Christian Kunz
>Assignee: Christian Kunz
> Fix For: 0.2.0
>
> Attachments: PIG-300.patch
>
>
> I am working on a Load Function that allows specifying any Hadoop 
> FileInputFormat or CompositeInputFormat.
> Because of the nature of PigSlice and PigSlicer such a UDF needs to use a 
> different implementation of Slice and Slicer.
> It turns out that it would be extremely helpful if the SliceWrapper class had 
> a couple of minor changes:
> 1) an additional get method to return the 'wrapped' slice.
> 2) a change to the getLocations method so that it just calls the 
> getLocations() method of the wrapped Slice, unless 'wrapped' is a PigSlice 
> (in which case it does what it does now).
> I will make a patch available shortly.
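The two requested changes could be sketched as below. Everything here is a simplified stand-in: the Slice interface is reduced to one method, PigSlice is a stub, and existingLocations() is a placeholder for whatever SliceWrapper currently does, so none of this should be read as the actual Pig classes or the eventual patch.

```java
class SliceWrapperSketch {
    // Simplified stand-in for Pig's Slice interface (assumption).
    interface Slice {
        String[] getLocations();
    }

    // Stub standing in for the default slice implementation.
    static class PigSlice implements Slice {
        public String[] getLocations() {
            return new String[0];
        }
    }

    static class SliceWrapper {
        private final Slice wrapped;

        SliceWrapper(Slice wrapped) {
            this.wrapped = wrapped;
        }

        // Change 1: expose the wrapped slice.
        Slice getWrapped() {
            return wrapped;
        }

        // Change 2: delegate to the wrapped slice unless it is a PigSlice.
        String[] getLocations() {
            if (wrapped instanceof PigSlice) {
                return existingLocations();
            }
            return wrapped.getLocations();
        }

        // Placeholder for the current SliceWrapper behavior.
        private String[] existingLocations() {
            return new String[] {"existing-behaviour"};
        }
    }
}
```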




[jira] Assigned: (PIG-308) Flatten is not being set to true in joins

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-308:
--

Assignee: Alan Gates

> Flatten is not being set to true in joins
> -
>
> Key: PIG-308
> URL: https://issues.apache.org/jira/browse/PIG-308
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Fix For: 0.2.0
>
> Attachments: join.patch
>
>
> Queries that use the JOIN keyword are returning incorrect results because the 
> flatten values are not being set to true for the foreach that is put after 
> the cogroup.




[jira] Assigned: (PIG-339) Limit follow cross/union return wrong number of records

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-339:
--

Assignee: Daniel Dai

> Limit follow cross/union return wrong number of records
> ---
>
> Key: PIG-339
> URL: https://issues.apache.org/jira/browse/PIG-339
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.2.0
>
> Attachments: PIG-339.patch
>
>
> The following script returns double the expected number of records:
> a = load 'a';
> b = load 'b';
> c = union a, b;
> d = cross a, b;
> e = limit c 100;
> f = limit d 100;
> dump e;   // returns double the expected number of records
> dump f;   // returns double the expected number of records
> It seems the limit operator in the reduce plan is not effective.




[jira] Assigned: (PIG-321) Incorrect results from arithmetic expression

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-321:
--

Assignee: Pi Song

> Incorrect results from arithmetic expression
> 
>
> Key: PIG-321
> URL: https://issues.apache.org/jira/browse/PIG-321
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pi Song
> Fix For: 0.2.0
>
> Attachments: Pig321_parser.patch
>
>
> Query:
> {code}
> a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name:chararray, 
> age:int, gpa:double);
> b = foreach a generate 1 + 0.2f + 253645L, gpa+1; 
> 
> store b into '/tmp/arithtest';
> 
> {code}
> Results
> 25365.2 2.9
> 25365.2 4.65
> ...
> The first projection above has 253645 as a Long constant. The results have 
> 25365.2, which is an order of magnitude smaller than expected.
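For comparison, Java's own numeric promotion (int + float gives float, float + long gives float) evaluates the same expression to roughly 253646.2. Assuming Pig intends analogous promotion, that is the value the first column should show. As a side observation only, and not something the report confirms, 25365.2 is what the expression yields if the last digit of the long constant is lost. The class and method names below are illustrative.

```java
class ArithmeticPromotionSketch {
    // The expression from the query, evaluated with Java's promotion rules.
    static float expected() {
        return 1 + 0.2f + 253645L;
    }

    // Same expression with the constant's final digit dropped (25364),
    // which reproduces the value seen in the reported output.
    static float withTruncatedConstant() {
        return 1 + 0.2f + 25364L;
    }
}
```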




[jira] Assigned: (PIG-337) If limit size exceeds number of records in the file, a few records get dropped

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-337:
--

Assignee: Daniel Dai

> If limit size exceeds number of records in the file, a few records get dropped
> --
>
> Key: PIG-337
> URL: https://issues.apache.org/jira/browse/PIG-337
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Alan Gates
>Assignee: Daniel Dai
> Fix For: 0.2.0
>
> Attachments: PIG-337.patch
>
>
> Given a file with 10k records, the following script returned 9996 records:
> a = load 'studenttab10k';
> b = limit a 10;
> dump b;
> It looks like maybe the limit operator isn't returning its last record or 
> something.




[jira] Assigned: (PIG-338) limit return uncorrect records following distinct

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-338:
--

Assignee: Daniel Dai

> limit return uncorrect records following distinct
> -
>
> Key: PIG-338
> URL: https://issues.apache.org/jira/browse/PIG-338
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.2.0
>
> Attachments: PIG-338-2.patch, PIG-338.patch, 
> TEST-org.apache.pig.test.TestLogicalOptimizer.txt
>
>
> The following script returns fewer records than expected:
> a = load 'f';
> b = distinct a;
> c = limit b 10;
> dump c;




[jira] Assigned: (PIG-319) Union, Cross is not working

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-319:
--

Assignee: Pi Song

> Union, Cross is not working
> ---
>
> Key: PIG-319
> URL: https://issues.apache.org/jira/browse/PIG-319
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Daniel Dai
>Assignee: Pi Song
> Fix For: 0.2.0
>
> Attachments: fix_union.patch
>
>
> The union and cross operators are not working in branches/types. For example:
> a = load 'a';
> b = load 'b';
> c = union a, b;
> d = cross a, b;
> dump c; // fail
> dump d; // fail
> Error message: " Attempt to give operator of type 
> org.apache.pig.impl.physicalLayer.relationalOperators.POLoad multiple inputs. 
>  This operator does not support multiple inputs."




[jira] Assigned: (PIG-368) User defined Loader functions need a way to get jobconf without going through Slicer

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-368:
--

Assignee: Pradeep Kamath

> User defined Loader functions need a way to get jobconf without going through 
> Slicer
> 
>
> Key: PIG-368
> URL: https://issues.apache.org/jira/browse/PIG-368
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
> Attachments: PIG-368.patch
>
>
> Some user defined loader functions in the current pig release (without types) 
> need the JobConf to build the appropriate RecordReader. Currently they do 
> this in a roundabout way by using the Slicer. The JobConf should be 
> available from PigInputFormat.




[jira] Assigned: (PIG-352) java.lang.ClassCastException when invalid field is accessed

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-352:
--

Assignee: Santhosh Srinivasan

> java.lang.ClassCastException when invalid field is accessed
> ---
>
> Key: PIG-352
> URL: https://issues.apache.org/jira/browse/PIG-352
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Olga Natkovich
>Assignee: Santhosh Srinivasan
> Fix For: 0.2.0
>
> Attachments: out_of_bound_schema_access.patch
>
>
> grunt> A = load 'foo' as (a, b, c);
> grunt> B = foreach A generate $5;
> 2008-07-31 16:25:13,847 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
> java.lang.ClassCastException: 
> org.apache.pig.impl.logicalLayer.FrontendException
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:454)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:248)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:425)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:92)
> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
> at org.apache.pig.Main.main(Main.java:278)




[jira] Assigned: (PIG-342) Size of DistinctDataBag is calculated incorrectly if spill occurs and non-distinct elements are inserted

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-342:
--

Assignee: Brandon Dimcheff

> Size of DistinctDataBag is calculated incorrectly if spill occurs and 
> non-distinct elements are inserted
> 
>
> Key: PIG-342
> URL: https://issues.apache.org/jira/browse/PIG-342
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.1.0
>Reporter: Brandon Dimcheff
>Assignee: Brandon Dimcheff
> Fix For: 0.1.0
>
> Attachments: size.patch
>
>
> If a spill occurs while elements are being inserted into a DistinctDataBag, 
> it's possible that non-unique items will be added to the in-memory data 
> structure, and the mSize counter will be incremented.  If the same elements 
> also exist on disk, the count will be higher than it should be.
> The following is copied from an email exchange I had with Alan Gates:
> Alan,
> Thanks for your help.  I've done a bit more experimentation and have 
> discovered a couple more things.  I first looked at how COUNT was 
> implemented.  It looks like COUNT calls size() on the bag, which will return 
> mSize.  I thought that mSize might be calculated improperly so I added 
> "SUM(unique_ids) AS crazy_userid_sum" to my GENERATE line and re-ran the 
> pigfile:
> GENERATE FLATTEN(group), SUM(nice_data.duration) AS total_duration, 
> COUNT(nice_data) AS channel_switches, COUNT(unique_ids) AS unique_users, 
> SUM(unique_ids) AS crazy_userid_sum;
> It turns out that the SUM generates the correct result in all cases, while 
> there are still occasional errors in the COUNT.  Since SUM requires an 
> iteration over all the elements in the DistinctDataBag, this led me to 
> believe that the uniqueness constraint is indeed operating correctly, but 
> there is some error in the logic that calculates mSize.
> Then I started poking around in DistinctDataBag looking for anything that 
> changes mSize that might be incorrect.  I noticed that on line 87 in 
> addAll(), the size of the DataBag that is passed into the method is added to 
> the mSize instance variable, and then during the iteration a few lines later 
> mSize is being incremented when an element is successfully added to 
> mContents.  I thought this might be the problem, since it seems like elements 
> would be double counted if addAll() was called.  I commented out line 87, 
> recompiled Pig, and ran it again, but there are still errors (though I do 
> think line 87 might be incorrect anyways).
> Thanks to my coworker Marshall, I think we may have discovered what the 
> actual problem is.  The scenario is as follows:  We're adding a bunch of 
> stuff to the bag, and before we're finished a spill occurs.  mContents is 
> cleared during the spill (line 157).  All add() does is check uniqueness 
> against mContents.  So now we will get duplicates in mContents that are 
> already on disk and an inflated mSize.  Now, the reason why SUM works is 
> because the iterator is smart and enforces uniqueness as it reads the records 
> back in. We think this occurs at the beginning of addToQueue, around line 363 
> - 369.  mMergeTree is a TreeSet, so it'll enforce uniqueness and the call to 
> addToQueue is aborted if there's already a matching record in mMergeTree.
> Do you think our assessment is correct?  If so, it seems that the calculation 
> of mSize needs to be significantly more complex than it is now.  It looks to 
> me like the entire bag will need to be iterated in order to reliably 
> calculate the size.  Do you have any ideas about how to implement this in a 
> less expensive way?  I'd be happy to take a stab at it, but I don't want to 
> do anything particularly silly if you have a better idea.
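The scenario described in this exchange can be reproduced with a small stand-alone model. This is not DistinctDataBag itself: the class below is a hypothetical simplification in which "disk" is a list of sorted sets, the counter plays the role of mSize, and mergedSize() plays the role of a full iteration that enforces uniqueness.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class SpillCountSketch {
    private final TreeSet<String> inMemory = new TreeSet<>();
    private final List<TreeSet<String>> spilled = new ArrayList<>();
    private long counter = 0; // plays the role of mSize

    // Uniqueness is checked against the in-memory set only.
    void add(String s) {
        if (inMemory.add(s)) {
            counter++;
        }
    }

    // A spill moves the in-memory contents to "disk" and clears memory.
    void spill() {
        spilled.add(new TreeSet<>(inMemory));
        inMemory.clear();
    }

    // Size as tracked by the running counter.
    long counterSize() {
        return counter;
    }

    // Size obtained by merging memory and all spilled runs.
    long mergedSize() {
        TreeSet<String> merged = new TreeSet<>(inMemory);
        for (TreeSet<String> run : spilled) {
            merged.addAll(run);
        }
        return merged.size();
    }
}
```

Adding "a" and "b", spilling, then adding "a" and "c" leaves the counter at 4 while the merged view holds 3 distinct elements, which matches the COUNT versus SUM discrepancy described above.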




[jira] Assigned: (PIG-367) provide default schema name

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-367:
--

Assignee: Olga Natkovich

> provide default schema name
> ---
>
> Key: PIG-367
> URL: https://issues.apache.org/jira/browse/PIG-367
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.2.0
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.2.0
>
> Attachments: PIG-367.patch
>
>
> This is just to help UDFs name their output.




[jira] Assigned: (PIG-428) TypeCastInserter does not replace projects in inner plans correctly

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-428:
--

Assignee: Pradeep Kamath

> TypeCastInserter does not replace projects in inner plans correctly
> ---
>
> Key: PIG-428
> URL: https://issues.apache.org/jira/browse/PIG-428
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
> Attachments: PIG-428.patch
>
>
> The TypeCastInserter replaces the input operator of each Project in the inner 
> plans with the new Foreach operator it adds. However, it should do so only for 
> those Projects whose previous input is the operator after which the new 
> Foreach was inserted.
> Here is a query which fails due to this:
> {code}
> a = load 'st10k' as (name:chararray,age:int, gpa:double);
> another = load 'st10k';
> c = foreach another generate $0, $1+ 10, $2 + 10;
> d = join a by $0, c by $0;
> dump d;
> {code}
> Here is the error:
> {noformat}
> 2008-09-11 23:34:28,169 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) tip_200809051428_0045_m_00java.io.IOException: 
> Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, 
> recieved org.apache.pig.impl.io.NullableBytesWritable
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:419)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:83)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:172)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:75)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> {noformat}




[jira] Assigned: (PIG-429) Self join with implicit split has the join output in wrong order

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-429:
--

Assignee: Pradeep Kamath

> Self join with implicit split has the join output in wrong order
> ---
>
> Key: PIG-429
> URL: https://issues.apache.org/jira/browse/PIG-429
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
> Attachments: PIG-429.patch
>
>
> Query:
> {code}
> A = load 'st10k' split by 'file';
> B = filter A by $1 > 25;
> D = join A by $0, B by $0;
> dump D;
> {code}
> In the output, the columns from B are projected first and those from A next. On 
> closer examination of the code, the ImplicitSplitInserter class adds a split 
> and two splitoutput operators to the plan and tries to connect the successors 
> of the Load to them. It does this by iterating over the Load's successors, 
> disconnecting each one, and connecting the split-splitoutput pair to it. 
> However, the order in which it gets the successors is NOT the same as the order 
> in which cogroup (join) expects its inputs. Hence the discrepancy.




[jira] Assigned: (PIG-413) TestBuiltin has an error in testSumFinal

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-413:
--

Assignee: Pradeep Kamath

> TestBuiltin has an error in testSumFinal 
> -
>
> Key: PIG-413
> URL: https://issues.apache.org/jira/browse/PIG-413
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
> Attachments: PIG-413.patch
>
>
> Here's the error:
> {noformat}
> Testcase: testSUMFinal took 0.005 sec
> Caused an ERROR
> Caught exception in IntSum.Final [java.lang.Integer]
> java.io.IOException: Caught exception in IntSum.Final [java.lang.Integer]
> at org.apache.pig.builtin.IntSum$Final.exec(IntSum.java:90)
> at org.apache.pig.builtin.IntSum$Final.exec(IntSum.java:71)
> at org.apache.pig.test.TestBuiltin.testSUMFinal(TestBuiltin.java:436)
> Caused by: java.lang.ClassCastException: java.lang.Integer
> ... 18 more 
> {noformat}




[jira] Assigned: (PIG-439) Currently we do not support A=B; correctly - for now capture this case and produce a meaningful message

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-439:
--

Assignee: Pradeep Kamath

> Currently we do not support A=B; correctly - for now capture this case and 
> produce a meaningful message
> ---
>
> Key: PIG-439
> URL: https://issues.apache.org/jira/browse/PIG-439
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
> Attachments: PIG-439.patch
>
>
> Currently we do not support A=B; correctly. For now, capture this case and 
> produce a meaningful message. A separate issue, JIRA-438, has been created to 
> fix the main issue.




[jira] Assigned: (PIG-452) Issues when non existent columns are projected

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-452:
--

Assignee: Alan Gates

> Issues when non existent columns are projected
> --
>
> Key: PIG-452
> URL: https://issues.apache.org/jira/browse/PIG-452
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Alan Gates
> Fix For: 0.2.0
>
> Attachments: PIG-452.patch
>
>
> Script:
> {code}
> -- columns x,y,z do not exist
> a = load 'st10k' as (name, age, gpa, x, y, z);
> b = load 'st10k' as (name, age:chararray, gpa);
> c = join a by (name, y), b by (name, age);
> dump c;
> {code}
> Error:
> {noformat}
> 2008-09-23 14:22:20,237 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Job failed!
> 2008-09-23 14:22:20,253 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) tip_200809051428_0112_m_00java.io.IOException: 
> Received Error while processing the map plan.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:79)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> 2008-09-23 14:22:20,253 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) tip_200809051428_0112_m_00java.io.IOException: 
> Received Error while processing the map plan.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:79)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> 2008-09-23 14:22:20,253 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) tip_200809051428_0112_m_00java.io.IOException: 
> Received Error while processing the map plan.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:79)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> 2008-09-23 14:22:20,259 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) tip_200809051428_0112_m_00java.io.IOException: 
> Received Error while processing the map plan.
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:79)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> java.io.IOException: Unable to open iterator for alias: c [Job terminated 
> with anomalous status FAILED]
> at org.apache.pig.PigServer.openIterator(PigServer.java:384)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:268)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:176)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:83)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
> ... 6

[jira] Assigned: (PIG-431) When the specified load function cannot be found the error message is totally incomprehensible.

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-431:
--

Assignee: Pradeep Kamath

> When the specified load function cannot be found the error message is totally 
> incomprehensible.
> ---
>
> Key: PIG-431
> URL: https://issues.apache.org/jira/browse/PIG-431
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Alan Gates
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
>
> a = load ':INPATH:/singlefile/studenttab10k' using NoSuchFunction(':');
> In Pig 1.x the resulting error message was:
> Could not resolve NoSuchFunction
> In 2.0 the user instead gets:
> java.lang.ClassCastException: java.io.IOException
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1104)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:869)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:728)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:529)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
> at org.apache.pig.PigServer.parseQuery(PigServer.java:290)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:258)
> at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:432)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:242)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:83)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
> at org.apache.pig.Main.main(Main.java:306)




[jira] Assigned: (PIG-434) AND and OR do not give right results with nulls

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-434:
--

Assignee: Pradeep Kamath

> AND and OR do not give right results with nulls
> ---
>
> Key: PIG-434
> URL: https://issues.apache.org/jira/browse/PIG-434
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: 0.2.0
>
> Attachments: PIG-434.patch
>
>
> Here are the truth tables for AND and OR. Currently we do not short-circuit; 
> we return null if either operand is null (for both AND and OR).
> {noformat}
> truth table for AND
> t = true, n = null, f = false
> AND  t  n  f
> t    t  n  f
> n    n  n  f
> f    f  f  f
> truth table for OR
> t = true, n = null, f = false
> OR   t  n  f
> t    t  t  t
> n    t  n  n
> f    t  n  f
> {noformat}
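
The truth tables above can be written down as a small sketch, assuming null stands for the unknown value. The class name ThreeValuedLogic is hypothetical and this is not the code from the attached patch. The point is that a definite false decides AND and a definite true decides OR, even when the other operand is null.

```java
// Sketch of three-valued AND/OR; null plays the role of "n" in the tables.
final class ThreeValuedLogic {
    static Boolean and(Boolean a, Boolean b) {
        // A definite false decides the result regardless of the other operand.
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return Boolean.FALSE;
        }
        // No false operand: any null makes the result unknown.
        if (a == null || b == null) {
            return null;
        }
        return Boolean.TRUE;
    }

    static Boolean or(Boolean a, Boolean b) {
        // A definite true decides the result regardless of the other operand.
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
            return Boolean.TRUE;
        }
        // No true operand: any null makes the result unknown.
        if (a == null || b == null) {
            return null;
        }
        return Boolean.FALSE;
    }

    public static void main(String[] args) {
        Boolean[] values = {Boolean.TRUE, null, Boolean.FALSE};
        for (Boolean a : values) {
            for (Boolean b : values) {
                System.out.println(a + " AND " + b + " = " + and(a, b)
                        + ", " + a + " OR " + b + " = " + or(a, b));
            }
        }
    }
}
```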




[jira] Assigned: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-476:
--

Assignee: Earl Cahill

> given a date that can match a SimpleDateFormat want to be able to extract 
> arbitrary SimpleDateFormat data, like day or year
> ---
>
> Key: PIG-476
> URL: https://issues.apache.org/jira/browse/PIG-476
> Project: Pig
>  Issue Type: New Feature
>Reporter: Earl Cahill
>Assignee: Earl Cahill
> Attachments: DateExtractor-PIG-476
>
>
> Want to be able to do something like
> A = FOREACH raw GENERATE 
> org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor(dayTime,
>  "", "dd/MMM/:HH:mm:ss");
> to extract the year, or if your date is formatted as
> dd/MMM/:HH:mm:ss Z
> you could do something like
> A = FOREACH raw GENERATE 
> org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor(dayTime,
>  "MM-dd-");
> to grab out the day
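
The idea can be sketched with java.text.SimpleDateFormat: parse with one pattern, then format the result with another. This is not the attached DateExtractor. The class name DateFieldSketch is hypothetical, and the patterns used in main are assumptions, since the patterns quoted in the message appear to have lost characters in the archive.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: re-renders a parsed date using a second pattern.
final class DateFieldSketch {
    static String extract(String text, String inPattern, String outPattern) {
        SimpleDateFormat in = new SimpleDateFormat(inPattern, Locale.US);
        SimpleDateFormat out = new SimpleDateFormat(outPattern, Locale.US);
        // Use one fixed zone on both sides so the result does not depend
        // on the default time zone of the machine.
        TimeZone utc = TimeZone.getTimeZone("UTC");
        in.setTimeZone(utc);
        out.setTimeZone(utc);
        try {
            return out.format(in.parse(text));
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        // Assumed input pattern and sample value, for illustration only.
        String in = "dd/MMM/yyyy:HH:mm:ss";
        String sample = "09/Oct/2008:14:22:20";
        System.out.println(extract(sample, in, "yyyy")); // year
        System.out.println(extract(sample, in, "dd"));   // day of month
    }
}
```

A new SimpleDateFormat is created per call because the class is not thread-safe; a real UDF would more likely build the formatters once per instance.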




[jira] Assigned: (PIG-474) from pig latin, be able to load a file based on a supplied regular expression

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-474:
--

Assignee: Earl Cahill

> from pig latin, be able to load a file based on a supplied regular expression
> -
>
> Key: PIG-474
> URL: https://issues.apache.org/jira/browse/PIG-474
> Project: Pig
>  Issue Type: New Feature
>Reporter: Earl Cahill
>Assignee: Earl Cahill
> Attachments: MyRegExLoader-PIG-474
>
>
> Want to be able to do something like
>  A = LOAD 'file:test.txt' USING 
> org.apache.pig.piggybank.storage.MyRegExLoader('(\\d+)!+(\\w+)~+(\\w+)');
>  
>  which would parse lines like
>  
> 1!!!one~i
> 2!!two~~ii
> 3!three~~~iii
>  
> into arrays like
>  
> {1, "one", "i"}, {2, "two", "ii"}, {3, "three", "iii"}
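
The parsing step can be sketched with java.util.regex, using the pattern and sample lines from the description. The class RegexLineSketch is a hypothetical name and not the attached MyRegExLoader. Unlike the arrays shown above, the sketch returns every group as a string and leaves any type conversion to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns the capture groups of one line into a list.
final class RegexLineSketch {
    static List<String> parse(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        List<String> fields = new ArrayList<>();
        // Require the whole line to match; otherwise return no fields.
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String regex = "(\\d+)!+(\\w+)~+(\\w+)";
        for (String line : new String[] {"1!!!one~i", "2!!two~~ii", "3!three~~~iii"}) {
            System.out.println(parse(regex, line));
        }
    }
}
```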




[jira] Assigned: (PIG-472) load files based on user provided regular expressions

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-472:
--

Assignee: Earl Cahill

> load files based on user provided regular expressions
> -
>
> Key: PIG-472
> URL: https://issues.apache.org/jira/browse/PIG-472
> Project: Pig
>  Issue Type: New Feature
>  Components: data, grunt
>Affects Versions: 0.1.0
>Reporter: Earl Cahill
>Assignee: Earl Cahill
> Fix For: 0.1.0
>
> Attachments: RegExLoader-PIG-472
>
>
> Want to be able to load files based on regular expressions.  Each group 
> specified in parentheses should end up as a DataAtom, and the list of 
> DataAtoms should end up in a Tuple.




[jira] Assigned: (PIG-473) be able to load files in Apache's common log format

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-473:
--

Assignee: Earl Cahill

> be able to load files in Apache's common log format
> ---
>
> Key: PIG-473
> URL: https://issues.apache.org/jira/browse/PIG-473
> Project: Pig
>  Issue Type: New Feature
>  Components: data, grunt
>Reporter: Earl Cahill
>Assignee: Earl Cahill
> Attachments: CommonLogLoader-PIG-473
>
>
> Want to be able to load files that are in Apache's common log format.
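
For reference, a common log format line has seven fields: host, identity, user, timestamp, request line, status and size. A minimal sketch of splitting such a line with one regular expression follows. The class CommonLogSketch and its regex are assumptions made for illustration and are not taken from the attached CommonLogLoader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one common-log-format line into seven fields.
final class CommonLogSketch {
    // host ident user [timestamp] "request" status bytes
    private static final Pattern CLF = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$");

    static List<String> parse(String line) {
        Matcher m = CLF.matcher(line);
        List<String> fields = new ArrayList<>();
        // Lines that do not match yield an empty list rather than an error.
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        System.out.println(parse(line));
    }
}
```

The size field is matched with \S+ rather than digits because servers write "-" when no bytes were sent.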




[jira] Assigned: (PIG-487) extract a host from a url

2009-11-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-487:
--

Assignee: Earl Cahill

> extract a host from a url
> -
>
> Key: PIG-487
> URL: https://issues.apache.org/jira/browse/PIG-487
> Project: Pig
>  Issue Type: New Feature
>Reporter: Earl Cahill
>Assignee: Earl Cahill
> Attachments: HostExtractor-PIG-487
>
>
> Want to be able to extract the host from a url.  For example,
> http://sports.espn.go.com/mlb/recap?gameId=281009122
> leads to
> sports.espn.go.com
> Pig Latin usage looks like
> host = FOREACH row GENERATE 
> org.apache.pig.piggybank.evaluation.util.apachelogparser.HostExtractor(url);
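
The extraction itself can be sketched with java.net.URI from the JDK. The class HostSketch is a hypothetical name and not the attached HostExtractor. Returning null for input that cannot be parsed is an assumption about the desired behaviour.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: returns the host part of a URL, or null if none.
final class HostSketch {
    static String host(String url) {
        if (url == null) {
            return null;
        }
        try {
            // getHost() is null for URIs without an authority component.
            return new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // prints "sports.espn.go.com"
        System.out.println(host("http://sports.espn.go.com/mlb/recap?gameId=281009122"));
    }
}
```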



