[jira] Updated: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-979: --- Fix Version/s: 0.6.0 Affects Version/s: 0.4.0 Status: Open (was: Patch Available) > Acummulator Interface for UDFs > -- > > Key: PIG-979 > URL: https://issues.apache.org/jira/browse/PIG-979 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Alan Gates >Assignee: Ying He > Fix For: 0.6.0 > > Attachments: PIG-979.patch, PIG-979.patch > > > Add an accumulator interface for UDFs that would allow them to take a set > number of records at a time instead of the entire bag. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
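The description above asks for UDFs that receive a bounded number of records per call rather than the whole bag. A minimal sketch of that calling pattern follows. The interface name `BatchAccumulator`, its three methods, and the `LongSum` example are hypothetical and are not taken from PIG-979.patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface for illustration only; not the one from PIG-979.patch.
interface BatchAccumulator<T, R> {
    void accumulate(List<T> batch); // called once per batch of records
    R getValue();                   // called after the last batch for a key
    void cleanup();                 // resets state before the next key
}

// A sum needs only a running total, so it never has to hold the whole bag.
class LongSum implements BatchAccumulator<Long, Long> {
    private long sum = 0;

    public void accumulate(List<Long> batch) {
        for (long v : batch) {
            sum += v;
        }
    }

    public Long getValue() {
        return sum;
    }

    public void cleanup() {
        sum = 0;
    }
}

class AccumulatorDemo {
    // Feeds the values to the accumulator batchSize records at a time.
    static long sumInBatches(List<Long> values, int batchSize) {
        LongSum acc = new LongSum();
        for (int i = 0; i < values.size(); i += batchSize) {
            int end = Math.min(i + batchSize, values.size());
            acc.accumulate(values.subList(i, end));
        }
        long result = acc.getValue();
        acc.cleanup();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sumInBatches(Arrays.asList(1L, 2L, 3L, 4L, 5L), 2));
    }
}
```

The caller, not the UDF, decides the batch size, which is what would let the engine bound memory use per call.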
[jira] Updated: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-979: --- Status: Patch Available (was: Open) > Acummulator Interface for UDFs > -- > > Key: PIG-979 > URL: https://issues.apache.org/jira/browse/PIG-979 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Alan Gates >Assignee: Ying He > Fix For: 0.6.0 > > Attachments: PIG-979.patch, PIG-979.patch > > > Add an accumulator interface for UDFs that would allow them to take a set > number of records at a time instead of the entire bag. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1038) Optimize nested distinct/sort to use secondary key
[ https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1038:
----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

All javac warnings are deprecations. One release audit warning is fixed; the remaining ones are not related to source code. I also made minor changes to address Pradeep's comment.

Patch committed.

To disable the secondary key optimization, use the system property:
pig.exec.nosecondarykey=true

> Optimize nested distinct/sort to use secondary key
> --------------------------------------------------
>
>              Key: PIG-1038
>              URL: https://issues.apache.org/jira/browse/PIG-1038
>          Project: Pig
>       Issue Type: Improvement
>       Components: impl
> Affects Versions: 0.4.0
>         Reporter: Olga Natkovich
>         Assignee: Daniel Dai
>          Fix For: 0.6.0
>
>      Attachments: PIG-1038-1.patch, PIG-1038-2.patch, PIG-1038-3.patch, PIG-1038-4.patch, PIG-1038-5.patch
>
>
> If a nested foreach plan contains sort/distinct, it is possible to use Hadoop secondary sort instead of SortedDataBag and DistinctDataBag to optimize the query.
> Eg1:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = order A by $1;
>     generate group, D;
> }
> store C into 'myresult';
> We can specify a secondary sort on A.$1, and drop "order A by $1".
> Eg2:
> A = load 'mydata';
> B = group A by $0;
> C = foreach B {
>     D = A.$1;
>     E = distinct D;
>     generate group, E;
> }
> store C into 'myresult';
> We can specify a secondary sort key on A.$1, and simplify "D=A.$1; E=distinct D" to a special version of distinct, which does not do the sorting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
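The idea behind Eg2 can be sketched outside of Pig: once records arrive ordered by (group key, secondary key), a distinct within each group only has to compare each value with the previous one. The class below is a stand-alone illustration under that assumption. `SecondarySortSketch` and `distinctPerGroup` are made-up names, the in-memory sort merely stands in for the sort Hadoop performs during the shuffle, and none of this is code from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    // Each record is a {groupKey, value} pair.
    static Map<String, List<String>> distinctPerGroup(List<String[]> records) {
        List<String[]> sorted = new ArrayList<>(records);
        // Composite ordering: group key first, then the secondary key.
        sorted.sort((a, b) -> {
            int c = a[0].compareTo(b[0]);
            return c != 0 ? c : a[1].compareTo(b[1]);
        });
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String[] r : sorted) {
            List<String> bag = out.computeIfAbsent(r[0], k -> new ArrayList<>());
            // Values arrive sorted, so a duplicate is always adjacent to its twin.
            if (bag.isEmpty() || !bag.get(bag.size() - 1).equals(r[1])) {
                bag.add(r[1]);
            }
        }
        return out;
    }
}
```

The per-group step keeps no hash set and does no sorting of its own, which is the saving the issue describes for the "special version of distinct".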
[jira] Commented: (PIG-1038) Optimize nested distinct/sort to use secondary key
[ https://issues.apache.org/jira/browse/PIG-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776853#action_12776853 ] Pradeep Kamath commented on PIG-1038: - Changes look good. One observation is in SecondaryKeyOptimizer.java: {code} if (r) // if we saw physical operator other than project in sort // plan return; {code} should we be setting sawInvalidPhysicalOper? Other than that, +1 - please commit after making any change if required for the above. > Optimize nested distinct/sort to use secondary key > -- > > Key: PIG-1038 > URL: https://issues.apache.org/jira/browse/PIG-1038 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.4.0 >Reporter: Olga Natkovich >Assignee: Daniel Dai > Fix For: 0.6.0 > > Attachments: PIG-1038-1.patch, PIG-1038-2.patch, PIG-1038-3.patch, > PIG-1038-4.patch, PIG-1038-5.patch > > > If nested foreach plan contains sort/distinct, it is possible to use hadoop > secondary sort instead of SortedDataBag and DistinctDataBag to optimize the > query. > Eg1: > A = load 'mydata'; > B = group A by $0; > C = foreach B { > D = order A by $1; > generate group, D; > } > store C into 'myresult'; > We can specify a secondary sort on A.$1, and drop "order A by $1". > Eg2: > A = load 'mydata'; > B = group A by $0; > C = foreach B { > D = A.$1; > E = distinct D; > generate group, E; > } > store C into 'myresult'; > We can specify a secondary sort key on A.$1, and simplify "D=A.$1; E=distinct > D" to a special version of distinct, which does not do the sorting. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-6) Addition of Hbase Storage Option In Load/Store Statement
[ https://issues.apache.org/jira/browse/PIG-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-6: Assignee: Samuel Guo > Addition of Hbase Storage Option In Load/Store Statement > > > Key: PIG-6 > URL: https://issues.apache.org/jira/browse/PIG-6 > Project: Pig > Issue Type: New Feature > Environment: all environments >Reporter: Edward J. Yoon >Assignee: Samuel Guo > Fix For: 0.2.0 > > Attachments: hbase-0.18.1-test.jar, hbase-0.18.1.jar, m34813f5.txt, > PIG-6.patch, PIG-6_V01.patch > > > It needs to be able to load full table in hbase. (maybe ... difficult? i'm > not sure yet.) > Also, as described below, > It needs to compose an abstract 2d-table only with certain data filtered from > hbase array structure using arbitrary query-delimited. > {code} > A = LOAD table('hbase_table'); > or > B = LOAD table('hbase_table') Using HbaseQuery('Query-delimited by attributes > & timestamp') as (f1, f2[, f3]); > {code} > Once test is done on my local machines, > I will clarify the grammars and give you more examples to help you explain > more storage options. > Any advice welcome. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-38) abstract PigScript parser
[ https://issues.apache.org/jira/browse/PIG-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-38: - Assignee: Christopher Olston > abstract PigScript parser > - > > Key: PIG-38 > URL: https://issues.apache.org/jira/browse/PIG-38 > Project: Pig > Issue Type: Improvement > Components: grunt > Environment: grunt and pigpen >Reporter: Christopher Olston >Assignee: Christopher Olston > Fix For: 0.1.0 > > Attachments: pigScriptParser.patch > > > I am developing Pig Pen, an Eclipse plugin for Pig. Pig Pen needs to parse > .pig scripts. The parsing is the same as for grunt, but the actions I take > are different (e.g., Pig Pen will ignore "store" commands for the purpose of > editing). > What I'd like to do is create an abstract class PigScriptParser, which is > identical to the current GruntParser except no actions are taken. Then I'll > add a GruntParser that extends PigScriptParser, and has concrete > implementations of actions (e.g., what to do when a "store" command is > encountered). > I'll also add a PigPenParser that also extends PigScriptParser. > This should not affect the behavior of GruntParser at all -- it just > separates the parsing from the actuating. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-20) Sorting using custom comparison functions
[ https://issues.apache.org/jira/browse/PIG-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-20:
-----------------------------

    Assignee: Olga Natkovich

> Sorting using custom comparison functions
> -----------------------------------------
>
>         Key: PIG-20
>         URL: https://issues.apache.org/jira/browse/PIG-20
>     Project: Pig
>  Issue Type: New Feature
>  Components: impl
>    Reporter: Olga Natkovich
>    Assignee: Olga Natkovich
>     Fix For: 0.1.0
>
> Attachments: usercompare.patch
>
>
> Currently, only string-based sorting is supported. Once we have types, numeric sort will be supported as well. However, some users have expressed a need for custom comparison functions for sort.
> Alan put together a design document for this: http://wiki.apache.org/pig/UserDefinedOrdering

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-12) Please add timestamps to pig map/reduce progress messages
[ https://issues.apache.org/jira/browse/PIG-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-12:
-----------------------------

    Assignee: Alan Gates

> Please add timestamps to pig map/reduce progress messages
> ---------------------------------------------------------
>
>         Key: PIG-12
>         URL: https://issues.apache.org/jira/browse/PIG-12
>     Project: Pig
>  Issue Type: Improvement
>  Components: impl
>    Reporter: Olga Natkovich
>    Assignee: Alan Gates
>     Fix For: 0.1.0
>
> Attachments: timestamps.diff
>
>
> From one of the users:
> --
> I'm spending a lot of time trying to optimize my pig queries for short run-times. This process would be much easier if, in the progress output from pig (currently on stdout, but hopefully soon moving to stderr?!), the initiation and completion of each map/reduce job could be timestamped. Pig already spits out messages of the form "- MapReduce Job -", "Input: ...", "Combine: ...", etc; could you just add a "Timestamp: ..." field as well? Or ideally, both "Starting timestamp: ..." and "Finishing timestamp ...".
> Additional comments from another user:
> --
> I'm adding my vote for this as well.
> I'd like to know the timestamp and the "running time" in seconds or D:H:M:S:
> Thu Oct 25 10:06:01 GMT 2007 (0:00:12:56): 56% done
> Starting and stopping timestamps in the log would also be valuable.
> Unfortunately, there's no "workaround" such as putting a date command before and after the pig command in logging -- queuing times can be seconds to hours and completely mess up any notion of job execution time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
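The running-time format requested above, as in the sample "(0:00:12:56)", can be produced with plain integer arithmetic. This is only a sketch: the class and method names are invented here, and reading the sample as days:hours:minutes:seconds is an assumption based on the user's comment, not something taken from timestamps.diff.

```java
class Elapsed {
    // Formats a duration in seconds as days:hours:minutes:seconds.
    static String format(long seconds) {
        long days = seconds / 86400;
        long hours = (seconds % 86400) / 3600;
        long minutes = (seconds % 3600) / 60;
        long secs = seconds % 60;
        return String.format("%d:%02d:%02d:%02d", days, hours, minutes, secs);
    }

    public static void main(String[] args) {
        System.out.println(format(776)); // 12 minutes 56 seconds
    }
}
```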
[jira] Assigned: (PIG-13) need a way to find out what version of pig i'm using
[ https://issues.apache.org/jira/browse/PIG-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-13: - Assignee: Stefan Groschupf > need a way to find out what version of pig i'm using > > > Key: PIG-13 > URL: https://issues.apache.org/jira/browse/PIG-13 > Project: Pig > Issue Type: Improvement > Components: grunt >Reporter: Olga Natkovich >Assignee: Stefan Groschupf >Priority: Minor > Fix For: 0.1.0 > > Attachments: PIG-13-svnOptional_v_1_r633244.patch, PIG-13_v_1.patch > > > would be great if "pig -version" told me what version. > also, the text prior to "USAGE: ..." could also print the version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-51) Combiner gives wrong result in the presence of flattening
[ https://issues.apache.org/jira/browse/PIG-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-51:
-----------------------------

    Assignee: Utkarsh Srivastava

> Combiner gives wrong result in the presence of flattening
> ---------------------------------------------------------
>
>         Key: PIG-51
>         URL: https://issues.apache.org/jira/browse/PIG-51
>     Project: Pig
>  Issue Type: Bug
>    Reporter: Utkarsh Srivastava
>    Assignee: Utkarsh Srivastava
>    Priority: Critical
>     Fix For: 0.1.0
>
> Attachments: combiner-flatten.patch
>
>
> If you do something like
> a = load ... as (f1,f2,f3);
> b = group a by (f1,f2);
> c = foreach b generate flatten(group), SUM(a.f3);
> The reduce side refers to fields by number, expecting that the data has not been flattened yet. But if the combiner kicks in, it has already flattened the group, so the column references are wrong.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-55) Allow user control over split creation
[ https://issues.apache.org/jira/browse/PIG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-55: - Assignee: Charlie Groves > Allow user control over split creation > -- > > Key: PIG-55 > URL: https://issues.apache.org/jira/browse/PIG-55 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.0.0 >Reporter: Charlie Groves >Assignee: Charlie Groves > Fix For: 0.1.0 > > Attachments: pig_chunker_split.patch, pig_chunker_split_v2.patch, > pig_chunker_split_v3.patch, pig_chunker_split_v4.patch, > pig_chunker_split_v5.patch, pig_chunker_split_v6.patch, > pig_chunker_split_v7.patch, replaceable_PigSplit.diff, > replaceable_PigSplit_v2.diff > > > I have a dataset in HDFS that's stored in a file per column that I'd like to > access from pig. This means I can't use LoadFunc to get at the data as it > only allows the loader access to a single input stream at a time. To handle > this usage, I've broken the existing split creation code out into a few > classes and interfaces, and allowed user specified load functions to be used > in place of the existing code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-58) parameterized Pig scripts
[ https://issues.apache.org/jira/browse/PIG-58?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-58:
-----------------------------

    Assignee: Olga Natkovich

> parameterized Pig scripts
> -------------------------
>
>         Key: PIG-58
>         URL: https://issues.apache.org/jira/browse/PIG-58
>     Project: Pig
>  Issue Type: New Feature
>    Reporter: Olga Natkovich
>    Assignee: Olga Natkovich
>     Fix For: 0.1.0
>
> Attachments: PIG-58_v1.patch, PIG-58_v2, PIG-58_v3.patch
>
>
> This feature has been requested by several users and would be very useful in conjunction with streaming. The feature would allow a pig script to include parameters that are replaced at run time. For instance, if your script needs to run on a daily basis over the data of the previous day, you would be able to use the same script and provide a date as a run-time parameter to it.
> Example:
> ===
> Pig script myscript.pig:
> A = load '/data/mydata/%date%';
> B = filter A by $0>'5';
> .
> Pig command line:
> pig -param date='20080110' myscript.pig
> Proposed interface and implementation:
> Interface:
> ===
> (0) Substitution will only be supported with pig script files.
> (1) Parameters are specified on the command line via -param = construct. Multiple parameters can be specified. They are applied to the script in the order they are specified on the command line.
> (2) Default values for the parameters can be specified within the script via a declare statement:
> declare =
> (3) Within the script the parameter will be enclosed in %%. \% can be used to escape.
> Implementation:
>
> Use a preprocessor to do the substitution. The preprocessor would be invoked by Main before grunt is instantiated and do the following:
> - create a new file in a temp location
> - build a hash of parameters from the command line and declare statements
> - for each line in the original script
>   if this is a declare line, skip it
>   else for each unescaped pattern %% look for a match in the hash. Replace, if found. Write the line to the temp file.
> - pass the temp file to grunt.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
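The per-line substitution step of the proposed preprocessor can be sketched as follows. This is an illustration of the rules stated in the proposal (parameters enclosed in percent signs, `\%` as the escape, replace only if found), not the code from the attached patches. The class name `ParamSubstitution` and the choice to leave unknown parameters untouched are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

class ParamSubstitution {
    // Replaces each unescaped %name% whose name is in params; "\%" yields a literal "%".
    static String substitute(String line, Map<String, String> params) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length() && line.charAt(i + 1) == '%') {
                out.append('%');   // escaped percent sign
                i += 2;
            } else if (c == '%') {
                int close = line.indexOf('%', i + 1);
                String name = close < 0 ? null : line.substring(i + 1, close);
                if (name != null && params.containsKey(name)) {
                    out.append(params.get(name));
                    i = close + 1;
                } else {
                    out.append(c); // no match in the hash: keep the text as written
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("date", "20080110");
        System.out.println(substitute("A = load '/data/mydata/%date%';", params));
    }
}
```

Scanning character by character keeps the escape handling explicit. A regular expression would be shorter but makes the "unescaped" condition harder to read.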
[jira] Assigned: (PIG-56) implement Iterable in DataBag
[ https://issues.apache.org/jira/browse/PIG-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-56:
-----------------------------

    Assignee: Charlie Groves

> implement Iterable in DataBag
> -----------------------------
>
>         Key: PIG-56
>         URL: https://issues.apache.org/jira/browse/PIG-56
>     Project: Pig
>  Issue Type: Improvement
>    Reporter: Charlie Groves
>    Assignee: Charlie Groves
>    Priority: Minor
>     Fix For: 0.1.0
>
> Attachments: iterable_databag.patch
>
>
> Now that DataBag has an iterator method, it can implement Iterable with no other changes. This would allow bags to be used in a foreach loop like
> for(Tuple t : bag) {
>     // do something with t
> }
> The attached patch has DataBag implement Iterable and converts all bag iterator usages in pig to use foreach loops.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
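The point that an existing `iterator()` method is all a class needs in order to implement `Iterable` can be shown with a small stand-in. `SimpleBag` below is hypothetical and uses strings in place of Pig's `Tuple`. It is not the `DataBag` class from the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for DataBag; elements are plain strings instead of Tuples.
class SimpleBag implements Iterable<String> {
    private final List<String> tuples = new ArrayList<>();

    void add(String tuple) {
        tuples.add(tuple);
    }

    // The only method Iterable requires.
    @Override
    public Iterator<String> iterator() {
        return tuples.iterator();
    }

    // Uses the enhanced for loop that Iterable makes available.
    static String join(SimpleBag bag) {
        StringBuilder sb = new StringBuilder();
        for (String t : bag) {
            sb.append(t).append(';');
        }
        return sb.toString();
    }
}
```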
[jira] Assigned: (PIG-59) A new "ILLUSTRATE" command which will help people debug their pig programs
[ https://issues.apache.org/jira/browse/PIG-59?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-59: - Assignee: Shubham Chopra > A new "ILLUSTRATE" command which will help people debug their pig programs > -- > > Key: PIG-59 > URL: https://issues.apache.org/jira/browse/PIG-59 > Project: Pig > Issue Type: New Feature > Components: grunt >Reporter: Shubham Chopra >Assignee: Shubham Chopra > Fix For: 0.1.0 > > Attachments: displayAlternate.patch, ExampleGenerator.patch, > ExampleGenerator.patch, ExampleGenerator.patch > > > I propose to add a new "ILLUSTRATE" command to Pig, which will help people > debug their Pig programs. > The idea is to select a few example data items, and illustrate how they are > transformed by the sequence of Pig commands in the user's program. I have an > algorithm that can select an appropriate and concise set of example data > items automatically. It does a better job than random sampling would do; for > example, random sampling suffers from the drawback that selective operations > such as filters or joins can eliminate *all* the sampled data items, giving > you empty results which is of no help in debugging. > This "ILLUSTRATE" functionality will avoid people having to test their Pig > programs on large data sets, which has a long turnaround time and wastes > system resources. > Proposed Implementation: > I will create a new package called org.apache.pig.exgen, which will contain > the aforementioned algorithm. The algorithm uses the "Local" execution > operators (it does not run on hadoop), so as to generate illustrative example > data in near-real-time for the user. > For my algorithm to work properly, it needs to trace the "lineage" (sometimes > called "provenance") of data items as they flow through the local operator > tree corresponding to the user's Pig program. 
So I will have to add a > "lineage tracer" to the Local operators, which maintains a side data > structure to represent the lineage, or derivation sequence, among data items. > The lineage tracer will be DISABLED BY DEFAULT, so it will not affect normal > Pig operation. > I will add a new method to PigServer called > "PigServer.showExamples(LogicalPlan)", which will cause my exgen algorithm to > be invoked. > I will also add a new command to Grunt, called ILLUSTRATE. Syntactically it > will work the same way as the STORE command. For example, a user might type: > grunt> visits = load 'visits.txt' as (user, url, timestamp); > grunt> recent_visits = filter visits by timestamp >= '20071201'; > grunt> user_visits = group recent_visits by user; > grunt> num_user_visits = foreach user_visits generate group, > COUNT(recent_visits); > grunt> illustrate num_user_visits > This would trigger my exgen algorithm, which will display something like: > visits: > (Amy, www.cnn.com, 20070218) > (Fred, www.harvard.edu, 20071204) > (Amy, www.bbc.com, 20071205) > (Fred, www.stanford.edu, 20071206) > recent_visits: > (Fred, www.harvard.edu, 20071204) > (Amy, www.bbc.com, 20071205) > (Fred, www.stanford.edu, 20071206) > user_visits: > (Fred, { (Fred, www.harvard.edu, 20071204), (Fred, www.stanford.edu, > 20071206) } ) > (Amy, { (Amy, www.bbc.com, 20071205) } ) > num_user_visits: > (Fred, 2) > (Amy, 1) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-65) convert tabs to spaces
[ https://issues.apache.org/jira/browse/PIG-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-65: - Assignee: Charlie Groves > convert tabs to spaces > -- > > Key: PIG-65 > URL: https://issues.apache.org/jira/browse/PIG-65 > Project: Pig > Issue Type: Bug >Reporter: Charlie Groves >Assignee: Charlie Groves >Priority: Minor > Fix For: 0.1.0 > > Attachments: tabs_to_spaces.diff, tabs_to_spaces_post_PIG-32.diff > > > Many of the pig source files mix tabs and 4 spaces for indentation. This is > particularly painful for me when reading the code as I've set up my editor to > indent tabs 8 spaces so I can catch if I actually use them anywhere, and the > source jumps back and forth in indentation level, sometimes from line to line. > The patch replaces all tabs with 4 spaces in java code since that's what's > mentioned as the standard in the wiki. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-57) Occasional NullPointerException in PigContext.fixUpDomain method
[ https://issues.apache.org/jira/browse/PIG-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-57: - Assignee: Benjamin Francisoud > Occasional NullPointerException in PigContext.fixUpDomain method > > > Key: PIG-57 > URL: https://issues.apache.org/jira/browse/PIG-57 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Xu Zhang >Assignee: Benjamin Francisoud > Fix For: 0.1.0 > > Attachments: PIG-57-v01.patch > > > I occasionally see the following NPE when running a Pig job with HOD: > 2008-01-08 06:14:24,558 [main] INFO org.apache.pig - Connecting to HOD... > 2008-01-08 06:14:29,732 [main] INFO org.apache.pig - HDFS Web UI: > nn-host:50070 > 2008-01-08 06:14:29,732 [main] INFO org.apache.pig - JobTracker Web UI: > jt-host:54597 > 2008-01-08 06:14:29,846 [main] FATAL org.apache.pig - Could not connect to HOD > java.lang.NullPointerException > at org.apache.pig.impl.PigContext.fixUpDomain(PigContext.java:350) > at org.apache.pig.impl.PigContext.doHod(PigContext.java:324) > at org.apache.pig.impl.PigContext.connect(PigContext.java:175) > at org.apache.pig.PigServer.(PigServer.java:128) > at org.apache.pig.tools.grunt.Grunt.(Grunt.java:37) > at org.apache.pig.Main.main(Main.java:212) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-68) Improving build.xml in many ways :)
[ https://issues.apache.org/jira/browse/PIG-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-68:
-----------------------------

    Assignee: Stefan Groschupf

> Improving build.xml in many ways :)
> -----------------------------------
>
>              Key: PIG-68
>              URL: https://issues.apache.org/jira/browse/PIG-68
>          Project: Pig
>       Issue Type: Improvement
> Affects Versions: 0.1.0
>         Reporter: Benjamin Francisoud
>         Assignee: Stefan Groschupf
>         Priority: Minor
>          Fix For: 0.1.0
>
>      Attachments: build.xml, build.xml-PIG-68-v01.patch, build.xml-PIG-68-v02.patch, build.xml-PIG-68-v03.patch, build.xml-PIG-68-v04.patch, build.xml-PIG-68-v05.patch, build.xml-PIG-68-v06-SG.patch, build.xml-PIG-68-v07-SG.patch, build.xml-PIG-68-v08-SG.patch, build.xml-PIG-68-v09-SG.patch, out
>
>
> The build file can be improved in many ways:
> * add the revision number to the pig.jar name (like: pig-r1234.jar)
> * put pig.jar in the dist dir
> * the "clean" target leaves a "depend" folder undeleted
> * use a regexp to delete files in the "org\apache\pig\impl\logicalLayer\parser" folder instead of listing one by one all the files that you want to delete
> * put all artifacts (classes, jar, etc...) in the dist folder so that when doing clean you just need to specify dist
> * provide a description for targets (for the "ant -projecthelp" command)
> * use spaces or tabs but not both (spaces are better for patch and diff in my opinion)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-69) NullPointerException in setJobtrackerLocation() in PigContext.java:68
[ https://issues.apache.org/jira/browse/PIG-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-69:
-----------------------------

    Assignee: Benjamin Francisoud

> NullPointerException in setJobtrackerLocation() in PigContext.java:68
> ---------------------------------------------------------------------
>
>              Key: PIG-69
>              URL: https://issues.apache.org/jira/browse/PIG-69
>          Project: Pig
>       Issue Type: Bug
>       Components: impl
> Affects Versions: 0.1.0
>         Reporter: Benjamin Francisoud
>         Assignee: Benjamin Francisoud
>          Fix For: 0.1.0
>
>      Attachments: PigContext-PIG-69-v01.patch, PigContext-PIG-69-v02.patch
>
>
> {noformat}
> java.lang.NullPointerException
>     at org.apache.pig.impl.PigContext.setJobtrackerLocation(PigContext.java:425)
>     ... (the rest of the stacktrace is my own servlet code)
> {noformat}
> The code:
> {code:java}
> final PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.setJobtrackerLocation(configuration.get("mapred.job.tracker"));
> pigContext.setFilesystemLocation(configuration.get("fs.default.name"));
>
> final PigServer pigServer = new PigServer(pigContext);
> {code}
> Here, configuration is an org.apache.hadoop.conf.Configuration object initialized with the Spring framework.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-80) Stacktrace information is lost at MapReduceLauncher.java:289
[ https://issues.apache.org/jira/browse/PIG-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-80:
-----------------------------

    Assignee: Benjamin Francisoud

> Stacktrace information is lost at MapReduceLauncher.java:289
> ------------------------------------------------------------
>
>              Key: PIG-80
>              URL: https://issues.apache.org/jira/browse/PIG-80
>          Project: Pig
>       Issue Type: Bug
>       Components: impl
> Affects Versions: 0.1.0
>         Reporter: Benjamin Francisoud
>         Assignee: Benjamin Francisoud
>         Priority: Minor
>          Fix For: 0.1.0
>
>      Attachments: PIG-80-generics.patch, PIG-80-v01.patch, PIG-80-v02.patch, PIG-80-v03.patch, PIG-80-v04.patch, PIG-80-v05.patch, PIG-80-v06-unit-test-only.patch
>
>
> {code:java}
> ...
> }catch (Exception e) {
>     // Do we need different handling for different exceptions
>     e.printStackTrace();
>     throw new IOException(e.getMessage());
> }finally{ ...
> {code}
> In my case the standard output is redirected to /dev/null, so the output of "e.printStackTrace();" is lost.
> It should be:
> {code:java}throw new IOException(e);{code}
> with no getMessage(), because otherwise we lose the rest of the stack trace.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
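The difference the reporter describes can be seen in a few lines: wrapping the original exception as the cause keeps it, and its stack trace, reachable from the new one. The sketch below is stand-alone and is not the MapReduceLauncher code. The class and method names are invented, and the `IOException(Throwable)` constructor it relies on requires Java 6 or later.

```java
import java.io.IOException;

class ChainedException {
    // Stand-in for a launcher method that wraps any failure in an IOException.
    static void launch() throws IOException {
        try {
            throw new IllegalStateException("job failed");
        } catch (Exception e) {
            // Passing e as the cause keeps its stack trace reachable via getCause().
            throw new IOException(e);
        }
    }

    // Returns the original exception recovered from the wrapper, or null if nothing was thrown.
    static Throwable causeOfLaunch() {
        try {
            launch();
            return null;
        } catch (IOException e) {
            return e.getCause();
        }
    }
}
```

With `new IOException(e.getMessage())` instead, `getCause()` would return null and only the message string would survive.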
[jira] Assigned: (PIG-83) logging abstraction
[ https://issues.apache.org/jira/browse/PIG-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-83: - Assignee: Benjamin Francisoud > logging abstraction > --- > > Key: PIG-83 > URL: https://issues.apache.org/jira/browse/PIG-83 > Project: Pig > Issue Type: Wish >Reporter: Stefan Groschupf >Assignee: Benjamin Francisoud > Fix For: 0.1.0 > > Attachments: log4j.properties, logging.properties, PIG-83-v01.patch, > PIG-83-v02.patch, PIG-83-v03.patch > > > Pig is logging quite a lot into System.out or System.err. Using a embedded > pig in a production environment requires a logging abstraction like log4j, > commons logging, slf4j or something like that. > I would be happy to work on a patch if we decide what would be the best > choice. Hadoop uses log4j. > Thanks. > Stefan -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-78) src/org/apache/pig/builtin/PigStorage.java doesn't compile
[ https://issues.apache.org/jira/browse/PIG-78?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-78: - Assignee: Arun C Murthy > src/org/apache/pig/builtin/PigStorage.java doesn't compile > -- > > Key: PIG-78 > URL: https://issues.apache.org/jira/browse/PIG-78 > Project: Pig > Issue Type: Bug >Reporter: Arun C Murthy >Assignee: Arun C Murthy > Fix For: 0.1.0 > > Attachments: PIG-78_0_20080125.patch > > > {noformat} > compile: > [echo] *** Building Main Sources *** > [javac] Compiling 6 source files to /Users/arunc/dev/java/pig/trunk/dist > [javac] > /Users/arunc/dev/java/pig/trunk/src/org/apache/pig/builtin/PigStorage.java:85: > cannot find symbol > [javac] symbol : method getBytes(java.nio.charset.Charset) > [javac] location: class java.lang.String > [javac] os.write((f.toDelimitedString(this.fieldDel) + > (char)this.recordDel).getBytes(utf8)); > [javac] ^ > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-77) Add eclipse file to ignore list
[ https://issues.apache.org/jira/browse/PIG-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-77:
-----------------------------

    Assignee: Benjamin Francisoud

> Add eclipse file to ignore list
> -------------------------------
>
>              Key: PIG-77
>              URL: https://issues.apache.org/jira/browse/PIG-77
>          Project: Pig
>       Issue Type: Improvement
> Affects Versions: 0.1.0
>         Reporter: Benjamin Francisoud
>         Assignee: Benjamin Francisoud
>         Priority: Minor
>          Fix For: 0.1.0
>
>      Attachments: PIG-77-v01.patch
>
>
> I don't know if I'm the only one using Eclipse here, but the .project and .classpath files and the .settings folder could be added to the svn:ignore list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-87) Improvements to pig.pl: make pigclient.conf optional; check JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-87:
-----------------------------

    Assignee: Craig Macdonald

> Improvements to pig.pl: make pigclient.conf optional; check JAVA_HOME
> ---------------------------------------------------------------------
>
>         Key: PIG-87
>         URL: https://issues.apache.org/jira/browse/PIG-87
>     Project: Pig
>  Issue Type: Improvement
>    Reporter: Craig Macdonald
>    Assignee: Craig Macdonald
>     Fix For: 0.1.0
>
> Attachments: pig.pl.patch
>
>
> Brief notes about pig.pl, and a patch to resolve some of these:
> 1. Is conf/pigclient.conf really required?
> pig.pl dies straight away if $ROOT/conf/pigclient.conf does not exist.
> This is a shame, for a couple of reasons:
>  * the only really necessary detail in pigclient.conf is $pigJarRoot.
>  * $pigJarRoot, $hodRoot and $defaultCluster can be set using pigclient.conf - why can't they also be:
>    (a) worked out from defaults? - e.g. $pigJarRoot
>        my $JAR = $0;#/scripts/pig.pl
>        $JAR =~ s/pig\.pl/..\/pig.jar/
>    (b) $hodRoot - seems an obvious example of something to make configurable using the command line arguments?
>    (c) $defaultCluster - ditto?
>  * if conf/pigclient.conf doesn't exist, pig.pl dies before the --help options can be displayed (big shame)
>    -> means that scripts/pig.pl -h doesn't work out of the box, and neither does most of http://wiki.apache.org/pig/GettingStarted
>  * As far as I can see, the minimum setup for a new Pig user is:
>    cd pig
>    (ant)
>    mkdir conf
>    echo "\$pigJarRoot = \"$PWD\"" > conf/pigclient.conf
>    mkdir -p libexec/pig//released/
>    cp pig.jar libexec/pig//released/
>    ROOT=$PWD scripts/pig.pl
>    or specify the class path manually.
> 2. The Java binary is looked for in a special Yahoo place or in PATH, but JAVA_HOME is not checked, as it is in other common startup scripts (e.g. Tomcat).
> 3. Looking for java in the path
>    `which java 2>&1 > /dev/null`;
>    if ($? != 0) {
> I can't help thinking that this would be better/quicker in Perl:
> sub inpath
> {
>     my $bin = shift;
>     foreach my $dir (split /:/, $ENV{PATH})
>     {
>         return 1 if -e "$dir/$bin";
>     }
>     return 0;
> }
> If this is deemed desirable, I can update the patch for this sub-issue too.
> Please find attached a patch for pig.pl that resolves issues 1 & 2. This will allow the GettingStarted documentation to perform as expected without all the rigmarole associated with pigclient.conf.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function
[ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-85:
-----------------------------

Assignee: Pi Song

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
> Key: PIG-85
> URL: https://issues.apache.org/jira/browse/PIG-85
> Project: Pig
> Issue Type: Bug
> Reporter: Anand Murugappan
> Assignee: Pi Song
> Fix For: 0.1.0
>
> Attachments: PIG-85_v4.patch, PIG_85_escaping_parameters.patch,
> PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A Pig command like
>     store abc into 'abc' using PigStorage('\x01');
> does not recognize that the user is requesting the data to be ^A separated.
> Instead, the data that is stored is literally separated by the string '\x01'.
> Neither punching in ^A directly through the editor nor any other string
> such as \u0001 helps.
> Entering ^A directly through the editor produces a complaint about an invalid
> XML character, and the command bails out.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
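What the reporter of PIG-85 expects is that an escaped delimiter specification is turned into the single character it denotes. A minimal sketch of that translation is shown below. It is not the code from the attached patches; the class DelimiterParser and the set of escapes it accepts are assumptions chosen for illustration.

```java
// Hypothetical helper, not Pig's PigStorage implementation.
final class DelimiterParser {
    private DelimiterParser() {}

    /**
     * Converts a delimiter specification to one character. Accepted forms
     * (an assumption of this sketch): a single literal character, backslash-t,
     * backslash-x plus two hex digits, or backslash-u plus four hex digits.
     */
    static char parse(String spec) {
        if (spec.length() == 1) {
            return spec.charAt(0);
        }
        if (spec.equals("\\t")) {
            return '\t';
        }
        boolean hexByte = spec.startsWith("\\x") && spec.length() == 4;
        boolean hexChar = spec.startsWith("\\u") && spec.length() == 6;
        if (hexByte || hexChar) {
            // Integer.parseInt rejects non-hex digits with NumberFormatException,
            // which is a subclass of IllegalArgumentException.
            return (char) Integer.parseInt(spec.substring(2), 16);
        }
        throw new IllegalArgumentException("Unsupported delimiter: " + spec);
    }
}
```

With such a step in place, the four-character string typed in the script would reach the storage function as the single control character CTRL-A rather than as literal text.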
[jira] Assigned: (PIG-84) Remove e.printStacktrace() from code
[ https://issues.apache.org/jira/browse/PIG-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-84: - Assignee: Benjamin Francisoud > Remove e.printStacktrace() from code > > > Key: PIG-84 > URL: https://issues.apache.org/jira/browse/PIG-84 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.1.0 >Reporter: Benjamin Francisoud >Assignee: Benjamin Francisoud > Fix For: 0.1.0 > > Attachments: PIG-84-v01.patch > > > From [Benjamin Reed in > PIG-80|https://issues.apache.org/jira/browse/PIG-80?focusedCommentId=12564097#action_12564097]: > "At the same time we should also remove all e.printStackTrace() calls." > I'll try to provide a patch for this... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-89) Too many spills to files causes ArrayIndexOutOfBoundsException if new temp file cant be created
[ https://issues.apache.org/jira/browse/PIG-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-89:
-----------------------------

Assignee: Benjamin Francisoud

> Too many spills to files causes ArrayIndexOutOfBoundsException if new temp
> file cant be created
> --------------------------------------------------------------------------
>
> Key: PIG-89
> URL: https://issues.apache.org/jira/browse/PIG-89
> Project: Pig
> Issue Type: Bug
> Components: data
> Environment: Linux, Local execution Mode, JDK 1.6
> Reporter: Craig Macdonald
> Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: databag-89-v3.patch, patch-v2.defaultdatabag,
> patch.defaultdatabag
>
>
> Hello,
> I am experimenting, trying to perform a DISTINCT on a medium-sized set of
> URLs - about 3 million (same set as I discussed previously - Utkarsh has a
> copy), this time in local execution mode.
> Pig script:
> {{
> A = LOAD 'all_13122007.txt';
> B = DISTINCT A;
> store B into 'bla';
> }}
> This brings up these errors (two lines swapped in DefaultDataBag to find the
> real error).
> {{
> 2008-02-04 18:09:44,756 [Low Memory Detector] INFO org.apache.pig - low memory handler called init = 29491200(28800K) used = 269834064(263509K) committed = 307036160(299840K) max = 471662592(460608K)
> 2008-02-04 18:09:45,355 [Low Memory Detector] ERROR org.apache.pig - Unable to spill contents to disk
> java.io.IOException: Too many open files
>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>     at java.io.File.checkAndCreate(File.java:1704)
>     at java.io.File.createTempFile(File.java:1793)
>     at java.io.File.createTempFile(File.java:1830)
>     at org.apache.pig.data.DataBag.getSpillFile(DataBag.java:367)
>     at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:69)
>     at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
>     at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>     at sun.management.Sensor.trigger(Sensor.java:120)
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.util.ArrayList.remove(ArrayList.java:390)
>     at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:84)
>     at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
>     at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>     at sun.management.Sensor.trigger(Sensor.java:120)
> Exception in thread "Low Memory Detector" java.lang.InternalError: Error in invoking listener
>     at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:141)
>     at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>     at sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
>     at sun.management.Sensor.trigger(Sensor.java:120)
> }}
> There are two sub-issues here:
> 1. Pig spills too much using a default JVM (64MB) size - expected?
> Perhaps pig.pl should set a default JVM size of more than 64MB?
> 2. The line DefaultDataBag.java:84
> {{{
> mSpillFiles.remove(mSpillFiles.size() - 1);
> }}}
> should check that mSpillFiles.size() > 0, because if
> File.createTempFile() in DataBag.getSpillFile() fails, mSpillFiles will
> not yet have been updated. My preference would be to split the try { } catch
> (IOException ioe) { } within DefaultDataBag.spill() into two exception
> handlers - one for getSpillFile() errors, and one for actual writing errors
> (when we know mSpillFiles has been added to).
> If this latter point isn't coherent, I can create a patch.
> Ta muchly.
> C

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
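The second sub-issue in PIG-89, splitting one catch block into two handlers, might look roughly like the sketch below. It is not the DefaultDataBag code nor any of the attached patches: SpillSketch, TempFileFactory and the record type are invented here, and the factory exists only so that a failing temp-file creation can be simulated.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a spillable bag; names are assumptions.
final class SpillSketch {
    /** Hook so that temp-file creation can be made to fail on demand. */
    interface TempFileFactory {
        File create() throws IOException;
    }

    final List<File> spillFiles = new ArrayList<>();
    private final TempFileFactory factory;

    SpillSketch(TempFileFactory factory) {
        this.factory = factory;
    }

    /** Returns true if the contents were written to a new spill file. */
    boolean spill(List<String> contents) {
        File file;
        try {
            file = factory.create();
        } catch (IOException e) {
            // Handler 1: creation failed, so nothing was added to spillFiles
            // and there is nothing to remove.
            return false;
        }
        spillFiles.add(file);
        try (Writer out = new FileWriter(file)) {
            for (String record : contents) {
                out.write(record);
                out.write('\n');
            }
            return true;
        } catch (IOException e) {
            // Handler 2: the file was registered above, so dropping the last
            // entry cannot hit index -1.
            spillFiles.remove(spillFiles.size() - 1);
            return false;
        }
    }
}
```

The point of the split is that the list is only ever shrunk in the handler that runs after it has grown, so no explicit size() > 0 guard is needed.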
[jira] Assigned: (PIG-91) outdated @Override tags
[ https://issues.apache.org/jira/browse/PIG-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-91:
-----------------------------

Assignee: Johannes Zillmann

> outdated @Override tags
> -----------------------
>
> Key: PIG-91
> URL: https://issues.apache.org/jira/browse/PIG-91
> Project: Pig
> Issue Type: Bug
> Components: impl
> Reporter: Johannes Zillmann
> Assignee: Johannes Zillmann
> Fix For: 0.1.0
>
> Attachments: pig-overirde.patch
>
>
> There are a bunch of @Override tags which are not correct anymore (I guess
> since PIG-32).
> In my IDE (Eclipse) this results in compile errors.
> See for example
> HDataStorage.java

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-90) PigServer#store does swallow exception
[ https://issues.apache.org/jira/browse/PIG-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-90:
-----------------------------

Assignee: Benjamin Francisoud

> PigServer#store does swallow exception
> --------------------------------------
>
> Key: PIG-90
> URL: https://issues.apache.org/jira/browse/PIG-90
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.0.0
> Reporter: Stefan Groschupf
> Assignee: Benjamin Francisoud
> Priority: Critical
> Fix For: 0.1.0
>
> Attachments: PIG-90-v01.patch
>
>
> My custom DatabaseStoreFunction throws an exception (a runtime exception or
> an IOException) in putNext.
> Instead of this exception being thrown all the way up (the exception contains
> a nice error message text), at PigServer line 326 a
> java.lang.NoSuchMethodError: java.io.IOException: method
> <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
> is thrown.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
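The descriptor in the PIG-90 error names the IOException(String, Throwable) constructor, which only exists since Java 6. A plausible reading, though the report does not confirm it, is that the code was compiled against Java 6 and run on an older JVM. One way to attach a cause without that constructor is Throwable.initCause, sketched below with a made-up helper class WrappedIO (this is not the attached patch).

```java
import java.io.IOException;

// Hypothetical helper for illustration; the name is an assumption.
final class WrappedIO {
    private WrappedIO() {}

    /**
     * Builds an IOException that carries a cause, using only the
     * one-argument constructor plus initCause, both available before Java 6.
     */
    static IOException wrap(String message, Throwable cause) {
        IOException wrapped = new IOException(message);
        wrapped.initCause(cause);
        return wrapped;
    }
}
```

A store function failure could then be rethrown as WrappedIO.wrap("Unable to store alias", e), keeping the original message reachable through getCause().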
[jira] Assigned: (PIG-88) the project does not compile because of reference to HadoopExe class in Main.java
[ https://issues.apache.org/jira/browse/PIG-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-88: - Assignee: Pi Song > the project does not compile because of reference to HadoopExe class in > Main.java > - > > Key: PIG-88 > URL: https://issues.apache.org/jira/browse/PIG-88 > Project: Pig > Issue Type: Bug >Affects Versions: 0.1.0 > Environment: Win XP >Reporter: Pi Song >Assignee: Pi Song >Priority: Minor > Fix For: 0.1.0 > > Attachments: PIG-88.HadoopExe.patch > > Original Estimate: 0.08h > Remaining Estimate: 0.08h > > The project does not compile because of this line in Main.java > import org.apache.hadoop.util.HadoopExe; > From HADOOP-435, the patch to introduce this class has been canceled plus the > class itself is not being used at all in Main.java. > The simple patch removes that particular line and now the project compiles > successfully. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-92) PigContext NullPointerException because of uninitialize conf
[ https://issues.apache.org/jira/browse/PIG-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-92:
-----------------------------

Assignee: Benjamin Francisoud

> PigContext NullPointerException because of uninitialize conf
> ------------------------------------------------------------
>
> Key: PIG-92
> URL: https://issues.apache.org/jira/browse/PIG-92
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.1.0
> Reporter: Benjamin Francisoud
> Assignee: Benjamin Francisoud
> Fix For: 0.1.0
>
> Attachments: PIG-92-v01.patch, PIG-92-v02.patch
>
>
> This simple code throws an NPE:
> {code:java}
> final PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.getConf().putAll(properties);
> {code}
> Because in PigContext.java:
> {code:java}
> transient private Properties conf = null;
> public void connect() throws ExecException {
>     ...
>     conf = new Properties();
>
> }
> {code}
> Simple patch:
> {code:java}
> transient private Properties conf = new Properties();
> public void connect() throws ExecException {
>     ...
> }
> {code}
> This is a regression of something already fixed in PIG-69.
> It was introduced with PIG-32.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
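The effect of the "simple patch" in PIG-92 can be reproduced in isolation. The class below is a stand-in invented for this example, not PigContext itself. It initializes the field eagerly as the patch does, and additionally guards the getter, because a transient field of a Java-serialized object comes back as null after deserialization; whether that case matters for PigContext is an assumption, not something the report states.

```java
import java.util.Properties;

// Hypothetical stand-in for the relevant part of a context class.
final class ContextSketch {
    // Eager initialization, as in the proposed patch.
    private transient Properties conf = new Properties();

    Properties getConf() {
        // Extra guard (an addition of this sketch): field initializers do not
        // run again when an object is restored by Java serialization.
        if (conf == null) {
            conf = new Properties();
        }
        return conf;
    }

    void connect() {
        // Backend setup would go here; conf is no longer created at this point,
        // so properties set before connect() are kept.
    }
}
```

Callers can then populate the configuration before connect() is ever called, which is the sequence that failed in the report.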
[jira] Assigned: (PIG-95) pig should not use System.exit() since this would crash the application pig is embedded in.
[ https://issues.apache.org/jira/browse/PIG-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-95:
-----------------------------

Assignee: Stefan Groschupf

> pig should not use System.exit() since this would crash the application pig
> is embedded in.
> ---------------------------------------------------------------------------
>
> Key: PIG-95
> URL: https://issues.apache.org/jira/browse/PIG-95
> Project: Pig
> Issue Type: Improvement
> Reporter: Stefan Groschupf
> Assignee: Stefan Groschupf
> Priority: Critical
> Fix For: 0.1.0
>
> Attachments: 20080205-sg-noexit.patch
>
>
> As discussed, remove all System.exit statements and throw an exception instead.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-100) Tests: NullPointerException parser.QueryParser.Alias(QueryParser.java:471)
[ https://issues.apache.org/jira/browse/PIG-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-100: -- Assignee: Benjamin Francisoud > Tests: NullPointerException parser.QueryParser.Alias(QueryParser.java:471) > -- > > Key: PIG-100 > URL: https://issues.apache.org/jira/browse/PIG-100 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.1.0 >Reporter: Benjamin Francisoud >Assignee: Benjamin Francisoud >Priority: Minor > Fix For: 0.1.0 > > Attachments: PIG-100-tests.log, PIG-100-v01.patch, PIG-100-v02.patch, > PIG-100-v03-spaces.patch, PIG-100-v03-tabs.patch > > > I think the root problem was that I forget to specify the configuration using > -Djunit.hadoop.conf=hadoop-site.xml while running the tests. > But the error could be clearer... > The logs are big so I will provide them in a separate file... > But the core problem is: > {noformat} > [junit] java.lang.NullPointerException > [junit] at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Alias(QueryParser.java:471) > [junit] at > org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedExpr(QueryParser.java:411) > [junit] at > org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedExpr(QueryParser.java:417) > [junit] at > org.apache.pig.impl.logicalLayer.parser.QueryParser.GroupItem(QueryParser.java:1027) > ... > [junit] org.apache.pig.impl.logicalLayer.parser.ParseException: > Encountered "group" at line 1, column 9. > [junit] Was expecting one of: > [junit] ... > [junit] "(" ... > [junit] > [junit] at > org.apache.pig.impl.logicalLayer.parser.QueryParser.generateParseException(QueryParser.java:4142) > ... > [junit] org.apache.pig.impl.logicalLayer.parser.ParseException: > Encountered "generate" at line 1, column 1. > [junit] Was expecting one of: > [junit] "load" ... > [junit] "filter" ... > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-98) grunt should show full exception stack
[ https://issues.apache.org/jira/browse/PIG-98?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-98:
-----------------------------

Assignee: Stefan Groschupf

> grunt should show full exception stack
> --------------------------------------
>
> Key: PIG-98
> URL: https://issues.apache.org/jira/browse/PIG-98
> Project: Pig
> Issue Type: Improvement
> Components: grunt
> Reporter: Stefan Groschupf
> Assignee: Stefan Groschupf
> Priority: Minor
> Attachments: showStackTrace-20080207.patch
>
>
> I suggest grunt should be more helpful with user errors. I just made one (a
> stupid one) and it took me too long to figure out the problem, since grunt's
> error message was just not giving me a good hint:
> grunt> A = LOAD '/pigtestData.tsv' USING PigStorage(',') AS (user,age,cat);
> grunt> B = FILTER A BY cat == 'book';
> grunt> dump B;
> For input string: "book"
> Experts will see that I tried to use == instead of eq; however, new users
> especially could get a little confused.
> I see two options: add error numbers and descriptive texts (Oracle style),
> which is quite a lot of work, or, for now, simply dump the full
> exception text.
> At least at this early stage it would help developers and users find problems
> faster.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-93) Impossible to set jobconf parameters
[ https://issues.apache.org/jira/browse/PIG-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-93:
-----------------------------

Assignee: Benjamin Francisoud

> Impossible to set jobconf parameters
> ------------------------------------
>
> Key: PIG-93
> URL: https://issues.apache.org/jira/browse/PIG-93
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.1.0
> Reporter: Benjamin Francisoud
> Assignee: Benjamin Francisoud
> Priority: Critical
> Fix For: 0.1.0
>
> Attachments: PIG93Main.java
>
>
> I'm trying to set jobconf parameters before launching a Pig job using the Pig API.
> I tried 2 different ways but with no success:
> {code:java}
> PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.getExecutionEngine().getConfiguration().putAll(properties);
> PigServer pigServer = new PigServer(pigContext);
>
> {code}
> This throws an NPE because the internal executionEngine variable is initialized
> only when connect() is called.
> So I tried:
> {code:java}
> PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.connect();
> pigContext.getExecutionEngine().getConfiguration().putAll(properties);
> PigServer pigServer = new PigServer(pigContext);
> ...
> {code}
> My properties have been replaced with a "new JobConf()":
> {noformat}
> java.lang.RuntimeException: Bad mapred.job.tracker: local
>     at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:711)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:149)
>     at org.apache.pig.impl.PigContext.connect(PigContext.java:180)
> {noformat}
> "properties" contains the "mapred.job.tracker" and "hadoop.tmp.dir" values.
> Before PIG-32 I used to do (and it was working):
> {code:java}
> PigContext pigContext = new PigContext(ExecType.MAPREDUCE);
> pigContext.setConf(myJobConf);
> PigServer pigServer = new PigServer(pigContext);
> ...
> {code}
> Any ideas before I start to work on a patch?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-101) Use ExecType.MAPREDUCE instead of duplicate string to initialize PigServer in tests
[ https://issues.apache.org/jira/browse/PIG-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-101:
------------------------------

Assignee: Benjamin Francisoud

> Use ExecType.MAPREDUCE instead of duplicate string to initialize PigServer in
> tests
> -----------------------------------------------------------------------------
>
> Key: PIG-101
> URL: https://issues.apache.org/jira/browse/PIG-101
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.1.0
> Reporter: Benjamin Francisoud
> Assignee: Benjamin Francisoud
> Priority: Trivial
> Fix For: 0.1.0
>
> Attachments: PIG-101-v01.patch, PIG-101-v02.patch
>
>
> In the test code, there are lots of:
> {code:java}
> private String initString = "mapreduce";
> @Test
> public void testSomething() {
>
>     PigServer pig = new PigServer(initString);
>
> }
> {code}
> It could be replaced with
> {code:java}
> PigServer pig = new PigServer(ExecType.MAPREDUCE);
> {code}
> It would remove duplication in the tests.
> Using a string makes the tests aware of the internal PigServer behavior.
> It's really not a big deal, hence the "trivial" :)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-115) start script for pig
[ https://issues.apache.org/jira/browse/PIG-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-115:
------------------------------

Assignee: Stefan Groschupf

> start script for pig
> --------------------
>
> Key: PIG-115
> URL: https://issues.apache.org/jira/browse/PIG-115
> Project: Pig
> Issue Type: Improvement
> Reporter: Stefan Groschupf
> Assignee: Stefan Groschupf
> Fix For: 0.1.0
>
> Attachments: PIG-115_v_1.patch, PIG-115_v_2.patch, PIG-115_v_3.patch,
> PIG-115_v_4_r634426.patch
>
>
> The current pig.pl is very Yahoo!-specific; a generic start script is required
> that works for all users.
> The goal of this issue is to collect a list of requirements a new script has to
> fulfill.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-109) improve exception handling for function instantiation
[ https://issues.apache.org/jira/browse/PIG-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-109:
------------------------------

Assignee: Johannes Zillmann

> improve exception handling for function instantiation
> -----------------------------------------------------
>
> Key: PIG-109
> URL: https://issues.apache.org/jira/browse/PIG-109
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Johannes Zillmann
> Assignee: Johannes Zillmann
> Fix For: 0.1.0
>
> Attachments: PIG-109_620665.patch, pigExceptionPatch-627601.diff
>
>
> Running Pig on a cluster I got an instantiation exception for my custom
> StoreFunc:
> {noformat}
> 08/02/13 22:58:42 ERROR mapreduceExec.MapReduceLauncher: Error message from task (map) tip_200802110401_0072_m_00 java.lang.RuntimeException: java.io.IOException: null
>     at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:427)
>     at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:435)
>     at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:58)
>     at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:47)
>     at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.setupMapPipe(PigMapReduce.java:205)
>     at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigMapReduce.run(PigMapReduce.java:103)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> {noformat}
> It is easy to figure out that there is a problem with my StoreFunc, but hard to
> figure out what exactly.
> Looking into the Pig code upward from PigContext#instantiateFunc(), there is a
> kind of exception handling which seems unnecessarily complicated.
> Any exception which can happen while instantiating the store func (like
> InstantiationException or InvocationTargetException) is caught and wrapped
> in an IOException.
> Later on, the cause of the IOException is inspected (LOLoad, around line 60)
> or wrapped into a RuntimeException without handing the causes over (PigSplit,
> around line 101).
> Since every exception that can be raised in PigContext#instantiateFunc() is
> a user error rather than a temporary environment problem, I think this
> method can just throw an unchecked exception and no longer needs to declare
> IOException. This should save a lot of trouble in calling methods.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
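The proposal in PIG-109, a single unchecked exception that keeps its cause, can be sketched as below. This is not the attached patch: FuncInstantiator is a name made up for this example, it only handles no-argument constructors, and the choice of IllegalArgumentException as the unchecked type is an assumption.

```java
// Hypothetical helper; not the PigContext#instantiateFunc implementation.
final class FuncInstantiator {
    private FuncInstantiator() {}

    /**
     * Instantiates className through its no-argument constructor. Any
     * reflective failure is rethrown unchecked with the original exception
     * attached as the cause, so the real reason stays in the stack trace.
     */
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers ClassNotFoundException, NoSuchMethodException,
            // InstantiationException, IllegalAccessException and
            // InvocationTargetException.
            throw new IllegalArgumentException(
                    "Could not instantiate function '" + className + "'", e);
        }
    }
}
```

Because the cause is passed along, a trace like the one quoted above would end in the concrete failure instead of "java.io.IOException: null", and callers no longer need a throws clause.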
[jira] Assigned: (PIG-107) Some test methods are not run because there is no @Test annotation
[ https://issues.apache.org/jira/browse/PIG-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-107: -- Assignee: Benjamin Francisoud > Some test methods are not run because there is no @Test annotation > -- > > Key: PIG-107 > URL: https://issues.apache.org/jira/browse/PIG-107 > Project: Pig > Issue Type: Test >Affects Versions: 0.1.0 >Reporter: Benjamin Francisoud >Assignee: Benjamin Francisoud > Fix For: 0.1.0 > > Attachments: PIG-107-v01.patch > > > I don't know if that's on purpose but in TestLogicalPlanBuilder.java, those > methods don't have the @Test annotation and therefore are not run with latest > junit (in my case in eclipse): > {code:java} > public void testQuery41() {} > public void testQuery42() {} > public void testQuery43() {} > public void testQuery44() {} > public void testQueryFail44() throws Throwable {} > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-113) Make Grunt's explain output more understandable
[ https://issues.apache.org/jira/browse/PIG-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-113: -- Assignee: Pi Song > Make Grunt's explain output more understandable > --- > > Key: PIG-113 > URL: https://issues.apache.org/jira/browse/PIG-113 > Project: Pig > Issue Type: Improvement > Components: grunt >Affects Versions: 0.1.0 >Reporter: Pi Song >Assignee: Pi Song >Priority: Minor > Fix For: 0.1.0 > > Attachments: pig_printtree_1.patch, pig_printtree_2.patch > > > I think it would be better if we can display the execution plan in a more > understandable way. One intuitive way to do this is to show output as a tree > like in SQL Server. > Possibly we can have 'AS ' as optional argument for explain command > For example > {noformat} > Grunt> explain bag1 AS tree ; > Grunt> explain bag1 AS xml ; > {noformat} > and > {noformat} > Grunt> explain bag1 > {noformat} > will display the default format > I have included a patch that does generate tree output. 
> Here is a sample of the existing output format > {noformat} > Logical Plan: > Group root-Sun Feb 17 19:37:07 GMT+10:00 2008-5 > Object id: 9814147 > Inputs: 26335425 > Schema: (group, (sum, (), (), ())) > EvalSpecs: > Generate: has 2 children > Project: (0) > Star > Split root-Sun Feb 17 19:37:07 GMT+10:00 2008-2 > Object id: 25199001 > Inputs: 29132923 > Schema: (sum, (), (), ()) > EvalSpecs: > Eval root-Sun Feb 17 19:37:07 GMT+10:00 2008-1 > Object id: 29132923 > Inputs: 10774273 > Schema: (sum, (), (), ()) > EvalSpecs: > Generate: has 4 children > FuncEval: name: org.apache.pig.impl.builtin.ADD args: > Generate: has 2 children > Project: (0) > Project: (1) > Project: (0) > Project: (1) > Project: (2) > Load root-Sun Feb 17 19:37:07 GMT+10:00 2008-0 > Object id: 10774273 > Inputs: > Schema: () > EvalSpecs: > --- > Physical Plan: > MAPREDUCE > Object id: 17671659 > Inputs: 682933706 > Map: > Star > Grouping Funcs: > Generate: has 2 children > Project: (0) > Star > Input Files: /tmp/temp678140026/tmp1867058340 > MAPREDUCE > Object id: 17308974 > Inputs: > Map: > Composite: has 2 children > Star > Generate: has 4 children > FuncEval: name: org.apache.pig.impl.builtin.ADD args: > Generate: has 2 children > Project: (0) > Project: (1) > Project: (0) > Project: (1) > Project: (2) > Input Files: /tmp/data1.txt > Output File: /tmp/temp678140026/tmp1613817084 > {noformat} > Here is a sample of my tree output which is more compact and more > understandable :- > {noformat} > grunt> explain c1 as tree ; > Logical Plan: > |---LOCogroup ( GENERATE {[PROJECT $0],[*]} ) > |---LOSplitOutput ( ) > |---LOSplit ( ([PROJECT $0] < ['5']),([PROJECT $0] >= ['5']) ) > |---LOEval ( GENERATE > {[org.apache.pig.impl.builtin.ADD(GENERATE {[PROJECT $0],[PROJECT > $1]})],[PROJECT $0],[PROJECT $1],[PROJECT $2]} ) > |---LOLoad ( file = /tmp/data1.txt ) > --- > Physical Plan: > |---POMapreduce > Map : * > Grouping : Generate(Project(0),*) > Input File(s) : /tmp/temp678140026/tmp1867058340 > 
|---POMapreduce > Map : > Composite(*,Generate(FuncEval(org.apache.pig.impl.builtin.ADD(Generate(Project(0),Project(1,Project(0),Project(1),Project(2))) > Input File(s) : /tmp/data1.txt > {noformat} > I'm also thinking about doing output as xml as it might benefit people who > are working on displaying execution plan on GUI. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-108) PigCombine does not use configure method and therefore de-serialize and instantiate objects with every reduce call
[ https://issues.apache.org/jira/browse/PIG-108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-108:
------------------------------

Assignee: Stefan Groschupf

> PigCombine does not use configure method and therefore de-serialize and
> instantiate objects with every reduce call
> -----------------------------------------------------------------------
>
> Key: PIG-108
> URL: https://issues.apache.org/jira/browse/PIG-108
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.1.0
> Reporter: Stefan Groschupf
> Assignee: Stefan Groschupf
> Priority: Critical
> Fix For: 0.1.0
>
> Attachments: PIG-108-r639015-v1.patch
>
>
> There is significant room for improvement in PigCombine.
> In each reduce call some objects are deserialized from the jobConf, and
> the object graph is also generated again and again.
> Hadoop guarantees to call the configure method before a run, so things
> like inputCount can then be cached as fields.
> During reduce calls the jobConf will not change, so re-deserializing and
> re-instantiating all these objects
> (pigContext, evalPipe, inputCount, oc, finalout, esp and so on)
> makes no sense from my point of view.
> Not sure how often PigCombine is used, but it will significantly improve
> performance if we fix this.
> Was there any reason to do things like this, or is it just historical?
> As soon as the test suite is running again, I would be happy to work on a patch
> if there are no other opinions about that.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
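The caching pattern requested in PIG-108, reading job settings once in configure() and reusing them in every reduce() call, is sketched below. It deliberately does not use Hadoop's types: CachingCombinerSketch, the Map-based configuration and the key sketch.input.count are all invented for this example.

```java
import java.util.Map;

// Hypothetical combiner-like class; it does not implement Hadoop's interfaces.
final class CachingCombinerSketch {
    int configureCalls = 0;       // instrumentation for the sketch only
    private int inputCount = -1;  // cached once, reused by every reduce call

    /** Called once before any reduce call, as the issue says Hadoop guarantees. */
    void configure(Map<String, String> jobConf) {
        inputCount = Integer.parseInt(jobConf.get("sketch.input.count"));
        configureCalls++;
    }

    /** Sums the values of one input without re-reading the configuration. */
    long reduce(int inputIndex, long[] values) {
        if (inputCount < 0) {
            throw new IllegalStateException("configure() was not called");
        }
        if (inputIndex < 0 || inputIndex >= inputCount) {
            throw new IllegalArgumentException(
                    "input " + inputIndex + " out of range, inputCount=" + inputCount);
        }
        long sum = 0;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }
}
```

The expensive work (here just parsing an integer, in PigCombine the deserialization of pigContext, evalPipe and friends) then happens once per task instead of once per key.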
[jira] Assigned: (PIG-118) UNION/CROSS/JOIN operations should not allow 1 operand
[ https://issues.apache.org/jira/browse/PIG-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-118: -- Assignee: Pi Song > UNION/CROSS/JOIN operations should not allow 1 operand > -- > > Key: PIG-118 > URL: https://issues.apache.org/jira/browse/PIG-118 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.0.0 >Reporter: Pi Song >Assignee: Pi Song > Fix For: 0.1.0 > > Attachments: pig_1operand.patch > > > At the moment UNION/CROSS/JOIN allow 1 operand. > You can write:- > {noformat} > b = UNION a ; > c = CROSS b ; > d = JOIN c BY $0 ; > {noformat} > Possibly UNION with 1 operand might be needed for implementing Sigma-styled > union (Ui=1..n An) but for CROSS/JOIN I think nobody would do such operation. > By simply replacing "*" with "+" in the parser tree should fix this problem. > Should this be fixed? > {noformat} > LogicalOperator CrossClause() : {LogicalOperator op; ArrayList > inputs = new ArrayList();} > { > ( > op = NestedExpr() { inputs.add(op.getOperatorKey()); } > ("," op = NestedExpr() { inputs.add(op.getOperatorKey()); })* > ) > {return rewriteCross(inputs);} > } > LogicalOperator JoinClause() : {CogroupInput gi; ArrayList gis > = new ArrayList();} > { > (gi = GroupItem() { gis.add(gi); } > ("," gi = GroupItem() { gis.add(gi); })*) > {return rewriteJoin(gis);} > } > LogicalOperator UnionClause() : {LogicalOperator op; ArrayList > inputs = new ArrayList();} > { > (op = NestedExpr() { inputs.add(op.getOperatorKey()); } > ("," op = NestedExpr() { inputs.add(op.getOperatorKey()); })*) > {return new LOUnion(opTable, scope, getNextId(), inputs);} > } > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-124) only run one test (aant runtest -Dtest=TestMapReduce) not the complete test suite
[ https://issues.apache.org/jira/browse/PIG-124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-124: -- Assignee: Stefan Groschupf > only run one test (aant runtest -Dtest=TestMapReduce) not the complete test > suite > - > > Key: PIG-124 > URL: https://issues.apache.org/jira/browse/PIG-124 > Project: Pig > Issue Type: Improvement >Reporter: Stefan Groschupf >Assignee: Stefan Groschupf > Fix For: 0.1.0 > > Attachments: PIG-124_v_1.patch, RunIndividualTestCase.patch > > > +1 to what Xu is saying. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-127) Add descriptions to ant target
[ https://issues.apache.org/jira/browse/PIG-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-127:
------------------------------

Assignee: Benjamin Francisoud

> Add descriptions to ant target
> ------------------------------
>
> Key: PIG-127
> URL: https://issues.apache.org/jira/browse/PIG-127
> Project: Pig
> Issue Type: Improvement
> Components: tools
> Affects Versions: 0.1.0
> Reporter: Benjamin Francisoud
> Assignee: Benjamin Francisoud
> Priority: Minor
> Fix For: 0.1.0
>
> Attachments: PIG-127-v01.patch
>
>
> In PIG-68, I used the "description" attribute to provide help when doing "ant
> -projecthelp".
> It seems the last committed patch lost that information.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-125) improve exception handling and expressivness around tuple field access
[ https://issues.apache.org/jira/browse/PIG-125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-125:
------------------------------

Assignee: Johannes Zillmann

> improve exception handling and expressivness around tuple field access
> ----------------------------------------------------------------------
>
> Key: PIG-125
> URL: https://issues.apache.org/jira/browse/PIG-125
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Reporter: Johannes Zillmann
> Assignee: Johannes Zillmann
> Fix For: 0.1.0
>
> Attachments: PIG-125.patch
>
>
> I stumbled over the case where I access fields in a tuple whose types are
> not what I expected. The stack trace in one case looked as follows:
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: execution failed
>     at com.my.Executor.run(Executor.java:284)
> Caused by: java.io.IOException: Unable to store alias C
>     at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
>     at org.apache.pig.PigServer.store(PigServer.java:335)
>     at org.apache.pig.PigServer.store(PigServer.java:317)
>     at com.my.Executor.run(Executor.java:280)
>     ... 2 more
> Caused by: org.apache.pig.backend.executionengine.ExecException
>     at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:137)
>     at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:32)
>     at org.apache.pig.PigServer.store(PigServer.java:332)
>     ... 4 more
> Caused by: java.io.IOException: Incompatible type for request getAtomField().
>     at org.apache.pig.data.Tuple.getAtomField(Tuple.java:177)
>     at com.my.DatabaseStoreFunc.putNext(DatabaseStoreFunc.java:83)
>     at org.apache.pig.impl.io.PigFile.store(PigFile.java:64)
>     at org.apache.pig.backend.local.executionengine.POStore.getNext(POStore.java:105)
>     at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:130)
>     ... 6 more
> {noformat}
> The exception message and the stack trace gave me a clue about what kind of
> problem I was facing. But to know what exactly happened I needed to debug (or
> temporarily add some system-outs).
> Looking at the code (of the Tuple class), I think the exception information can
> be improved easily (add the index and the actual field type).
> It also seems that there is some room for simplifying the exception handling.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-122) remove TokenMgrError and Co from svn properties in src/org/apache/pig/tools/pigscript/parser
[ https://issues.apache.org/jira/browse/PIG-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-122: -- Assignee: Benjamin Francisoud > remove TokenMgrError and Co from svn properties in > src/org/apache/pig/tools/pigscript/parser > > > Key: PIG-122 > URL: https://issues.apache.org/jira/browse/PIG-122 > Project: Pig > Issue Type: Improvement >Reporter: Stefan Groschupf >Assignee: Benjamin Francisoud > Fix For: 0.1.0 > > Attachments: PIG-122-v02.patch, PIG-122-v03.patch, PIG-122_v_1.patch > > > This is obsolete now and also will help people using the new build.xml to > recognize they need to delete these files. > Also we should add src-gen to the svn:ignore. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-120) support hadoop map reduce in loal mode
[ https://issues.apache.org/jira/browse/PIG-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-120: -- Assignee: Stefan Groschupf > support hadoop map reduce in loal mode > -- > > Key: PIG-120 > URL: https://issues.apache.org/jira/browse/PIG-120 > Project: Pig > Issue Type: Bug >Reporter: Stefan Groschupf >Assignee: Stefan Groschupf > Fix For: 0.1.0 > > Attachments: PIG-120_v_1.patch > > > Currently pig supports mapreduce and local as execution modes. > LocalExecutionEngine is used for local and HExecutionEngine for map reduce. > HExecutionEngine always expects that hadoop runs as a cluster with a name node > and jobtracker listening on a port. > However, hadoop can also run in a local mode (LocalJobRunner), which would give > several advantages. > First, it would speed up the test suite significantly. Second, it would be > possible to debug map reduce plans easily. > For example, we were able to debug and reproduce PIG-110 with this method. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-139) Command line editing, history and more for Grunt
[ https://issues.apache.org/jira/browse/PIG-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-139: -- Assignee: Daniel Dai > Command line editing, history and more for Grunt > > > Key: PIG-139 > URL: https://issues.apache.org/jira/browse/PIG-139 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.2.0 > Environment: Grunt >Reporter: Amir Youssefi >Assignee: Daniel Dai > Fix For: 0.2.0 > > Attachments: jline-0.9.94.jar, jline.patch, jline2.patch, > jline3.patch, jline4.patch > > > We need to add support of command line editing, history and more for Grunt. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-137) test instantiation of StoreFunc in LOStore swallows (cause) exceptions
[ https://issues.apache.org/jira/browse/PIG-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-137: -- Assignee: Johannes Zillmann > test instantiation of StoreFunc in LOStore swallows (cause) exceptions > -- > > Key: PIG-137 > URL: https://issues.apache.org/jira/browse/PIG-137 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Johannes Zillmann >Assignee: Johannes Zillmann > Attachments: PIG-137-633746.patch > > > The current handling > {noformat} > IOException ioe = new IOException(e.getMessage()); > ioe.setStackTrace(e.getStackTrace()); > throw ioe; > {noformat} > passes the exception message and the stacktrace of the exception, but not the > stacktraces of the exceptions which caused the exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
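The handling quoted in PIG-137 copies the message and the top-level stack trace but drops the cause chain. The following is a minimal sketch of the difference, not the content of the attached patch: the class name CauseChainSketch and both helper methods are hypothetical, and attaching the original exception with Throwable.initCause is only one possible way to keep the chain.

```java
import java.io.IOException;

public class CauseChainSketch {

    // Handling as quoted in the report: message and stack trace are copied,
    // but the causes of the original exception are lost.
    static IOException wrapLosingCause(Exception e) {
        IOException ioe = new IOException(e.getMessage());
        ioe.setStackTrace(e.getStackTrace());
        return ioe;
    }

    // Assumed alternative: attach the original exception as the cause,
    // so its own causes stay reachable through getCause().
    static IOException wrapKeepingCause(Exception e) {
        IOException ioe = new IOException(e.getMessage());
        ioe.initCause(e);
        return ioe;
    }

    public static void main(String[] args) {
        Exception root = new IllegalStateException("root problem");
        Exception outer = new RuntimeException("instantiation failed", root);

        // The first wrapper has no cause at all.
        System.out.println(wrapLosingCause(outer).getCause());
        // The second wrapper still leads back to the root exception.
        System.out.println(wrapKeepingCause(outer).getCause().getCause().getMessage());
    }
}
```

With the cause attached, printStackTrace() on the wrapper also prints the nested "Caused by:" sections.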
[jira] Assigned: (PIG-155) logo improvement
[ https://issues.apache.org/jira/browse/PIG-155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-155: -- Assignee: Stefan Groschupf > logo improvement > > > Key: PIG-155 > URL: https://issues.apache.org/jira/browse/PIG-155 > Project: Pig > Issue Type: Improvement >Reporter: Stefan Groschupf >Assignee: Stefan Groschupf >Priority: Trivial > Attachments: 080224_logo_pig_01_rgb.jpg, pig_logo_improvement.zip > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-134) Update Java version requirement on deployment page
[ https://issues.apache.org/jira/browse/PIG-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-134: -- Assignee: Benjamin Francisoud > Update Java version requirement on deployment page > -- > > Key: PIG-134 > URL: https://issues.apache.org/jira/browse/PIG-134 > Project: Pig > Issue Type: Task > Components: documentation >Affects Versions: 0.1.0 >Reporter: Benjamin Francisoud >Assignee: Benjamin Francisoud > Fix For: 0.1.0 > > Attachments: PIG-134-v01.patch > > > In http://incubator.apache.org/pig/deployment.html, this line is outdated: > {quote} > Requirements >1. Java *1.6.x.* preferably from Sun. Set JAVA_HOME to the root of your > Java installation. > {quote} > It is *1.5.x* > I will provide the patch for deployment.xml, but I think you need to > regenerate the forrest documentation (html and pdf). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-178) Use of schema on a secondary output of SPLIT throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/PIG-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-178: -- Assignee: Mathieu Poumeyrol > Use of schema on a secondary output of SPLIT throws IndexOutOfBoundsException > - > > Key: PIG-178 > URL: https://issues.apache.org/jira/browse/PIG-178 > Project: Pig > Issue Type: Bug > Components: impl > Environment: not relevant >Reporter: Mathieu Poumeyrol >Assignee: Mathieu Poumeyrol > Fix For: 0.1.0 > > Attachments: PigSplit.patch, TestPigSplit.patch > > > outputSchema for LOSplitOutput is trivially broken. A patch including a testcase > and fix is coming. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-171) Top K
[ https://issues.apache.org/jira/browse/PIG-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-171: -- Assignee: Daniel Dai > Top K > - > > Key: PIG-171 > URL: https://issues.apache.org/jira/browse/PIG-171 > Project: Pig > Issue Type: Sub-task >Affects Versions: 0.2.0 >Reporter: Amir Youssefi >Assignee: Daniel Dai > Fix For: 0.2.0 > > Attachments: limit1.patch, limit2.patch, limit3.patch > > > Frequently, users are interested in Top results (especially Top K rows). > This can be implemented efficiently in Pig/Map Reduce settings to deliver > rapid results and low Network Bandwidth/Memory usage. > > The key point is to prune all data on the map side and keep only a small set of > rows matching the Top criteria. We can do it in an Algebraic function (combiner) with > multiple value output. Only a small data-set gets out of the mapper node. > The same idea is applicable to solve variants of this problem: > - An Algebraic Function for 'Top K Rows' > - An Algebraic Function for 'Top K' values ('Top Rank K' and 'Top Dense > Rank K') > - TOP K ORDER BY. > In other words, the implementation is similar to combiners for aggregate functions > but instead of one value we get multiple ones. > I will add a sample implementation for Top K Rows and possibly TOP K ORDER BY > to clarify details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
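The map-side pruning idea in PIG-171 can be illustrated outside Pig. The sketch below is an assumption-laden stand-in, not the sample implementation promised in the report and not Pig's Algebraic interface: TopKSketch, topK and merge are hypothetical names, plain integers stand in for rows, and a bounded min-heap plays the role of the combiner that emits at most K values per partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopKSketch {

    // "Map/combiner side": keep only the k largest values seen so far.
    // The min-heap never holds more than k elements.
    static List<Integer> topK(Iterable<Integer> values, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>();
        if (k > 0) {
            for (int v : values) {
                if (heap.size() < k) {
                    heap.add(v);
                } else if (v > heap.peek()) {
                    heap.poll();   // drop the current smallest
                    heap.add(v);
                }
            }
        }
        List<Integer> out = new ArrayList<Integer>(heap);
        Collections.sort(out, Collections.reverseOrder());
        return out;
    }

    // "Reduce side": merge the small per-partition results and prune again.
    static List<Integer> merge(List<List<Integer>> partials, int k) {
        List<Integer> all = new ArrayList<Integer>();
        for (List<Integer> p : partials) {
            all.addAll(p);
        }
        return topK(all, k);
    }

    public static void main(String[] args) {
        List<Integer> part1 = topK(Arrays.asList(5, 1, 9), 2);
        List<Integer> part2 = topK(Arrays.asList(3, 7, 8), 2);
        List<List<Integer>> partials = new ArrayList<List<Integer>>();
        partials.add(part1);
        partials.add(part2);
        System.out.println(merge(partials, 2));
    }
}
```

Each partition forwards at most K values, so the data leaving a mapper is bounded by K regardless of input size, which is the bandwidth and memory argument made in the report.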
[jira] Assigned: (PIG-203) pig parser hangs on input script bigger ~1kb
[ https://issues.apache.org/jira/browse/PIG-203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-203: -- Assignee: Mathieu Poumeyrol > pig parser hangs on input script bigger ~1kb > > > Key: PIG-203 > URL: https://issues.apache.org/jira/browse/PIG-203 > Project: Pig > Issue Type: Bug >Reporter: Mathieu Poumeyrol >Assignee: Mathieu Poumeyrol > Fix For: 0.1.0 > > Attachments: Main.patch > > > When the command line interpreter is run on a file bigger than 1kb or so, it > overflows the PipeReader/PipeWriter internal buffers and freezes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-201) BufferedPositionedInputStream is not buffered
[ https://issues.apache.org/jira/browse/PIG-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-201: -- Assignee: Mathieu Poumeyrol > BufferedPositionedInputStream is not buffered > - > > Key: PIG-201 > URL: https://issues.apache.org/jira/browse/PIG-201 > Project: Pig > Issue Type: Bug >Reporter: Mathieu Poumeyrol >Assignee: Mathieu Poumeyrol > Attachments: BufferedPositionedInputStream.patch > > > BufferedPositionedInputStream is actually not buffered, leading (I guess) to > constant round trips to dfs as bytes are read one by one. I just wrapped the > provided input stream in the constructor in a good old BufferedInputStream. > I measured a 40% performance boost on a script that reads and writes 3.7GB in > dfs through PigStorage on one node. I guess the impact may be greater on a > real hdfs cluster with actual network roundtrips. > FYI, the issue was found while profiling with Yourkit java profiler. Useful > toy... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
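The effect described in PIG-201 (one round trip to the underlying stream per byte, unless the stream is wrapped in a BufferedInputStream) can be made visible with a counting stream. This is a sketch under assumptions: BufferingSketch and CountingStream are hypothetical names, an in-memory ByteArrayInputStream stands in for the dfs stream, and the code does not reproduce BufferedPositionedInputStream itself.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BufferingSketch {

    // Counts how often the underlying stream is actually asked for data.
    static class CountingStream extends InputStream {
        private final InputStream in;
        int calls = 0;

        CountingStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    // Reads 64 KB one byte at a time and reports the number of calls
    // that reached the underlying stream.
    static int underlyingCalls(boolean buffered) {
        CountingStream source =
                new CountingStream(new ByteArrayInputStream(new byte[64 * 1024]));
        InputStream in = buffered ? new BufferedInputStream(source) : source;
        try {
            while (in.read() != -1) {
                // consume byte by byte, as a record parser might
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + underlyingCalls(false));
        System.out.println("buffered calls:   " + underlyingCalls(true));
    }
}
```

The buffered variant reaches the source only when its internal buffer needs refilling, so the call count drops by several orders of magnitude for byte-at-a-time readers.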
[jira] Assigned: (PIG-200) Pig Performance Benchmarks
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-200: -- Assignee: Alan Gates > Pig Performance Benchmarks > -- > > Key: PIG-200 > URL: https://issues.apache.org/jira/browse/PIG-200 > Project: Pig > Issue Type: Task >Reporter: Amir Youssefi >Assignee: Alan Gates > Attachments: generate_data.pl, perf.hadoop.patch, perf.patch > > > To benchmark Pig performance, we need to have a TPC-H like Large Data Set > plus Script Collection. This is used in comparison of different Pig releases, > Pig vs. other systems (e.g. Pig + Hadoop vs. Hadoop Only). > Here is Wiki for small tests: http://wiki.apache.org/pig/PigPerformance > I am currently running long-running Pig scripts over data-sets in the order > of tens of TBs. Next step is hundreds of TBs. > We need to have an open large-data set (open source scripts which generate > data-set) and detailed scripts for important operations such as ORDER, > AGGREGATION etc. > We can call those the Pig Workouts: Cardio (short processing), Marathon (long > running scripts) and Triathlon (Mix). > I will update this JIRA with more details of current activities soon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured
[ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-202: -- Assignee: Mathieu Poumeyrol > ComparatorFunc provided to ORDER clause is not always honoured > -- > > Key: PIG-202 > URL: https://issues.apache.org/jira/browse/PIG-202 > Project: Pig > Issue Type: Bug >Reporter: Mathieu Poumeyrol >Assignee: Mathieu Poumeyrol > Fix For: 0.1.0 > > Attachments: EvalSpec.patch, InstantiateFunc.patch, > MapreducePlanCompiler.patch, quantiles.in, quantiles.pig, Sort.patch, > Sort.v2.patch, TestOderBy.patch > > > Specifying a comparator function is acknowledged neither by the local > implementation nor by the quartile lookup job. > Patch coming soon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-207) New illustrate command does not work in mapreduce mode.
[ https://issues.apache.org/jira/browse/PIG-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-207: -- Assignee: Shubham Chopra > New illustrate command does not work in mapreduce mode. > --- > > Key: PIG-207 > URL: https://issues.apache.org/jira/browse/PIG-207 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.1.0 >Reporter: Alan Gates >Assignee: Shubham Chopra >Priority: Minor > Fix For: 0.1.0 > > Attachments: exgen.patch > > > In local mode, illustrate will work. But if exectype is set to mapreduce, > then: > {noformat} > grunt> a = load 'data/test.txt'; > grunt> b = filter a by $0 eq 'f2'; > grunt> illustrate b; > 2008-04-16 00:03:06,512 [main] ERROR org.apache.pig.tools.grunt.GruntParser - > java.lang.ClassCastException: > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine cannot be cast > to org.apache.pig.backend.local.executionengine.LocalExecutionEngine > at org.apache.pig.pen.ExGen.GenerateExamples(ExGen.java:61) > at org.apache.pig.PigServer.showExamples(PigServer.java:573) > at org.apache.pig.PigServer.showExamples(PigServer.java:569) > at > org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:131) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:172) > at > org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:72) > at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54) > at org.apache.pig.Main.main(Main.java:272) > {noformat} > dump a and dump b work. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-213) Non-static Log objects in org.apache.pig.data.* classes are inefficient
[ https://issues.apache.org/jira/browse/PIG-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-213: -- Assignee: Vadim Geshel > Non-static Log objects in org.apache.pig.data.* classes are inefficient > --- > > Key: PIG-213 > URL: https://issues.apache.org/jira/browse/PIG-213 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.1.0 >Reporter: Vadim Geshel >Assignee: Vadim Geshel >Priority: Minor > Fix For: 0.1.0 > > Attachments: logging.patch > > > LogFactory.getLog called from the constructor of Tuple accounts for a > significant percentage of my job's running time. The proposed fix is to make > the Log fields static (which is generally standard practice). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
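The cost pattern behind PIG-213 is a logger lookup on every object construction versus one lookup per class. The sketch below only illustrates that pattern and is not the attached patch: StaticLogSketch and its nested classes are hypothetical, and java.util.logging.Logger is used as a stand-in for the commons-logging LogFactory.getLog call named in the report so that the example has no external dependency.

```java
import java.util.logging.Logger;

public class StaticLogSketch {

    // Counts logger lookups so the two variants can be compared.
    static int lookups = 0;

    static Logger lookup(Class<?> c) {
        lookups++;
        return Logger.getLogger(c.getName());
    }

    // Instance field: one lookup for every object that is constructed.
    static class PerInstance {
        private final Logger log = lookup(PerInstance.class);
    }

    // Static field: one lookup when the class is initialized.
    static class Shared {
        private static final Logger LOG = lookup(Shared.class);
    }

    // Creates n objects of the chosen kind and returns the lookups it caused.
    static int lookupsWhenCreating(boolean shared, int n) {
        int before = lookups;
        for (int i = 0; i < n; i++) {
            Object o = shared ? new Shared() : new PerInstance();
        }
        return lookups - before;
    }

    public static void main(String[] args) {
        System.out.println("per-instance: " + lookupsWhenCreating(false, 1000));
        System.out.println("static:       " + lookupsWhenCreating(true, 1000));
    }
}
```

For a class such as Tuple that is instantiated once per record, the per-instance variant repeats the lookup for every record, which matches the profile described in the report.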
[jira] Assigned: (PIG-215) Miscellaneous cleanups after PIG-111 Configuration
[ https://issues.apache.org/jira/browse/PIG-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-215: -- Assignee: Pi Song > Miscellaneous cleanups after PIG-111 Configuration > -- > > Key: PIG-215 > URL: https://issues.apache.org/jira/browse/PIG-215 > Project: Pig > Issue Type: Bug >Reporter: Pi Song >Assignee: Pi Song >Priority: Minor > Fix For: 0.1.0 > > Attachments: CleanPig111.patch, CleanPig111_2.patch > > > - Set default execution mode to MapReduce (This is a surprise as it should > have been fixed even before PIG-111 got checked-in) > - When using local hadoop, changed message "Connecting to HDFS at null" to > "Connecting to HDFS at local" > - Added the missing conf/log4j.properties > - Removed some dead code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-234) fix synchronization around staleCount in DataCollector
[ https://issues.apache.org/jira/browse/PIG-234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-234: -- Assignee: Chad Whipkey > fix synchronization around staleCount in DataCollector > -- > > Key: PIG-234 > URL: https://issues.apache.org/jira/browse/PIG-234 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Chad Whipkey >Assignee: Chad Whipkey >Priority: Minor > Fix For: 0.1.0 > > Attachments: Change_synchronization_on_DataCollector.patch > > > DataCollector uses synchronized statements on staleCount, but the staleCount > reference changes! I'm proposing it switch to use the concurrent package > Lock and condition to manage staleness. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
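PIG-234 proposes replacing synchronized blocks on a changing staleCount reference with a Lock and Condition from java.util.concurrent. The sketch below shows that general shape only. It is not DataCollector's code and not the attached patch: StaleCounterSketch, markStale, markFresh and awaitFresh are hypothetical names, and the assumption is that waiters block until the stale count returns to zero.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class StaleCounterSketch {

    // The lock object never changes, unlike a boxed counter that is
    // replaced by a new object on every increment.
    private final Lock lock = new ReentrantLock();
    private final Condition fresh = lock.newCondition();
    private int staleCount = 0;

    void markStale() {
        lock.lock();
        try {
            staleCount++;
        } finally {
            lock.unlock();
        }
    }

    void markFresh() {
        lock.lock();
        try {
            if (staleCount > 0 && --staleCount == 0) {
                fresh.signalAll();   // wake everyone waiting for zero
            }
        } finally {
            lock.unlock();
        }
    }

    // Waits until the count is zero or the timeout elapses.
    // Returns true if the count reached zero.
    boolean awaitFresh(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (staleCount > 0) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = fresh.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        StaleCounterSketch c = new StaleCounterSketch();
        c.markStale();
        System.out.println(c.awaitFresh(10));
        c.markFresh();
        System.out.println(c.awaitFresh(10));
    }
}
```

If staleCount were, for example, a boxed Integer used as the monitor, each increment would rebind the field to a different object, so two threads could hold "the" lock at once; a dedicated final Lock avoids that.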
[jira] Assigned: (PIG-219) Pig tests must cover local and mapreduce execution types
[ https://issues.apache.org/jira/browse/PIG-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-219: -- Assignee: Mathieu Poumeyrol > Pig tests must cover local and mapreduce execution types > > > Key: PIG-219 > URL: https://issues.apache.org/jira/browse/PIG-219 > Project: Pig > Issue Type: Bug >Reporter: Mathieu Poumeyrol >Assignee: Mathieu Poumeyrol > Fix For: 0.1.0 > > Attachments: Test.all.v1.patch, Test.v1.patch > > > Followup of Local and MapReduce Test modes in pig-dev. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-243) Make pig work on Windows
[ https://issues.apache.org/jira/browse/PIG-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-243: -- Assignee: Daniel Dai > Make pig work on Windows > > > Key: PIG-243 > URL: https://issues.apache.org/jira/browse/PIG-243 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Daniel Dai > Fix For: 0.1.0 > > Attachments: cygpath.patch, PIG_243.patch > > > Currently a large number of unit tests is failing on Windows. We need to fix > that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1064) Behvaiour of COGROUP with and without schema when using "*" operator
[ https://issues.apache.org/jira/browse/PIG-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776832#action_12776832 ] Hadoop QA commented on PIG-1064: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12424676/PIG-1064.patch against trunk revision 835005. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/149/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/149/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/149/console This message is automatically generated. 
> Behvaiour of COGROUP with and without schema when using "*" operator > > > Key: PIG-1064 > URL: https://issues.apache.org/jira/browse/PIG-1064 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.6.0 >Reporter: Viraj Bhat >Assignee: Pradeep Kamath > Fix For: 0.6.0 > > Attachments: PIG-1064.patch > > > I have 2 tab separated files, "1.txt" and "2.txt" > $ cat 1.txt > > 1 2 > 2 3 > > $ cat 2.txt > 1 2 > 2 3 > I use COGROUP feature of Pig in the following way: > $java -cp pig.jar:$HADOOP_HOME org.apache.pig.Main > {code} > grunt> A = load '1.txt'; > grunt> B = load '2.txt' as (b0, b1); > grunt> C = cogroup A by *, B by *; > {code} > 2009-10-29 12:46:04,150 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1012: Each COGroup input has to have the same number of inner plans > Details at logfile: pig_1256845224752.log > == > If I reverse, the order of the schema's > {code} > grunt> A = load '1.txt' as (a0, a1); > grunt> B = load '2.txt'; > grunt> C = cogroup A by *, B by *; > {code} > 2009-10-29 12:49:27,869 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1013: Grouping attributes can either be star (*) or a list of expressions, > but not both. > Details at logfile: pig_1256845224752.log > == > Now running without schema?? > {code} > grunt> A = load '1.txt'; > grunt> B = load '2.txt'; > grunt> C = cogroup A by *, B by *; > grunt> dump C; > {code} > 2009-10-29 12:55:37,202 [main] INFO > org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully > stored result in: "file:/tmp/temp-319926700/tmp-1990275961" > 2009-10-29 12:55:37,202 [main] INFO > org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records > written : 2 > 2009-10-29 12:55:37,202 [main] INFO > org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written > : 154 > 2009-10-29 12:55:37,202 [main] INFO > org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete! 
> 2009-10-29 12:55:37,202 [main] INFO > org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! > ((1,2),{(1,2)},{(1,2)}) > ((2,3),{(2,3)},{(2,3)}) > == > Is this a bug or a feature? > Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-250) Pig is broken with speculative execution
[ https://issues.apache.org/jira/browse/PIG-250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-250: -- Assignee: Olga Natkovich > Pig is broken with speculative execution > > > Key: PIG-250 > URL: https://issues.apache.org/jira/browse/PIG-250 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Fix For: 0.1.0 > > Attachments: PIG-250.patch, PIG-250_v2.patch > > > If I have speculative execution turned on, the following script fails: > a = load 'studenttab20m' as (name, age, gpa); > b = load 'votertab10k' as (name, age, registration, contributions); > c = filter a by age < '50'; > d = filter b by age < '50'; > e = cogroup c by (name, age), d by (name, age) parallel 10; > f = foreach e generate flatten(c), flatten(d) parallel 10; > g = group f by registration parallel 10; > h = foreach g generate group, SUM(f.d::contributions) parallel 10; > i = order h by ($1, $0); > store i into 'out'; > I traced this to the fact that the first MR job produces one or more empty > outputs from the reducer. This happened on the reducers that happened to have > a second task running. > I am not sure what the issue is and I am working with hadoop guys to > investigate. Until this issue is resolved, I would like to turn speculative > execution off. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-256) support non default constructor with variable number of arguments
[ https://issues.apache.org/jira/browse/PIG-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-256: -- Assignee: Pi Song > support non default constructor with variable number of arguments > - > > Key: PIG-256 > URL: https://issues.apache.org/jira/browse/PIG-256 > Project: Pig > Issue Type: Improvement >Reporter: Ajay Garg >Assignee: Pi Song >Priority: Minor > Fix For: 0.1.0 > > Attachments: PIG_256_vararg_instantiation.patch > > > Pig does not support non-default constructors with a variable number of > arguments. In our case we need this because the number of variables > specified by the user varies. The fix is simple. Pig calls > getConstr("arg1","arg2",...,"argn") and, if it doesn't find it, throws a > noSuchMethodFound exception. In the catch block we just need to add code to > check if we can wrap the arg1..n in a String[] and check if a constructor can > be found with this signature getConstr(args[]). This would resolve the > variable num args issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
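The fallback described in PIG-256 (try the constructor with n String parameters first, then retry with a single String[] parameter) can be sketched with plain reflection. This is an illustration under assumptions, not Pig's instantiation code or the attached patch: VarargConstructorSketch, FixedArgs, VarArgs and instantiate are hypothetical names, and the report's getConstr is represented here by Class.getConstructor.

```java
import java.util.Arrays;

public class VarargConstructorSketch {

    // A function class with a fixed two-String constructor.
    public static class FixedArgs {
        public final String a;
        public final String b;

        public FixedArgs(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    // A function class that accepts any number of names.
    public static class VarArgs {
        public final String[] names;

        public VarArgs(String... names) {
            this.names = names;
        }
    }

    static Object instantiate(Class<?> cls, String... args) {
        try {
            // First attempt: a constructor taking exactly args.length Strings.
            Class<?>[] types = new Class<?>[args.length];
            Arrays.fill(types, String.class);
            try {
                return cls.getConstructor(types).newInstance((Object[]) args);
            } catch (NoSuchMethodException e) {
                // Fallback: a constructor taking one String[] (or String...),
                // with all arguments wrapped into a single array parameter.
                return cls.getConstructor(String[].class).newInstance((Object) args);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "no usable constructor on " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Object fixed = instantiate(FixedArgs.class, "Population", "Height");
        Object varargs = instantiate(VarArgs.class, "a", "b", "c");
        System.out.println(((FixedArgs) fixed).b);
        System.out.println(((VarArgs) varargs).names.length);
    }
}
```

The two casts matter: (Object[]) spreads the strings over separate parameters, while (Object) passes the whole array as one argument to the String[] constructor.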
[jira] Assigned: (PIG-255) Calling non default constructor of Final class from Main class in UDF
[ https://issues.apache.org/jira/browse/PIG-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-255: -- Assignee: Ajay Garg > Calling non default constructor of Final class from Main class in UDF > - > > Key: PIG-255 > URL: https://issues.apache.org/jira/browse/PIG-255 > Project: Pig > Issue Type: Improvement >Reporter: Ajay Garg >Assignee: Ajay Garg >Priority: Minor > Fix For: 0.1.0 > > Attachments: cons.patch, new.patch, test.patch > > > Pig supports the use of define to call a non default constructor. Making it > work across Algebraic functions is not possible with the current code. The > problem is once the func is defined to use a non default constructor which > takes in names of the variables, we have no way of transmitting this > information from the main class to the final class. We tried passing the func > spec through the call to getFinal(). That is, What ever names we get in the > main class we store it and when the getFinal method is called, instead of > just passing the name of the Final class we attach the string args received > by the main class to the name to construct a func spec. For ex. if define COV > = Covariance('Population', 'Height'); Then we would have the "Population' & > 'Height' stored in the main class. A call to getFinal would return > Covariance$Final("Population", "Height") instead of just Covariance$Final. I > guess this is the right way to go. However, pig has a problem with this. The > resolveClassName method doesn't think of its args as specs and assumes them > to be just names. So in createJar, when the func spec, > Covariance$Final("Population", "Height") is being resolved it fails. I think > this is an issue with pig and we need to resolve it by clipping the args > before doing a resolveClassName. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-258) Pig should cleanup output directory of a failed query
[ https://issues.apache.org/jira/browse/PIG-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-258: -- Assignee: Daniel Dai > Pig should cleanup output directory of a failed query > - > > Key: PIG-258 > URL: https://issues.apache.org/jira/browse/PIG-258 > Project: Pig > Issue Type: Bug >Reporter: Olga Natkovich >Assignee: Daniel Dai >Priority: Minor > Attachments: clearoutput.patch, clearoutput2.patch > > > Currently, after a failed store, the output directory is left behind and > can't be re-used without manual cleanup -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-270) Show Line Number in Pig Error Messages
[ https://issues.apache.org/jira/browse/PIG-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-270: -- Assignee: Daniel Dai > Show Line Number in Pig Error Messages > -- > > Key: PIG-270 > URL: https://issues.apache.org/jira/browse/PIG-270 > Project: Pig > Issue Type: Improvement >Reporter: Amir Youssefi >Assignee: Daniel Dai > Attachments: linenum.patch > > > It will be a great help to users to show A) Line Number B) Actual Line in Pig > Error Messages. Currently user has to copy/paste script line by line in Grunt > to find out line that ran into a problem. For Grunt we can skip line number. > Alternatively, we can assign line numbers in Grunt and show it in command > prompt alongside "grunt>". This could be a separate issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-277) UDF for computing correlation and covariance between data sets
[ https://issues.apache.org/jira/browse/PIG-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-277: -- Assignee: Ajay Garg > UDF for computing correlation and covariance between data sets > -- > > Key: PIG-277 > URL: https://issues.apache.org/jira/browse/PIG-277 > Project: Pig > Issue Type: New Feature >Reporter: Ajay Garg >Assignee: Ajay Garg >Priority: Minor > Fix For: 0.1.0 > > Attachments: newStats.patch, stat.patch > > > UDFs for computing correlation and covariance between data sets. Use > following commands to compute covariance > A = load 'input.xml' using PigStorage(':'); > B = group A all; > define c COV('a','b','c'); > D = foreach B generate group,c(A.$0,A.$1,A.$2); -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-284) target for building source jar
[ https://issues.apache.org/jira/browse/PIG-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-284: -- Assignee: Johannes Zillmann > target for building source jar > -- > > Key: PIG-284 > URL: https://issues.apache.org/jira/browse/PIG-284 > Project: Pig > Issue Type: Wish > Components: tools >Reporter: Johannes Zillmann >Assignee: Johannes Zillmann >Priority: Minor > Fix For: 0.1.0 > > Attachments: Pig-284-v1.patch > > > It would be a great help if pig's build.xml were capable of building a > source jar. > The source jar could e.g. be used by eclipse and would thus provide better > debugging support, original parameter names, etc... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-288) Null pointer exception with load as schema - Optimizer
[ https://issues.apache.org/jira/browse/PIG-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-288: -- Assignee: Pi Song > Null pointer exception with load as schema - Optimizer > -- > > Key: PIG-288 > URL: https://issues.apache.org/jira/browse/PIG-288 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.2.0 >Reporter: Santhosh Srinivasan >Assignee: Pi Song > Attachments: PIG_288_OptimizerNPE.patch, PIG_288_OptimizerNPE_2.patch > > > A new test case (testNestedPlan) added to TestEvalPipeline has the following > query: > pig.registerQuery("A = LOAD 'file:" + tmpFile + "'as (a:int, > b:int);"); > pig.registerQuery("B = group A by $0;"); > + "C1 = filter A by $0 > -1;" > + "C2 = distinct C1;" > + "C3 = distinct A;" > + "generate (int)group;" > + "};"; > Testcase: testNestedPlan took 0.913 sec > Caused an ERROR > Unable to open iterator for alias: C > java.io.IOException: Unable to open iterator for alias: C > at > org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34) > at org.apache.pig.PigServer.openIterator(PigServer.java:268) > at > org.apache.pig.test.TestEvalPipeline.testNestedPlan(TestEvalPipeline.java:376) > Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: Unable to > insert type casts into plan > at > org.apache.pig.impl.logicalLayer.optimizer.TypeCastInserter.transform(TypeCastInserter.java:144) > at > org.apache.pig.impl.plan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:63) > at org.apache.pig.PigServer.compileLp(PigServer.java:551) > at org.apache.pig.PigServer.execute(PigServer.java:477) > at org.apache.pig.PigServer.openIterator(PigServer.java:259) > ... 
16 more > Caused by: java.lang.NullPointerException > at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:121) > at > org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:65) > at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:273) > at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:37) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) > at > org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:57) > at > org.apache.pig.impl.logicalLayer.optimizer.TypeCastInserter.transform(TypeCastInserter.java:141) > ... 20 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-278) Allow no alias in Dot schema definition in Dot LogicalPlanLoader
[ https://issues.apache.org/jira/browse/PIG-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-278: -- Assignee: Pi Song > Allow no alias in Dot schema definition in Dot LogicalPlanLoader > > > Key: PIG-278 > URL: https://issues.apache.org/jira/browse/PIG-278 > Project: Pig > Issue Type: Bug >Reporter: Pi Song >Assignee: Pi Song > Attachments: AllowNoAliasSchemaInDot.patch, > AllowNoAliasSchemaInDot2.patch > > > Our schema parser doesn't allow a "null" alias, but we have to be able to use one > in Dot test files. > This workaround introduces a "[NoAlias]" keyword in the schema definition, > just for Dot LogicalPlanLoader. > Sample: > {noformat} > foreach [ key="20", type="LOForEach" , schema="[NoAlias] : long, [NoAlias] : > byteArray"] ; > {noformat} > At runtime, [NoAlias] will be substituted by dummy column names before being > sent to the parser. Subsequently those names will be replaced by "null". > There are no changes to the actual query parser. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
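The substitution step that PIG-278 describes can be illustrated with a minimal Java sketch. This is not the code from the attached patches: the class name `NoAliasSchema`, the method `substitute`, and the `__dummy_` prefix are all hypothetical, chosen only to show the idea of swapping each `[NoAlias]` marker for a unique placeholder name before the schema string reaches the parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not the code from AllowNoAliasSchemaInDot.patch. */
final class NoAliasSchema {
    static final String MARKER = "[NoAlias]";
    static final String DUMMY_PREFIX = "__dummy_"; // hypothetical placeholder prefix

    /**
     * Replaces each [NoAlias] marker with a unique dummy column name and
     * records the generated names in dummiesOut, in order of appearance.
     */
    static String substitute(String schema, List<String> dummiesOut) {
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int idx;
        while ((idx = schema.indexOf(MARKER, from)) >= 0) {
            String dummy = DUMMY_PREFIX + dummiesOut.size();
            dummiesOut.add(dummy);
            sb.append(schema, from, idx).append(dummy);
            from = idx + MARKER.length();
        }
        return sb.append(schema.substring(from)).toString();
    }
}
```

After parsing, a loader following this approach would presumably walk the resulting schema and reset to null any alias that appears in the recorded list; that second step depends on Pig's schema classes and is not shown here.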
[jira] Assigned: (PIG-298) Wrong sort logic in POSort
[ https://issues.apache.org/jira/browse/PIG-298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-298: -- Assignee: Pi Song > Wrong sort logic in POSort > -- > > Key: PIG-298 > URL: https://issues.apache.org/jira/browse/PIG-298 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pi Song >Assignee: Pi Song > Attachments: Wrong_Sort_logic_in_POSort.patch > > > This might relate to PIG-292. > The current logic is obviously wrong, as it returns only the result > of the last comparison. > Patch + tests attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
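To make the PIG-298 complaint concrete, here is a minimal Java sketch of a multi-key comparison in which the first differing key decides the order, rather than whichever key happens to be compared last. It is not POSort's code and not the attached patch; `MultiKeyComparator` and its use of `List<Integer>` rows are assumptions made purely for illustration.

```java
import java.util.Comparator;
import java.util.List;

/** Illustrative multi-key comparator; not the POSort implementation. */
final class MultiKeyComparator implements Comparator<List<Integer>> {
    private final int[] sortColumns;

    MultiKeyComparator(int... sortColumns) {
        this.sortColumns = sortColumns;
    }

    @Override
    public int compare(List<Integer> a, List<Integer> b) {
        for (int col : sortColumns) {
            int c = a.get(col).compareTo(b.get(col));
            if (c != 0) {
                return c; // the first differing key decides; later keys are ignored
            }
        }
        return 0; // all sort keys are equal
    }
}
```

With sort columns (0, 1), the rows (1, 9) and (2, 0) compare as "less than" because column 0 already differs. A comparator that keeps overwriting its result and returns only the last comparison would look at 9 versus 0 and order them the other way round.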
[jira] Assigned: (PIG-291) hod.param parameters not passed properly
[ https://issues.apache.org/jira/browse/PIG-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-291: -- Assignee: Ian Atha > hod.param parameters not passed properly > > > Key: PIG-291 > URL: https://issues.apache.org/jira/browse/PIG-291 > Project: Pig > Issue Type: Bug > Components: impl > Environment: Linux hostname 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 > EDT 2007 x86_64 x86_64 x86_64 GNU/Linux > Apache Pig version 0.1.0-dev (r8087) > Hadoop 0.17.1 Subversion > http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 669344 > hod --version: 0.17.1 >Reporter: Ian Atha >Assignee: Ian Atha > Attachments: pig-291.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > pig -Dhod.param='-N hodclustername' script.pig > fails with the following error: > 2008-07-03 17:53:18,236 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to HOD... > org.apache.pig.backend.executionengine.ExecException: Could not connect to HOD > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.doHod(HExecutionEngine.java:428) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:121) > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:108) > at org.apache.pig.impl.PigContext.connect(PigContext.java:177) > at org.apache.pig.PigServer.(PigServer.java:149) > at org.apache.pig.tools.grunt.Grunt.(Grunt.java:43) > at org.apache.pig.Main.main(Main.java:293) > Caused by: org.apache.pig.backend.executionengine.ExecException: > org.apache.pig.backend.executionengine.ExecException: Failed to run command > hod allocate -d /tmp/PigHod.hostname.thatha.304309240344558 -n 15 -N > hodclustername on server local; return code: 4; error: CRITICAL - qsub > Failure : qsub: illegal -N value > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.runCommand(HExecutionEngine.java:541) > at > 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.doHod(HExecutionEngine.java:373) > ... 6 more > Caused by: org.apache.pig.backend.executionengine.ExecException: Failed to > run command hod allocate -d /tmp/PigHod.hostname.thatha.304309240344558 -n 15 > -N hodclustername on server local; return code: 4; error: CRITICAL - qsub > Failure : qsub: illegal -N value > at > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.runCommand(HExecutionEngine.java:538) > ... 7 more > It appears that the problem is in the parsing of hod.param, located in > org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java, in > doHod(...). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-293) order by * goes into infinite loop
[ https://issues.apache.org/jira/browse/PIG-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-293: -- Assignee: Santhosh Srinivasan > order by * goes into infinite loop > -- > > Key: PIG-293 > URL: https://issues.apache.org/jira/browse/PIG-293 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Alan Gates >Assignee: Santhosh Srinivasan > Fix For: 0.2.0 > > Attachments: sort_star_with_project.patch > > > Scripts with order by * go into an infinite loop. Worse yet, they appear to > be reporting progress to hadoop in this loop, and thus are never terminated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-297) RM a non-existing file should not fail the script
[ https://issues.apache.org/jira/browse/PIG-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-297: -- Assignee: Yiping Han > RM a non-existing file should not fail the script > - > > Key: PIG-297 > URL: https://issues.apache.org/jira/browse/PIG-297 > Project: Pig > Issue Type: Improvement > Components: grunt >Reporter: Yiping Han >Assignee: Yiping Han >Priority: Minor > Attachments: PIG-297.patch, PIG-297v2.patch > > > rm is commonly used to remove existing output before re-executing a script. > However, when the output does not exist, rm fails and grunt > terminates the execution. This behavior is very inconvenient. The expected > grunt behavior is to print an error message and continue executing the > script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-315) Issue with cast in foreach
[ https://issues.apache.org/jira/browse/PIG-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-315: -- Assignee: Pi Song > Issue with cast in foreach > -- > > Key: PIG-315 > URL: https://issues.apache.org/jira/browse/PIG-315 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Pi Song > Fix For: 0.2.0 > > Attachments: PIG315.patch > > > Query which causes error: > {code} > a = load ':INPATH:/singlefile/studenttab10k' as (name:chararray, age:int, > gpa:double); > b = foreach a generate (long)age as age, (int)gpa as gpa; > c = foreach b generate SUM(age), SUM(gpa); > store c into ':OUTPATH:'; > {code} > Error: > {quote} > 2008-07-14 16:34:42,130 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to hadoop file system at: mytesthost:8020 > 2008-07-14 16:34:42,187 [main] WARN org.apache.hadoop.fs.FileSystem - > "mytesthost:8020" is a deprecated filesystem name. Use > "hdfs://mytesthost:8020/" instead. > 2008-07-14 16:34:42,441 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to map-reduce job tracker at: mytesthost:50020 > 2008-07-14 16:34:42,696 [main] WARN org.apache.hadoop.fs.FileSystem - > "mytesthost:8020" is a deprecated filesystem name. Use > "hdfs://mytesthost:8020/" instead. > 2008-07-14 16:34:43,006 [main] ERROR org.apache.pig.PigServer - Problem > resolving LOForEach schema > 2008-07-14 16:34:43,006 [main] ERROR org.apache.pig.PigServer - Severe > problem found during validation > org.apache.pig.impl.plan.PlanValidationException: An unexpected exception > caused the validation to stop > 2008-07-14 16:34:43,007 [main] ERROR org.apache.pig.tools.grunt.Grunt - > java.io.IOException: Unable to store for alias: c > {quote} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-300) Minor Changes to SliceWrapper for Generic Hadoop InputFormat
[ https://issues.apache.org/jira/browse/PIG-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-300: -- Assignee: Christian Kunz > Minor Changes to SliceWrapper for Generic Hadoop InputFormat > > > Key: PIG-300 > URL: https://issues.apache.org/jira/browse/PIG-300 > Project: Pig > Issue Type: Improvement > Environment: trunk >Reporter: Christian Kunz >Assignee: Christian Kunz > Fix For: 0.2.0 > > Attachments: PIG-300.patch > > > I am working on a Load Function that allows specifying any Hadoop > FileInputFormat or CompositeInputFormat. > Because of the nature of PigSlice and PigSlicer, such a UDF needs to use a > different implementation of Slice and Slicer. > It turns out that it would be extremely helpful if the SliceWrapper class had > a couple of minor changes: > 1) an additional get method to return the 'wrapped' slice. > 2) a change to the getLocations method so that it just calls the getLocations() method of > the wrapped Slice, unless 'wrapped' is a PigSlice (in which case it does > what it does now). > I will make a patch available shortly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
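The two changes requested in PIG-300 can be pictured with a small Java sketch. Everything here is a stand-in: `SketchSlice`, `SketchPigSlice`, and `SketchSliceWrapper` are hypothetical types that model only the one method under discussion, and `existingLocationLookup` is a placeholder for SliceWrapper's current behaviour, which the issue does not describe. It is a sketch of the proposal, not the contents of PIG-300.patch.

```java
/** Stand-in for Pig's Slice; only the method discussed in PIG-300 is modelled. */
interface SketchSlice {
    String[] getLocations();
}

/** Stand-in for PigSlice, for which the wrapper keeps its current behaviour. */
final class SketchPigSlice implements SketchSlice {
    @Override
    public String[] getLocations() {
        return new String[0];
    }
}

/** Stand-in for SliceWrapper with the two proposed changes applied. */
final class SketchSliceWrapper {
    private final SketchSlice wrapped;

    SketchSliceWrapper(SketchSlice wrapped) {
        this.wrapped = wrapped;
    }

    /** Proposed change 1: expose the wrapped slice. */
    SketchSlice getWrapped() {
        return wrapped;
    }

    /** Proposed change 2: delegate to the wrapped slice unless it is a PigSlice. */
    String[] getLocations() {
        if (wrapped instanceof SketchPigSlice) {
            return existingLocationLookup();
        }
        return wrapped.getLocations();
    }

    /** Placeholder for the wrapper's current lookup, which is not described in the issue. */
    private String[] existingLocationLookup() {
        return new String[] {"<existing lookup>"};
    }
}
```

The point of delegating is that a custom Slice built on an arbitrary Hadoop InputFormat knows its own split locations, so the wrapper does not need to guess them.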
[jira] Assigned: (PIG-308) Flatten is not being set to true in joins
[ https://issues.apache.org/jira/browse/PIG-308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-308: -- Assignee: Alan Gates > Flatten is not being set to true in joins > - > > Key: PIG-308 > URL: https://issues.apache.org/jira/browse/PIG-308 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.2.0 >Reporter: Alan Gates >Assignee: Alan Gates > Fix For: 0.2.0 > > Attachments: join.patch > > > Queries that use the JOIN keyword are returning incorrect results because the > flatten values are not being set to true for the foreach that is put after > the cogroup. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-339) Limit follow cross/union return wrong number of records
[ https://issues.apache.org/jira/browse/PIG-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-339: -- Assignee: Daniel Dai > Limit follow cross/union return wrong number of records > --- > > Key: PIG-339 > URL: https://issues.apache.org/jira/browse/PIG-339 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.2.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.2.0 > > Attachments: PIG-339.patch > > > The following script returns twice as many records as expected: > a = load 'a'; > b = load 'b'; > c = union a, b; > d = cross a, b; > e = limit c 100; > f = limit d 100; > dump e; // returns double the number of records > dump f; // returns double the number of records > It seems the limit operator in the reduce plan is not effective. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-321) Incorrect results from arithmetic expression
[ https://issues.apache.org/jira/browse/PIG-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-321: -- Assignee: Pi Song > Incorrect results from arithmetic expression > > > Key: PIG-321 > URL: https://issues.apache.org/jira/browse/PIG-321 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Pi Song > Fix For: 0.2.0 > > Attachments: Pig321_parser.patch > > > Query: > {code} > a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name:chararray, > age:int, gpa:double); > b = foreach a generate 1 + 0.2f + 253645L, gpa+1; > > store b into '/tmp/arithtest'; > > {code} > Results > 25365.2 2.9 > 25365.2 4.65 > ... > The first projection above has 253645 as a Long constant. The results show > 25365.2, which is an order of magnitude less. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
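As a cross-check of the arithmetic in PIG-321, the same expression can be evaluated in plain Java. This is only a sanity check under Java's numeric promotion rules; the class name `ArithCheck` is hypothetical, and Pig's own type promotion may pick a different result type, though that would not change the magnitude of the sum.

```java
/** Sanity check of the PIG-321 expression using Java's promotion rules. */
final class ArithCheck {
    /** Evaluates 1 + 0.2f + 253645L; int + float gives float, and float + long gives float. */
    static float expected() {
        return 1 + 0.2f + 253645L;
    }
}
```

The sum is approximately 253646.2, so the reported value 25365.2 is consistent with the long constant having been read as 25364 rather than 253645, i.e. with the constant being mangled before evaluation rather than with a rounding problem.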
[jira] Assigned: (PIG-337) If limit size exceeds number of records in the file, a few records get dropped
[ https://issues.apache.org/jira/browse/PIG-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-337: -- Assignee: Daniel Dai > If limit size exceeds number of records in the file, a few records get dropped > -- > > Key: PIG-337 > URL: https://issues.apache.org/jira/browse/PIG-337 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.2.0 >Reporter: Alan Gates >Assignee: Daniel Dai > Fix For: 0.2.0 > > Attachments: PIG-337.patch > > > Given a file with 10k records, the following script returned 9996 records: > a = load 'studenttab10k'; > b = limit a 10; > dump b; > It looks like maybe the limit operator isn't returning its last record or > something. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-338) limit return uncorrect records following distinct
[ https://issues.apache.org/jira/browse/PIG-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-338: -- Assignee: Daniel Dai > limit return uncorrect records following distinct > - > > Key: PIG-338 > URL: https://issues.apache.org/jira/browse/PIG-338 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.2.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.2.0 > > Attachments: PIG-338-2.patch, PIG-338.patch, > TEST-org.apache.pig.test.TestLogicalOptimizer.txt > > > The following script returns fewer records than expected: > a = load 'f'; > b = distinct a; > c = limit b 10; > dump c; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-319) Union, Cross is not working
[ https://issues.apache.org/jira/browse/PIG-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-319: -- Assignee: Pi Song > Union, Cross is not working > --- > > Key: PIG-319 > URL: https://issues.apache.org/jira/browse/PIG-319 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.2.0 >Reporter: Daniel Dai >Assignee: Pi Song > Fix For: 0.2.0 > > Attachments: fix_union.patch > > > The union and cross operators are not working in branches/types. For example: > a = load 'a'; > b = load 'b'; > c = union a, b; > d = cross a, b; > dump c; // fail > dump d; // fail > Error message: " Attempt to give operator of type > org.apache.pig.impl.physicalLayer.relationalOperators.POLoad multiple inputs. > This operator does not support multiple inputs." -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-368) User defined Loader functions need a way to get jobconf without going through Slicer
[ https://issues.apache.org/jira/browse/PIG-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-368: -- Assignee: Pradeep Kamath > User defined Loader functions need a way to get jobconf without going through > Slicer > > > Key: PIG-368 > URL: https://issues.apache.org/jira/browse/PIG-368 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.2.0 > > Attachments: PIG-368.patch > > > Some user defined loader functions in the current pig release (without types) > need the JobConf to build the appropriate RecordReader. Currently they do > this in a roundabout way by using the Slicer. The JobConf should be > available from PigInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-352) java.lang.ClassCastException when invalid field is accessed
[ https://issues.apache.org/jira/browse/PIG-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-352: -- Assignee: Santhosh Srinivasan > java.lang.ClassCastException when invalid field is accessed > --- > > Key: PIG-352 > URL: https://issues.apache.org/jira/browse/PIG-352 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Olga Natkovich >Assignee: Santhosh Srinivasan > Fix For: 0.2.0 > > Attachments: out_of_bound_schema_access.patch > > > grunt> A = load 'foo' as (a, b, c); > grunt> B = foreach A generate $5; > 2008-07-31 16:25:13,847 [main] ERROR org.apache.pig.tools.grunt.GruntParser - > java.lang.ClassCastException: > org.apache.pig.impl.logicalLayer.FrontendException > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:454) > at > org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60) > at org.apache.pig.PigServer.registerQuery(PigServer.java:248) > at > org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:425) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241) > at > org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:92) > at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58) > at org.apache.pig.Main.main(Main.java:278) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-342) Size of DistinctDataBag is calculated incorrectly if spill occurs and non-distinct elements are inserted
[ https://issues.apache.org/jira/browse/PIG-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-342: -- Assignee: Brandon Dimcheff > Size of DistinctDataBag is calculated incorrectly if spill occurs and > non-distinct elements are inserted > > > Key: PIG-342 > URL: https://issues.apache.org/jira/browse/PIG-342 > Project: Pig > Issue Type: Bug >Affects Versions: 0.1.0 >Reporter: Brandon Dimcheff >Assignee: Brandon Dimcheff > Fix For: 0.1.0 > > Attachments: size.patch > > > If a spill occurs while elements are being inserted into a DistinctDataBag, > it's possible that non-unique items will be added to the in-memory data > structure, and the mSize counter will be incremented. If the same elements > also exist on disk, the count will be higher than it should be. > The following is copied from an email exchange I had with Alan Gates: > Alan, > Thanks for your help. I've done a bit more experimentation and have > discovered a couple more things. I first looked at how COUNT was > implemented. It looks like COUNT calls size() on the bag, which will return > mSize. I thought that mSize might be calculated improperly so I added > "SUM(unique_ids) AS crazy_userid_sum" to my GENERATE line and re-ran the > pigfile: > GENERATE FLATTEN(group), SUM(nice_data.duration) AS total_duration, > COUNT(nice_data) AS channel_switches, COUNT(unique_ids) AS unique_users, > SUM(unique_ids) AS crazy_userid_sum; > It turns out that the SUM generates the correct result in all cases, while > there are still occasional errors in the COUNT. Since SUM requires an > iteration over all the elements in the DistinctDataBag, this led me to > believe that the uniqueness constraint is indeed operating correctly, but > there is some error in the logic that calculates mSize. > Then I started poking around in DistinctDataBag looking for anything that > changes mSize that might be incorrect. 
I noticed that on line 87 in > addAll(), the size of the DataBag that is passed into the method is added to > the mSize instance variable, and then during the iteration a few lines later > mSize is being incremented when an element is successfully added to > mContents. I thought this might be the problem, since it seems like elements > would be double counted if addAll() was called. I commented out line 87, > recompiled Pig, and ran it again, but there are still errors (though I do > think line 87 might be incorrect anyways). > Thanks to my coworker Marshall, I think we may have discovered what the > actual problem is. The scenario is as follows: We're adding a bunch of > stuff to the bag, and before we're finished a spill occurs. mContents is > cleared during the spill (line 157). All add() does is check uniqueness > against mContents. So now we will get duplicates in mContents that are > already on disk and an inflated mSize. Now, the reason why SUM works is > because the iterator is smart and enforces uniqueness as it reads the records > back in. We think this occurs at the beginning of addToQueue, around line 363 > - 369. mMergeTree is a TreeSet, so it'll enforce uniqueness and the call to > addToQueue is aborted if there's already a matching record in mMergeTree. > Do you think our assessment is correct? If so, it seems that the calculation > of mSize needs to be significantly more complex than it is now. It looks to > me like the entire bag will need to be iterated in order to reliably > calculate the size. Do you have any ideas about how to implement this in a > less expensive way? I'd be happy to take a stab at it, but I don't want to > do anything particularly silly if you have a better idea. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
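The failure mode described in PIG-342 can be reproduced with a toy model in Java. This is not Pig's DistinctDataBag: `ToyDistinctBag` is a hypothetical class in which `size` plays the role of mSize, an in-memory set plays the role of mContents, and "spilling" just moves the set aside. It only demonstrates why a counter that checks uniqueness against memory alone can drift above the true distinct count, while a merge over all spilled runs still yields the right answer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the miscount described in PIG-342; not Pig's DistinctDataBag. */
final class ToyDistinctBag {
    private Set<String> inMemory = new HashSet<>();          // plays the role of mContents
    private final List<Set<String>> spilled = new ArrayList<>(); // stands in for spill files
    private long size;                                        // plays the role of mSize

    /** Uniqueness is checked against the in-memory set only, as the report describes. */
    void add(String item) {
        if (inMemory.add(item)) {
            size++;
        }
    }

    /** Moves the in-memory contents aside and starts with an empty set. */
    void spill() {
        spilled.add(inMemory);
        inMemory = new HashSet<>();
    }

    /** The cached counter, which can overcount after a spill. */
    long size() {
        return size;
    }

    /** Counts by merging all runs, the way a de-duplicating iterator would. */
    long distinctCount() {
        Set<String> merged = new TreeSet<>(inMemory);
        for (Set<String> run : spilled) {
            merged.addAll(run);
        }
        return merged.size();
    }
}
```

Adding "a" and "b", spilling, and then adding "a" and "c" leaves the counter at 4 while the merged count is 3, which matches the observation that SUM (which iterates) was correct while COUNT (which reads the counter) was not.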
[jira] Assigned: (PIG-367) provide default schema name
[ https://issues.apache.org/jira/browse/PIG-367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-367: -- Assignee: Olga Natkovich > provide default schema name > --- > > Key: PIG-367 > URL: https://issues.apache.org/jira/browse/PIG-367 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.2.0 >Reporter: Olga Natkovich >Assignee: Olga Natkovich > Fix For: 0.2.0 > > Attachments: PIG-367.patch > > > This is just to help UDFs name their output. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-428) TypeCastInserter does not replace projects in inner plans correctly
[ https://issues.apache.org/jira/browse/PIG-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-428: -- Assignee: Pradeep Kamath > TypeCastInserter does not replace projects in inner plans correctly > --- > > Key: PIG-428 > URL: https://issues.apache.org/jira/browse/PIG-428 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.2.0 > > Attachments: PIG-428.patch > > > The TypeCastInserter tries to replace the Project's input operator in inner > plans with the new foreach operator it adds. However it should replace only > those Projects' input where the new Foreach has been added after the operator > which was earlier the input to Project. > Here is a query which fails due to this: > {code} > a = load 'st10k' as (name:chararray,age:int, gpa:double); > another = load 'st10k'; > c = foreach another generate $0, $1+ 10, $2 + 10; > d = join a by $0, c by $0; > dump d; > {code} > Here is the error: > {noformat} > 2008-09-11 23:34:28,169 [main] ERROR > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error > message from task (map) tip_200809051428_0045_m_00java.io.IOException: > Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, > recieved org.apache.pig.impl.io.NullableBytesWritable > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:419) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:83) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:172) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:75) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) > at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124) > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-429) Self join wth implicit split has the join output in wrong order
[ https://issues.apache.org/jira/browse/PIG-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-429: -- Assignee: Pradeep Kamath > Self join wth implicit split has the join output in wrong order > --- > > Key: PIG-429 > URL: https://issues.apache.org/jira/browse/PIG-429 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.2.0 > > Attachments: PIG-429.patch > > > Query: > {code} > A = load 'st10k' split by 'file'; > B = filter A by $1 > 25; > D = join A by $0, B by $0; > dump D; > {code} > In the output the columns from B are projected out first and from A next. On > closer examination of the code, the ImplicitSplitInserter class adds the > split and two splitoutput operators to the plan and tries to connect the > successors of the load operator to these. It does this by iterating over the load's > successors, disconnecting from them, and connecting the > split-splitoutput to the successors. However, the order in which it gets its > successors is NOT the same as the order in which cogroup (join) expects its > inputs. Hence the discrepancy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-413) TestBuiltin has an error in testSumFinal
[ https://issues.apache.org/jira/browse/PIG-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-413: -- Assignee: Pradeep Kamath > TestBuiltin has an error in testSumFinal > - > > Key: PIG-413 > URL: https://issues.apache.org/jira/browse/PIG-413 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.2.0 > > Attachments: PIG-413.patch > > > Here's the error: > {noformat} > Testcase: testSUMFinal took 0.005 sec > Caused an ERROR > Caught exception in IntSum.Final [java.lang.Integer] > java.io.IOException: Caught exception in IntSum.Final [java.lang.Integer] > at org.apache.pig.builtin.IntSum$Final.exec(IntSum.java:90) > at org.apache.pig.builtin.IntSum$Final.exec(IntSum.java:71) > at org.apache.pig.test.TestBuiltin.testSUMFinal(TestBuiltin.java:436) > Caused by: java.lang.ClassCastException: java.lang.Integer > ... 18 more > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-439) Currently we do not support A=B; correctly - for now capture this case and produce a meaningful message
[ https://issues.apache.org/jira/browse/PIG-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-439: -- Assignee: Pradeep Kamath > Currently we do not support A=B; correctly - for now capture this case and > produce a meaningful message > --- > > Key: PIG-439 > URL: https://issues.apache.org/jira/browse/PIG-439 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.2.0 > > Attachments: PIG-439.patch > > > Currently we do not support A=B; correctly - for now capture this case and > produce a meaningful message - A separate JIRA-438 has been created to fix > the main issue -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-452) Issues when non existent columns are projected
[ https://issues.apache.org/jira/browse/PIG-452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-452: -- Assignee: Alan Gates > Issues when non existent columns are projected > -- > > Key: PIG-452 > URL: https://issues.apache.org/jira/browse/PIG-452 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Alan Gates > Fix For: 0.2.0 > > Attachments: PIG-452.patch > > > Script: > {code} > -- columns x,y,z do not exist > a = load 'st10k' as (name, age, gpa, x, y, z); > b = load 'st10k' as (name, age:chararray, gpa); > c = join a by (name, y), b by (name, age); > dump c; > {code} > Error: > {noformat} > 2008-09-23 14:22:20,237 [main] ERROR > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - Job failed! > 2008-09-23 14:22:20,253 [main] ERROR > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error > message from task (map) tip_200809051428_0112_m_00java.io.IOException: > Received Error while processing the map plan. > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:79) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) > at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124) > 2008-09-23 14:22:20,253 [main] ERROR > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error > message from task (map) tip_200809051428_0112_m_00java.io.IOException: > Received Error while processing the map plan. 
> at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:79) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) > at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124) > 2008-09-23 14:22:20,253 [main] ERROR > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error > message from task (map) tip_200809051428_0112_m_00java.io.IOException: > Received Error while processing the map plan. > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:79) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) > at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124) > 2008-09-23 14:22:20,259 [main] ERROR > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error > message from task (map) tip_200809051428_0112_m_00java.io.IOException: > Received Error while processing the map plan. 
> at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:197) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:79) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) > at > org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124) > java.io.IOException: Unable to open iterator for alias: c [Job terminated > with anomalous status FAILED] > at org.apache.pig.PigServer.openIterator(PigServer.java:384) > at > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:268) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:176) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:83) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64) > at org.apache.pig.Main.main(Main.java:306) > Caused by: java.io.IOException: Job terminated with anomalous status FAILED > ... 6
[jira] Assigned: (PIG-431) When the specified load function cannot be found the error message is totally incomprehensible.
[ https://issues.apache.org/jira/browse/PIG-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-431: -- Assignee: Pradeep Kamath > When the specified load function cannot be found the error message is totally > incomprehensible. > --- > > Key: PIG-431 > URL: https://issues.apache.org/jira/browse/PIG-431 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.2.0 >Reporter: Alan Gates >Assignee: Pradeep Kamath > Fix For: 0.2.0 > > > "a = load ':INPATH:/singlefile/studenttab10k' using NoSuchFunction(':'); > In Pig 1.x the resulting error message was: > Could not resolve NoSuchFunction > In 2.0 instead the user gets > java.lang.ClassCastException: java.io.IOException > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1104) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:869) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:728) > at > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:529) > at > org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60) > at org.apache.pig.PigServer.parseQuery(PigServer.java:290) > at org.apache.pig.PigServer.registerQuery(PigServer.java:258) > at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:432) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:242) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:83) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64) > at org.apache.pig.Main.main(Main.java:306)
[jira] Assigned: (PIG-434) AND and OR do not give right results with nulls
[ https://issues.apache.org/jira/browse/PIG-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-434: -- Assignee: Pradeep Kamath > AND and OR do not give right results with nulls > --- > > Key: PIG-434 > URL: https://issues.apache.org/jira/browse/PIG-434 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Pradeep Kamath >Assignee: Pradeep Kamath > Fix For: 0.2.0 > > Attachments: PIG-434.patch > > > Here are the truth tables for AND and OR - currently we do not short circuit > and return a null if either operand is null (for both AND and OR) > {noformat} > truth table for AND > t = true, n = null, f = false > AND t n f > t t n f > n n n f > f f f f > truth table for OR > t = true, n = null, f = false > OR t n f > t t t t > n t n n > f t n f > {noformat}
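The truth tables in PIG-434 can be sketched in a few lines of Python. This is only an illustration of the three-valued logic the issue asks for, not Pig's implementation: the function names and3 and or3 are hypothetical, and Python's None stands in for a Pig null.

```python
from typing import Optional


def and3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: false wins over null, null wins over true."""
    if a is False or b is False:
        return False  # short circuit: f AND anything is f, even f AND n
    if a is None or b is None:
        return None   # remaining cases with a null operand are unknown
    return True


def or3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued OR: true wins over null, null wins over false."""
    if a is True or b is True:
        return True   # short circuit: t OR anything is t, even t OR n
    if a is None or b is None:
        return None   # remaining cases with a null operand are unknown
    return False


if __name__ == "__main__":
    values = [True, None, False]
    for a in values:
        print([and3(a, b) for b in values], [or3(a, b) for b in values])
```

The behaviour reported as a bug corresponds to returning None whenever either operand is None, which differs from the sketch only in the f AND n and t OR n cells.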
[jira] Assigned: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year
[ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-476: -- Assignee: Earl Cahill > given a date that can match a SimpleDateFormat want to be able to extract > arbitrary SimpleDateFormat data, like day or year > --- > > Key: PIG-476 > URL: https://issues.apache.org/jira/browse/PIG-476 > Project: Pig > Issue Type: New Feature >Reporter: Earl Cahill >Assignee: Earl Cahill > Attachments: DateExtractor-PIG-476 > > > Want to be able to do something like > A = FOREACH raw GENERATE > org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor(dayTime, > "", "dd/MMM/:HH:mm:ss"); > to extract the year, or if your date is formatted as > dd/MMM/:HH:mm:ss Z > you could do something like > A = FOREACH raw GENERATE > org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor(dayTime, > "MM-dd-"); > to grab out the day
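The general idea in PIG-476 (parse a timestamp with one pattern, then re-emit part of it with another) might look like the following Python sketch. Everything here is an assumption made for illustration: the function name extract_date_part, its argument order, the sample timestamp, and the strptime directives are not taken from the issue, whose SimpleDateFormat patterns are quoted above as they appear in the archive.

```python
from datetime import datetime


def extract_date_part(value: str, in_format: str, out_format: str) -> str:
    """Parse `value` using `in_format`, then render it using `out_format`.

    Both formats use Python strptime/strftime directives, which differ
    from Java's SimpleDateFormat letters.
    """
    parsed = datetime.strptime(value, in_format)
    return parsed.strftime(out_format)


if __name__ == "__main__":
    # Hypothetical access-log style timestamp: day/month-abbrev/year:H:M:S
    stamp = "09/Oct/2008:12:30:45"
    log_format = "%d/%b/%Y:%H:%M:%S"
    print(extract_date_part(stamp, log_format, "%Y"))  # year only
    print(extract_date_part(stamp, log_format, "%d"))  # day of month only
```

Note that %b matches month abbreviations in the current locale, so the sketch assumes an English (or default C) locale.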
[jira] Assigned: (PIG-474) from pig latin, be able to load a file based on a supplied regular expression
[ https://issues.apache.org/jira/browse/PIG-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-474: -- Assignee: Earl Cahill > from pig latin, be able to load a file based on a supplied regular expression > - > > Key: PIG-474 > URL: https://issues.apache.org/jira/browse/PIG-474 > Project: Pig > Issue Type: New Feature >Reporter: Earl Cahill >Assignee: Earl Cahill > Attachments: MyRegExLoader-PIG-474 > > > Want to be able to do something like > A = LOAD 'file:test.txt' USING > org.apache.pig.piggybank.storage.MyRegExLoader('(\\d+)!+(\\w+)~+(\\w+)'); > > which would parse lines like > > 1!!!one~i > 2!!two~~ii > 3!three~~~iii > > into arrays like > > {1, "one", "i"}, {2, "two", "ii"}, {3, "three", "iii"}
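The mapping from lines to tuples described in PIG-474 can be checked with a short Python sketch that applies the same pattern to the three sample lines. It is not the MyRegExLoader code: parse_line is a hypothetical helper, whole-line matching is an assumption, and the captured groups are left as strings rather than converted to numbers.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the issue; the doubled backslashes there are escaping
# inside the Pig Latin string, so a Python raw string needs only one.
PATTERN = re.compile(r"(\d+)!+(\w+)~+(\w+)")


def parse_line(line: str) -> Optional[Tuple[str, ...]]:
    """Return one field per capture group, or None if the line does not match."""
    match = PATTERN.fullmatch(line.strip())
    return match.groups() if match else None


if __name__ == "__main__":
    for line in ["1!!!one~i", "2!!two~~ii", "3!three~~~iii"]:
        print(parse_line(line))
```

Each parenthesized group becomes one field of the resulting tuple, and the runs of ! and ~ act only as separators.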
[jira] Assigned: (PIG-472) load files based on user provided regular expressions
[ https://issues.apache.org/jira/browse/PIG-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-472: -- Assignee: Earl Cahill > load files based on user provided regular expressions > - > > Key: PIG-472 > URL: https://issues.apache.org/jira/browse/PIG-472 > Project: Pig > Issue Type: New Feature > Components: data, grunt >Affects Versions: 0.1.0 >Reporter: Earl Cahill >Assignee: Earl Cahill > Fix For: 0.1.0 > > Attachments: RegExLoader-PIG-472 > > > Want to be able to load files based on regular expressions. Each group > specified in parentheses should end up as a DataAtom, and the list of > DataAtoms should end up in a Tuple.
[jira] Assigned: (PIG-473) be able to load files in Apache's common log format
[ https://issues.apache.org/jira/browse/PIG-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-473: -- Assignee: Earl Cahill > be able to load files in Apache's common log format > --- > > Key: PIG-473 > URL: https://issues.apache.org/jira/browse/PIG-473 > Project: Pig > Issue Type: New Feature > Components: data, grunt >Reporter: Earl Cahill >Assignee: Earl Cahill > Attachments: CommonLogLoader-PIG-473 > > > Want to be able to load files that are in Apache's common log format.
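For readers unfamiliar with the format named in PIG-473, here is a rough Python sketch of splitting one common log format line into its seven fields (host, ident, user, time, request, status, size). It is not the attached CommonLogLoader: the regular expression, the field names, and the sample line are illustrative assumptions.

```python
import re
from typing import Dict, Optional

# One named group per common log format field.
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a common log format line, or None if it does not match."""
    match = CLF.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(parse_clf(sample))
```

A size of "-" means no body was sent, so the sketch keeps every field as a string and leaves type conversion to the caller.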
[jira] Assigned: (PIG-487) extract a host from a url
[ https://issues.apache.org/jira/browse/PIG-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned PIG-487: -- Assignee: Earl Cahill > extract a host from a url > - > > Key: PIG-487 > URL: https://issues.apache.org/jira/browse/PIG-487 > Project: Pig > Issue Type: New Feature >Reporter: Earl Cahill >Assignee: Earl Cahill > Attachments: HostExtractor-PIG-487 > > > Want to be able to extract the host from a url. For example, > http://sports.espn.go.com/mlb/recap?gameId=281009122 > leads to > sports.espn.go.com > Pig latin usage looks like > host = FOREACH row GENERATE > org.apache.pig.piggybank.evaluation.util.apachelogparser.HostExtractor(url);
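The host extraction described in PIG-487 can be illustrated with Python's standard urllib.parse module. This is a sketch of the expected input and output only, not the HostExtractor UDF; extract_host is a hypothetical name, and returning None for input without a host is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse


def extract_host(url: str) -> Optional[str]:
    """Return the host part of `url`, or None if the URL has no host."""
    # urlparse().hostname lowercases the host and drops any port number.
    return urlparse(url).hostname


if __name__ == "__main__":
    print(extract_host("http://sports.espn.go.com/mlb/recap?gameId=281009122"))
```

For the URL quoted in the issue this yields sports.espn.go.com; a string with no scheme and no // prefix has no recognizable host and yields None.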