[jira] Updated: (PIG-1390) Provide a target to generate eclipse-related classpath and files
[ https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

V.V.Chaitanya Krishna updated PIG-1390:
---
    Status: Patch Available (was: Open)

Provide a target to generate eclipse-related classpath and files
---
Key: PIG-1390
URL: https://issues.apache.org/jira/browse/PIG-1390
Project: Pig
Issue Type: Improvement
Components: build
Affects Versions: 0.7.0, 0.8.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
Fix For: 0.8.0
Attachments: PIG-1390-2.patch, PIG-1390-3.patch, PIG-eclipse_support.patch

Currently, after checking out from the svn repository, there is no provision to auto-generate the Eclipse-related classpath and project files, which would allow importing the project into Eclipse directly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1390) Provide a target to generate eclipse-related classpath and files
[ https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

V.V.Chaitanya Krishna updated PIG-1390:
---
    Status: Open (was: Patch Available)
[jira] Updated: (PIG-1390) Provide a target to generate eclipse-related classpath and files
[ https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

V.V.Chaitanya Krishna updated PIG-1390:
---
    Attachment: PIG-1390-3.patch

Uploading a new patch with Thejas's comments incorporated. I tried the new patch on my Linux box with Eclipse 3.5 and it worked well, except for the src-gen error mentioned in the previous comment.
[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861344#action_12861344 ]

V.V.Chaitanya Krishna commented on PIG-1381:
---

bq. Do we need to have two different property files? One possibility is to not package pig.properties in pig.jar and instead include it in the classpath while invoking Pig. (We can modify the pig shell script to include it in the path by default.) Then the user can add/delete/modify pig.properties as he wishes, as well as override default properties.

This might lead to overriding some properties for which an override is actually unacceptable. Also, in the long run, we might want a configuration file with properties that are not supposed to be changed (similar to what happened in the Hadoop project).

bq. The disadvantage of two property files is that it is sometimes confusing which property is getting picked up (the one in the default file or the one in the user-specified file). If there is only one property file, there is only one way to specify properties to Pig, which I think is the better way of doing it.

Since the properties files are processed sequentially (i.e., one file after another), we can be sure that the last value read for a given property is the one that takes effect. For example, we can load the default properties file first, followed by the one in which users give their own set of properties. This way, users' settings take precedence.

Thoughts?

Need a way for Pig to take an alternative property file
---
Key: PIG-1381
URL: https://issues.apache.org/jira/browse/PIG-1381
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Fix For: 0.8.0

Currently, Pig reads the first pig.properties it finds in the CLASSPATH. Pig has a default pig.properties, and if the user has a different pig.properties, there will be a conflict since we can only read one. There are a couple of ways to solve it:
1. Give a command line option for the user to pass an additional property file.
2. Change the name of the default pig.properties to pig-default.properties, and let the user supply a pig.properties to override it.
3. Further, we could consider using pig-default.xml/pig-site.xml, which seems more natural for the Hadoop community. If so, we shall provide backward compatibility to also read pig.properties and pig-cluster-hadoop-site.xml.
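
The sequential-loading idea from the PIG-1381 comment can be sketched with plain java.util.Properties. This is only an illustration of "load defaults first, then the user's file", not Pig's actual loading code; the class name LayeredProperties, the method loadInOrder, and the property keys used here are assumptions made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Pig: merges property sources in order.
class LayeredProperties {

    /**
     * Loads each source, in order, into a single Properties object.
     * A key that appears in a later source overwrites the earlier value.
     */
    static Properties loadInOrder(String... sources) {
        Properties merged = new Properties();
        for (String source : sources) {
            try {
                merged.load(new StringReader(source));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative keys only; they stand in for a default file and a user file.
        String defaults = "exectype=mapreduce\nlog.file=pig.log\n";
        String user = "exectype=local\n";

        Properties p = loadInOrder(defaults, user);
        System.out.println(p.getProperty("exectype")); // prints "local"
        System.out.println(p.getProperty("log.file")); // prints "pig.log"
    }
}
```

Because the user's source is loaded last, its value for exectype wins while untouched defaults survive. Protecting properties that must not be overridden, the concern raised in the comment, would need an extra check that this sketch does not attempt.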
[jira] Commented: (PIG-1390) Provide a target to generate eclipse-related classpath and files
[ https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861423#action_12861423 ]

Hadoop QA commented on PIG-1390:
---

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12442946/PIG-1390-3.patch
against trunk revision 937570.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/304/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/304/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/304/console

This message is automatically generated.
[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1386:
---
    Status: Open (was: Patch Available)

UDF to extend functionalities of MaxTupleBy1stField
---
Key: PIG-1386
URL: https://issues.apache.org/jira/browse/PIG-1386
Project: Pig
Issue Type: New Feature
Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
Attachments: PIG-1386-trunk.patch

Based on this conversation:

totally, go for it, it'd be pretty straightforward to add this functionality.

On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:

Hey, while we're on the subject, and I have your attention, can we re-factor the UDF MaxTupleByFirstField to take a constructor?

    define customMaxTuple ExtremalTupleByNthField(n, 'min');
    G = group T by id;
    M = foreach T generate customMaxTuple(T);

Where n is the nth field, and the second parameter allows us to specify min, max, median, etc. Does this seem like something useful to everyone?

On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:

What about making them part of the language using symbols? Instead of

    foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;

have language support

    foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;

or even:

    foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;

Is there a reason not to do the second or third other than being more complicated? Certainly I'd volunteer to put the top implementation into the util package and submit them for builtins, but the latter syntactic candies seem more natural.

On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:

The grouping package in piggybank is left over from back when Pig allowed users to define grouping functions (0.1). Functions like these should go in evaluation.util. However, I'd consider putting these in builtin (in main Pig) instead. These are things everyone asks for and they seem like a reasonable addition to the core engine. This will be more of a burden to write (as we'll hold them to a higher standard) but of more use to people as well.

Alan.

On Apr 19, 2010, at 12:53 PM, hc busy wrote:

Sometimes I wonder... I mean, somebody went to the trouble of making a path called org.apache.pig.piggybank.grouping (where it seems like this code belongs), but didn't check any Java code into that package. Any comment about where to put this kind of utility class?

On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:

2010/4/19 hc busy hc.b...@gmail.com

That's just the way it is right now, you can't make bags or tuples directly... Maybe we should have some UDFs in piggybank for these:

    toBag()
    toTuple();    -- which is kinda like exec(Tuple in){return in;}
    TupleToBag(); -- sometimes you need it this way for some reason.

Ok. I place my current code here; maybe later I'll make a patch (if such an implementation is acceptable, of course).

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.BagFactory;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    import java.io.IOException;

    /**
     * Convert any sequence of fields to a bag of tuples with the specified count of fields.<br>
     * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
     * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
     *
     * @author astepachev
     */
    public class ToBag extends EvalFunc<DataBag> {
        public BagFactory bagFactory;
        public TupleFactory tupleFactory;

        public ToBag() {
            bagFactory = BagFactory.getInstance();
            tupleFactory = TupleFactory.getInstance();
        }

        @Override
        public DataBag exec(Tuple input) throws IOException {
            if (input.isNull())
                return null;
            final DataBag bag = bagFactory.newDefaultBag();
            // The first field is the number of fields per output tuple.
            final Integer counter = (Integer) input.get(0);
            if (counter == null)
                return null;
            Tuple tuple = tupleFactory.newTuple();
            // Walk the remaining fields, starting a new tuple every `counter` fields.
            for (int i = 0; i < input.size() - 1; i++) {
                if (i % counter == 0) {
                    tuple = tupleFactory.newTuple();
                    bag.add(tuple);
                }
                tuple.append(input.get(i + 1));
            }
            return bag;
        }
    }

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.junit.Before;
    import org.junit.Test;

    import java.io.IOException;
    import java.net.URISyntaxException;
    import java.net.URL;
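
The grouping logic of the ToBag UDF quoted in the PIG-1386 description can be tried without a Pig installation. The following is a minimal sketch that swaps Pig's Tuple and DataBag for plain java.util lists; the class name FieldChunker and the method chunk are hypothetical, and the extra guard against a non-positive count is an addition that the quoted code does not have.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the ToBag UDF, using lists instead of Pig types.
class FieldChunker {

    /**
     * input = [count, fld1, fld2, ...]; returns the fields after the first,
     * grouped into lists of `count` elements (the last group may be shorter).
     * Returns null for a missing or invalid count, as the quoted UDF returns null.
     */
    static List<List<Object>> chunk(List<?> input) {
        if (input == null || input.isEmpty()) return null;
        Object first = input.get(0);
        if (!(first instanceof Integer) || (Integer) first <= 0) return null;
        int count = (Integer) first;

        List<List<Object>> bag = new ArrayList<>();
        List<Object> tuple = null;
        for (int i = 0; i < input.size() - 1; i++) {
            if (i % count == 0) {          // start a new group every `count` fields
                tuple = new ArrayList<>();
                bag.add(tuple);
            }
            tuple.add(input.get(i + 1));   // skip the leading count field
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(chunk(Arrays.asList(2, "a", "b", "c", "d", "e")));
        // prints [[a, b], [c, d], [e]]
    }
}
```

Note that a trailing group shorter than count is kept rather than dropped, which matches the loop in the quoted code.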
[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1386:
---
    Status: Patch Available (was: Open)
    Fix Version/s: 0.8.0
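
The ExtremalTupleByNthField(n, 'min') proposal quoted in the PIG-1386 description amounts to picking, from a group of tuples, the one whose nth field is smallest or largest. A rough sketch over plain lists follows; it is not the code in PIG-1386-trunk.patch. The class and method names are hypothetical, n is assumed to be a zero-based index (the thread does not say), and only 'min' and 'max' are handled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the "extremal tuple by nth field" idea.
class ExtremalByNthField {

    /**
     * Returns the tuple whose field at zero-based index n is the minimum or
     * maximum among all tuples, or null if there are no tuples.
     */
    static <T extends Comparable<? super T>> List<T> extremal(
            List<List<T>> tuples, int n, String mode) {
        if (tuples.isEmpty()) return null;
        Comparator<List<T>> byNth = Comparator.comparing((List<T> t) -> t.get(n));
        switch (mode) {
            case "min": return Collections.min(tuples, byNth);
            case "max": return Collections.max(tuples, byNth);
            default: throw new IllegalArgumentException("unsupported mode: " + mode);
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> group = Arrays.asList(
                Arrays.asList(1, 30),
                Arrays.asList(2, 10),
                Arrays.asList(3, 20));
        System.out.println(extremal(group, 1, "min")); // prints [2, 10]
        System.out.println(extremal(group, 1, "max")); // prints [1, 30]
    }
}
```

Passing the field index and the mode as arguments mirrors the constructor parameters in the proposed define statement; other modes mentioned in the thread, such as median, would need a sort rather than a single pass.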
[jira] Commented: (PIG-1303) unable to set outgoing format for org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
[ https://issues.apache.org/jira/browse/PIG-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861442#action_12861442 ]

Alan Gates commented on PIG-1303:
---

Dmitriy, I'll try to get to reviewing this patch today.

unable to set outgoing format for org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
---
Key: PIG-1303
URL: https://issues.apache.org/jira/browse/PIG-1303
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Environment: pig 0.6.0 on a fedora linux machine, jdk 1.6 u11
Reporter: Johannes Rußek
Assignee: Dmitriy V. Ryaboy
Fix For: 0.7.0, 0.8.0
Attachments: PIG-1303.patch, TypeCheckingVisitor.java.diff

I'm unable to set the format of the outgoing date string in the constructor, as it's supposed to work. The only way I could change the format was to change the default in the Java class and rebuild piggybank. Apparently this has something to do with the way Pig instantiates DateExtractor. Quoting a reply on the mailing list:

David Vrensk said:
I ran into the same problem a couple of weeks ago, and played around with the code, inserting some print/log statements. It turns out that the arguments are only used in the initial constructor calls, when the pig process is starting, but once pig reaches the point where it would use the UDF, it creates new DateExtractors without passing the arguments.
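
The symptom described in PIG-1303 can be reproduced in miniature. The sketch below does not use Pig's instantiation code: FormatterUdf is a hypothetical stand-in for DateExtractor, and UdfInstantiation.instantiate is an invented helper that creates an instance reflectively, either with the constructor arguments or, as in the reported behaviour, without them.

```java
import java.lang.reflect.Constructor;

// Hypothetical UDF-like class with a default and a configurable output format.
class FormatterUdf {
    final String outputFormat;

    public FormatterUdf() { this("yyyy-MM-dd"); }            // default format
    public FormatterUdf(String outputFormat) { this.outputFormat = outputFormat; }
}

// Invented helper: builds a FormatterUdf reflectively from string arguments.
class UdfInstantiation {
    static FormatterUdf instantiate(String... ctorArgs) {
        try {
            if (ctorArgs.length == 0) {
                // Arguments dropped: only the no-argument constructor runs.
                return FormatterUdf.class.getDeclaredConstructor().newInstance();
            }
            Constructor<FormatterUdf> c =
                    FormatterUdf.class.getDeclaredConstructor(String.class);
            return c.newInstance(ctorArgs[0]);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Instance created at startup, with the user's argument.
        System.out.println(instantiate("dd/MM/yyyy").outputFormat); // prints dd/MM/yyyy
        // Instance re-created later without the argument falls back to the default.
        System.out.println(instantiate().outputFormat);             // prints yyyy-MM-dd
    }
}
```

If the second kind of instance is the one that actually processes data, the user's format is silently ignored, which is consistent with the workaround of changing the default in the class and rebuilding.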
[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1386:
---
    Status: Open (was: Patch Available)
[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1386:
---
    Attachment: (was: PIG-1386-trunk.patch)
[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1386:
---
    Attachment: PIG-1386-trunk.patch

e503949c4f5f2667657ee02872aff5ce

Additional documentation and examples.
[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1386:
------------------------
    Status: Patch Available (was: Open)

UDF to extend functionalities of MaxTupleBy1stField
---------------------------------------------------
    Key: PIG-1386
    URL: https://issues.apache.org/jira/browse/PIG-1386
    Project: Pig
    Issue Type: New Feature
    Components: tools
    Affects Versions: 0.6.0
    Reporter: hc busy
    Assignee: hc busy
    Fix For: 0.8.0
    Attachments: PIG-1386-trunk.patch

Based on this conversation:

totally, go for it, it'd be pretty straightforward to add this functionality.

On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:

Hey, while we're on the subject, and I have your attention, can we re-factor the UDF MaxTupleByFirstField to take a constructor?

    define customMaxTuple ExtremalTupleByNthField(n, 'min');
    G = group T by id;
    M = foreach T generate customMaxTuple(T);

Where n is the nth field, and the second parameter allows us to specify min, max, median, etc... Does this seem like something useful to everyone?

On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:

What about making them part of the language using symbols? Instead of

    foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;

have language support

    foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;

or even:

    foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;

Is there a reason not to do the second or third other than being more complicated? Certainly I'd volunteer to put the top implementation into the util package and submit them for builtins, but the latter syntactic candies seem more natural.

On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:

The grouping package in piggybank is left over from back when Pig allowed users to define grouping functions (0.1). Functions like these should go in evaluation.util. However, I'd consider putting these in builtin (in main Pig) instead. These are things everyone asks for and they seem like a reasonable addition to the core engine. This will be more of a burden to write (as we'll hold them to a higher standard) but of more use to people as well.

Alan.

On Apr 19, 2010, at 12:53 PM, hc busy wrote:

Sometimes I wonder... I mean, somebody went to the trouble of making a path called org.apache.pig.piggybank.grouping (where it seems like this code belongs), but didn't check in any Java code into that package. Any comment about where to put this kind of utility class?

On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:

2010/4/19 hc busy hc.b...@gmail.com

That's just the way it is right now, you can't make bags or tuples directly... Maybe we should have some UDFs in piggybank for these:

    toBag()
    toTuple();    --which is kinda like exec(Tuple in){return in;}
    TupleToBag(); --sometimes you need it this way for some reason.

Ok. I place my current code here, maybe later I'll make a patch (if such an implementation is acceptable, of course).

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.BagFactory;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    import java.io.IOException;

    /**
     * Convert any sequence of fields to bag with specified count of fields<br>
     * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
     * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
     *
     * @author astepachev
     */
    public class ToBag extends EvalFunc<DataBag> {
        public BagFactory bagFactory;
        public TupleFactory tupleFactory;

        public ToBag() {
            bagFactory = BagFactory.getInstance();
            tupleFactory = TupleFactory.getInstance();
        }

        @Override
        public DataBag exec(Tuple input) throws IOException {
            if (input.isNull())
                return null;
            final DataBag bag = bagFactory.newDefaultBag();
            // The first field is the number of fields per output tuple.
            final Integer counter = (Integer) input.get(0);
            // A non-positive count would make the modulo below fail.
            if (counter == null || counter <= 0)
                return null;
            Tuple tuple = tupleFactory.newTuple();
            for (int i = 0; i < input.size() - 1; i++) {
                // Start a new tuple every `counter` fields.
                if (i % counter == 0) {
                    tuple = tupleFactory.newTuple();
                    bag.add(tuple);
                }
                tuple.append(input.get(i + 1));
            }
            return bag;
        }
    }

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;
    import org.junit.Before;
    import org.junit.Test;
    import java.io.IOException;
    import java.net.URISyntaxException;
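The chunking loop in ToBag.exec() can be tried without a Pig installation. The following is only a minimal sketch of that loop, assuming plain java.util lists as stand-ins for Pig's DataBag and Tuple; the class name ToBagSketch and the method toBag are hypothetical and not part of the posted code or of piggybank.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToBagSketch {
    /**
     * Groups fields[1..] into rows of fields[0] elements each, following
     * the loop in the posted ToBag.exec(). Lists stand in for Pig's
     * DataBag and Tuple (an assumption made for illustration only).
     */
    public static List<List<Object>> toBag(List<Object> fields) {
        if (fields == null || fields.isEmpty() || !(fields.get(0) instanceof Integer)) {
            return null;
        }
        int count = (Integer) fields.get(0);
        if (count <= 0) {
            return null; // a non-positive count cannot define a row width
        }
        List<List<Object>> bag = new ArrayList<>();
        List<Object> row = null;
        for (int i = 0; i < fields.size() - 1; i++) {
            if (i % count == 0) {
                // Start a new row every `count` fields.
                row = new ArrayList<>();
                bag.add(row);
            }
            row.add(fields.get(i + 1));
        }
        return bag;
    }

    public static void main(String[] args) {
        // count=2 followed by five fields: the last row is shorter.
        System.out.println(toBag(Arrays.<Object>asList(2, "a", "b", "c", "d", "e")));
        // prints [[a, b], [c, d], [e]]
    }
}
```

Note that a trailing group with fewer than count fields is kept as a shorter row, which matches what the posted loop would do.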
[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1386:
------------------------
    Attachment: PIG-1386-trunk.patch

da673ab2d584faf903e8b49b63a03ade spell check the documentation

UDF to extend functionalities of MaxTupleBy1stField
---------------------------------------------------
    Key: PIG-1386
    URL: https://issues.apache.org/jira/browse/PIG-1386
    Project: Pig
    Issue Type: New Feature
    Components: tools
    Affects Versions: 0.6.0
    Reporter: hc busy
    Assignee: hc busy
    Fix For: 0.8.0
    Attachments: PIG-1386-trunk.patch
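The PIG-1386 description proposes ExtremalTupleByNthField(n, 'min'), a UDF that picks one tuple out of a group according to its nth field. As a rough illustration of that selection step, here is a sketch in plain Java. It is not the attached patch: the class and method names are hypothetical, rows are modelled as lists, n is assumed to be 0-based, and only 'min' and 'max' are handled.

```java
import java.util.Arrays;
import java.util.List;

public class ExtremalSketch {
    /**
     * Returns the row whose n-th field (0-based, an assumption) is the
     * smallest ("min") or largest ("max"). Returns null for no rows.
     */
    public static <T extends Comparable<T>> List<T> extremal(
            List<List<T>> rows, int n, String mode) {
        boolean wantMax = "max".equals(mode);
        if (!wantMax && !"min".equals(mode)) {
            throw new IllegalArgumentException("unsupported mode: " + mode);
        }
        List<T> best = null;
        for (List<T> row : rows) {
            if (best == null) {
                best = row;
                continue;
            }
            int cmp = row.get(n).compareTo(best.get(n));
            // Keep the first row seen on ties.
            if (wantMax ? cmp > 0 : cmp < 0) {
                best = row;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(
                Arrays.asList(1, 9), Arrays.asList(2, 3), Arrays.asList(3, 7));
        System.out.println(extremal(rows, 1, "min")); // prints [2, 3]
        System.out.println(extremal(rows, 1, "max")); // prints [1, 9]
    }
}
```

Passing the field index and the mode as arguments mirrors the constructor-style configuration suggested in the thread, where one UDF class covers what would otherwise need a separate class per field and ordering.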
[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hc busy updated PIG-1386:
------------------------
    Attachment: (was: PIG-1386-trunk.patch)

UDF to extend functionalities of MaxTupleBy1stField
---------------------------------------------------
    Key: PIG-1386
    URL: https://issues.apache.org/jira/browse/PIG-1386
    Project: Pig
    Issue Type: New Feature
    Components: tools
    Affects Versions: 0.6.0
    Reporter: hc busy
    Assignee: hc busy
    Fix For: 0.8.0
    Attachments: PIG-1386-trunk.patch
[jira] Updated: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted
[ https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1391:
------------------------------
    Attachment: PIG-1391.06.patch

Patch for 0.6 branch. It reduces the number of temp files being left behind from around 1767 to 135. It changes only the contents of the test/ dir.

As the patch does not apply to trunk, I have manually run the unit tests and test-patch. All unit tests succeeded; pasting the result of test-patch:

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included. The patch appears to include 208 new or modified tests.
     [exec]
     [exec]     +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs. The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit. The applied patch does not increase the total number of release audit warnings.

pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted
-----------------------------------------------------------------------------------------------
    Key: PIG-1391
    URL: https://issues.apache.org/jira/browse/PIG-1391
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.7.0
    Reporter: Thejas M Nair
    Assignee: Thejas M Nair
    Fix For: 0.7.0
    Attachments: minicluster.patch, PIG-1391.06.patch

Pig unit test runs leave behind files in the temp dir (/tmp), and over time there are too many files in the directory. Most of the files are left behind by MiniCluster. It closes/shuts down the MiniDFSCluster, the MiniMRCluster and the FileSystem that it created in its constructor only in finalize(), and Java does not guarantee that finalize() will be called.
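The underlying problem, cleanup that lives only in finalize(), has a general remedy in Java: release the resource explicitly on a code path that always runs. The sketch below shows that pattern with try-with-resources. It is not necessarily what PIG-1391.06.patch does; FakeCluster is a hypothetical stand-in and does not use the real MiniCluster, MiniDFSCluster or MiniMRCluster APIs.

```java
public class ClusterSketch {
    /** Hypothetical stand-in for a test cluster that owns temp files. */
    static class FakeCluster implements AutoCloseable {
        private boolean running = true;

        boolean isRunning() {
            return running;
        }

        @Override
        public void close() {
            // A real cluster would shut down its daemons and delete
            // its temp files here instead of waiting for finalize().
            running = false;
        }
    }

    /**
     * Runs a test body against a fresh cluster and reports whether the
     * cluster was shut down afterwards, even if the body threw.
     */
    public static boolean runAndReportStopped(Runnable testBody) {
        FakeCluster cluster = new FakeCluster();
        try (FakeCluster c = cluster) {
            testBody.run();
        } catch (RuntimeException e) {
            // Swallowed only to keep the sketch short; close() has
            // already run by the time this handler executes.
        }
        return !cluster.isRunning();
    }

    public static void main(String[] args) {
        System.out.println(runAndReportStopped(() -> { }));
        System.out.println(runAndReportStopped(() -> {
            throw new IllegalStateException("test failed");
        }));
    }
}
```

In a JUnit suite the same effect is usually obtained by calling the shutdown method from a teardown hook, so the cleanup does not depend on garbage collection timing.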
[jira] Commented: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861544#action_12861544 ]

Hadoop QA commented on PIG-1386:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12442926/PIG-1386-trunk.patch
  against trunk revision 937570.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    -1 release audit. The applied patch generated 537 release audit warnings (more than the trunk's current 535 warnings).

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/305/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/305/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/305/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/305/console

This message is automatically generated.

UDF to extend functionalities of MaxTupleBy1stField
---------------------------------------------------
    Key: PIG-1386
    URL: https://issues.apache.org/jira/browse/PIG-1386
    Project: Pig
    Issue Type: New Feature
    Components: tools
    Affects Versions: 0.6.0
    Reporter: hc busy
    Assignee: hc busy
    Fix For: 0.8.0
    Attachments: PIG-1386-trunk.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1378:
-------------------------------
    Attachment: PIG-1378.patch

Attached patch addresses the issue in the description by changing the LoadFunc.relativeToAbsolutePath() implementation to only convert input locations if the location does not have a scheme or the path in the location is not absolute.

har url not usable in Pig scripts
---------------------------------
    Key: PIG-1378
    URL: https://issues.apache.org/jira/browse/PIG-1378
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.7.0
    Reporter: Viraj Bhat
    Fix For: 0.8.0
    Attachments: PIG-1378.patch

I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell

{noformat}
$hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
Found 1 items
-rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1
{noformat}

Using similar URLs in grunt yields

{noformat}
grunt> a = load 'har:///user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}

{noformat}
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
        at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
        ... 13 more
{noformat}

According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following, as stated in the original description

{noformat}
grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}

{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
        ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245)
{noformat}

Viraj
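The rule stated in the patch comment (convert a location only if it has no scheme or its path is not absolute) can be expressed as a small predicate. The sketch below uses java.net.URI from the JDK rather than Hadoop's Path class, and the class and method names are hypothetical; it illustrates the decision only and is not the code in PIG-1378.patch.

```java
import java.net.URI;

public class LocationSketch {
    /**
     * Returns true when a load location would still need to be turned
     * into an absolute location under the rule quoted above: it has no
     * scheme, or its path is not absolute.
     */
    public static boolean needsConversion(String location) {
        URI uri = URI.create(location);
        boolean hasScheme = uri.getScheme() != null;
        String path = uri.getPath(); // null for opaque URIs such as "hdfs:a/b"
        boolean absolutePath = path != null && path.startsWith("/");
        return !hasScheme || !absolutePath;
    }

    public static void main(String[] args) {
        // Scheme plus absolute path: left untouched.
        System.out.println(needsConversion(
                "har:///user/viraj/project/subproject/files/size/data")); // false
        // No scheme: converted.
        System.out.println(needsConversion("data/part-1"));       // true
        System.out.println(needsConversion("/user/viraj/data"));  // true
    }
}
```

Under this rule the har:/// location from the report is passed through unchanged, so no comparison against the default hdfs scheme takes place for it.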
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1378:
-------------------------------
    Status: Patch Available (was: Open)
    Assignee: Pradeep Kamath

har url not usable in Pig scripts
---------------------------------
    Key: PIG-1378
    URL: https://issues.apache.org/jira/browse/PIG-1378
    Project: Pig
    Issue Type: Bug
    Components: impl
    Affects Versions: 0.7.0
    Reporter: Viraj Bhat
    Assignee: Pradeep Kamath
    Fix For: 0.8.0
    Attachments: PIG-1378.patch
[jira] Commented: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField
[ https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861588#action_12861588 ]

Hadoop QA commented on PIG-1386:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12442973/PIG-1386-trunk.patch
  against trunk revision 937570.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no tests are needed for this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/303/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/303/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/303/console

This message is automatically generated.

UDF to extend functionalities of MaxTupleBy1stField
---------------------------------------------------
    Key: PIG-1386
    URL: https://issues.apache.org/jira/browse/PIG-1386
    Project: Pig
    Issue Type: New Feature
    Components: tools
    Affects Versions: 0.6.0
    Reporter: hc busy
    Assignee: hc busy
    Fix For: 0.8.0
    Attachments: PIG-1386-trunk.patch
[jira] Commented: (PIG-1395) Mapside cogroup runs out of memory
[ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861611#action_12861611 ]

Pradeep Kamath commented on PIG-1395:
-------------------------------------

+1. The comment can be updated to reflect the nature of the comparison in the code; currently the comment and the code seem to differ. Otherwise the change looks good.

Mapside cogroup runs out of memory
----------------------------------
    Key: PIG-1395
    URL: https://issues.apache.org/jira/browse/PIG-1395
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Affects Versions: 0.8.0
    Reporter: Ashutosh Chauhan
    Assignee: Ashutosh Chauhan
    Fix For: 0.8.0
    Attachments: cogrp_mem.patch

In a particular scenario, when there aren't a lot of tuples with the same key in a relation (i.e. there aren't many repeating keys), map tasks doing cogroup fail with a GC overhead exception.
[jira] Updated: (PIG-1395) Mapside cogroup runs out of memory
[ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1395:
---------------------------------
    Status: Resolved (was: Patch Available)
    Resolution: Fixed

Patch checked in with updated comment.

Mapside cogroup runs out of memory
----------------------------------
    Key: PIG-1395
    URL: https://issues.apache.org/jira/browse/PIG-1395
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Affects Versions: 0.8.0
    Reporter: Ashutosh Chauhan
    Assignee: Ashutosh Chauhan
    Fix For: 0.8.0
    Attachments: cogrp_mem.patch

In a particular scenario, when there aren't a lot of tuples with the same key in a relation (i.e. there aren't many repeating keys), map tasks doing cogroup fail with a GC overhead exception.
[jira] Commented: (PIG-1390) Provide a target to generate eclipse-related classpath and files
[ https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861638#action_12861638 ] Thejas M Nair commented on PIG-1390: I have updated the instructions in http://wiki.apache.org/pig/Eclipse_Environment Provide a target to generate eclipse-related classpath and files Key: PIG-1390 URL: https://issues.apache.org/jira/browse/PIG-1390 Project: Pig Issue Type: Improvement Components: build Affects Versions: 0.7.0, 0.8.0 Reporter: V.V.Chaitanya Krishna Assignee: V.V.Chaitanya Krishna Fix For: 0.8.0 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, PIG-eclipse_support.patch Currently, after checking out from the svn repository, there is no provision to auto-generate the eclipse-related classpath and files, which would help in importing the project into eclipse directly.
[jira] Commented: (PIG-1303) unable to set outgoing format for org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
[ https://issues.apache.org/jira/browse/PIG-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861639#action_12861639 ] Alan Gates commented on PIG-1303: Sorry, I didn't make it to reviewing this today. I'll put it at the top of tomorrow's list. On the 0.7 question, I'm open to that as long as we test it really well. unable to set outgoing format for org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor Key: PIG-1303 URL: https://issues.apache.org/jira/browse/PIG-1303 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Environment: pig 0.6.0 on a fedora linux machine, jdk 1.6 u11 Reporter: Johannes Rußek Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0, 0.8.0 Attachments: PIG-1303.patch, TypeCheckingVisitor.java.diff I'm unable to set the format of the outgoing date string in the constructor as it's supposed to work. The only way I could change the format was to change the default in the java class and rebuild piggybank. Apparently this has something to do with the way pig instantiates DateExtractor, quoting a replier on the mailing list: David Vrensk said: I ran into the same problem a couple of weeks ago, and played around with the code inserting some print/log statements. It turns out that the arguments are only used in the initial constructor calls, when the pig process is starting, but once pig reaches the point where it would use the udf, it creates new DateExtractors without passing the arguments.
[jira] Commented: (PIG-1390) Provide a target to generate eclipse-related classpath and files
[ https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861649#action_12861649 ] V.V.Chaitanya Krishna commented on PIG-1390: @Thejas: The shortcut for adding pig to eclipse doesn't need ant clean jar to be run. The src-gen and the required jars are downloaded while running ant eclipse-files itself. These steps give a faster way to import the project into eclipse: 1. Check out the trunk code. 2. Run ant eclipse-files. 3. Open eclipse and import the existing project. If the checkout was done using subclipse in eclipse, one can refresh the project after running ant eclipse-files. Provide a target to generate eclipse-related classpath and files Key: PIG-1390 URL: https://issues.apache.org/jira/browse/PIG-1390 Project: Pig Issue Type: Improvement Components: build Affects Versions: 0.7.0, 0.8.0 Reporter: V.V.Chaitanya Krishna Assignee: V.V.Chaitanya Krishna Fix For: 0.8.0 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, PIG-eclipse_support.patch Currently, after checking out from the svn repository, there is no provision to auto-generate the eclipse-related classpath and files, which would help in importing the project into eclipse directly.
[jira] Commented: (PIG-1390) Provide a target to generate eclipse-related classpath and files
[ https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861650#action_12861650 ] V.V.Chaitanya Krishna commented on PIG-1390: Thejas, can you please change the wiki accordingly? Thanks. Provide a target to generate eclipse-related classpath and files Key: PIG-1390 URL: https://issues.apache.org/jira/browse/PIG-1390 Project: Pig Issue Type: Improvement Components: build Affects Versions: 0.7.0, 0.8.0 Reporter: V.V.Chaitanya Krishna Assignee: V.V.Chaitanya Krishna Fix For: 0.8.0 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, PIG-eclipse_support.patch Currently, after checking out from the svn repository, there is no provision to auto-generate the eclipse-related classpath and files, which would help in importing the project into eclipse directly.
[jira] Commented: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861659#action_12861659 ]
Hadoop QA commented on PIG-1378:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443013/PIG-1378.patch against trunk revision 937570.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 42 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/306/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/306/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/306/console
This message is automatically generated.
har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378.patch
I am trying to use har (Hadoop Archives) in my Pig script.
I can use them through the HDFS shell:
{noformat}
$hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
Found 1 items
-rw--- 5 viraj users1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1
{noformat}
Using similar URLs in grunt yields:
{noformat}
grunt> a = load 'har:///user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}
{noformat}
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
    at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
    at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
    ... 13 more
{noformat}
According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following, as stated in the original description:
{noformat}
grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}
{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
    ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme:
how to compare?
Guys, I'm implementing that ExtremalTupleByNthField and I have a question about comparison. Once I have parsed out the two objects that I want to compare, how do I perform that comparison? My current implementation assumes the data is Comparable (which it invariably is within pig), so I do int c = ((Comparable)o1).compareTo((Comparable)o2); Now I also see that there's another compare that compares the two objects by: int c = DataType.compare(o1, o2, DataType.findType(o1), DataType.findType(o2)); The first method works for all types I've tried (int, string, etc.), but the latter is used by another UDF already in SVN. What are your suggestions? (PIG-1386 is the ticket where I've checked in the patch.)
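For what it's worth, here is a rough, Pig-free sketch of where the two approaches in the question above can differ. Everything in it (CompareSketch, typeRank, rawCompare, typeAwareCompare and the ranking itself) is made up for illustration and is not Pig's DataType code. It only mimics the general idea, as far as I understand it, that a type-aware compare gives nulls and mixed types a defined order, while the raw Comparable cast only works when both objects are non-null and of the same class.

```java
import java.util.Arrays;
import java.util.List;

final class CompareSketch {

    // Hypothetical ranking of runtime types; a stand-in for real type codes.
    static int typeRank(Object o) {
        if (o == null) return 0;
        if (o instanceof Integer) return 1;
        if (o instanceof Long) return 2;
        if (o instanceof Double) return 3;
        if (o instanceof String) return 4;
        return 5; // anything else
    }

    // The "cast to Comparable" approach: fine for same-class, non-null values,
    // but throws on null or on mixed classes such as Integer vs String.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int rawCompare(Object o1, Object o2) {
        return ((Comparable) o1).compareTo(o2);
    }

    // A type-aware compare: order by type rank first, then by value.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int typeAwareCompare(Object o1, Object o2) {
        int r1 = typeRank(o1);
        int r2 = typeRank(o2);
        if (r1 != r2) return Integer.compare(r1, r2);
        if (o1 == null) return 0; // same rank, so both are null
        if (o1.getClass() == o2.getClass() && o1 instanceof Comparable) {
            return ((Comparable) o1).compareTo(o2);
        }
        // Unranked, non-comparable or differing classes: fall back to class name.
        return o1.getClass().getName().compareTo(o2.getClass().getName());
    }

    public static void main(String[] args) {
        // Mixed types and a null sort without errors under the type-aware compare.
        List<Object> values = Arrays.asList("pig", 3, null, 1L, "ant", 2);
        values.sort(CompareSketch::typeAwareCompare);
        System.out.println(values);

        // The raw cast fails as soon as the two classes differ.
        try {
            rawCompare(3, "pig");
        } catch (ClassCastException e) {
            System.out.println("raw Comparable cast failed on mixed types");
        }
    }
}
```

If the Nth field is always the same type and never null, the two should agree; the type-aware route mainly matters when the field can be null or can hold different types from row to row.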