[jira] Updated: (PIG-1390) Provide a target to generate eclipse-related classpath and files

2010-04-27 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated PIG-1390:
---

Status: Patch Available  (was: Open)

 Provide a target to generate eclipse-related classpath and files
 

 Key: PIG-1390
 URL: https://issues.apache.org/jira/browse/PIG-1390
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.7.0, 0.8.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.8.0

 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, 
 PIG-eclipse_support.patch


 Currently, after checking out from the svn repository, there is no provision to 
 auto-generate Eclipse-related classpath and files, which would help in 
 importing the project into Eclipse directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1390) Provide a target to generate eclipse-related classpath and files

2010-04-27 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated PIG-1390:
---

Status: Open  (was: Patch Available)

 Provide a target to generate eclipse-related classpath and files
 

 Key: PIG-1390
 URL: https://issues.apache.org/jira/browse/PIG-1390
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.7.0, 0.8.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.8.0

 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, 
 PIG-eclipse_support.patch


 Currently, after checking out from the svn repository, there is no provision to 
 auto-generate Eclipse-related classpath and files, which would help in 
 importing the project into Eclipse directly.




[jira] Updated: (PIG-1390) Provide a target to generate eclipse-related classpath and files

2010-04-27 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated PIG-1390:
---

Attachment: PIG-1390-3.patch

Uploading a new patch with Thejas's comments incorporated.

I tried the new patch on my Linux box with Eclipse 3.5 and it worked well, 
except for the src-gen error mentioned in the previous comment.

 Provide a target to generate eclipse-related classpath and files
 

 Key: PIG-1390
 URL: https://issues.apache.org/jira/browse/PIG-1390
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.7.0, 0.8.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.8.0

 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, 
 PIG-eclipse_support.patch


 Currently, after checking out from the svn repository, there is no provision to 
 auto-generate Eclipse-related classpath and files, which would help in 
 importing the project into Eclipse directly.




[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file

2010-04-27 Thread V.V.Chaitanya Krishna (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861344#action_12861344
 ] 

V.V.Chaitanya Krishna commented on PIG-1381:


bq. Do we need to have two different property files? One possibility is to not 
package pig.properties in the pig.jar and then include it in the classpath 
while invoking Pig. (We can modify the pig shell script to include it in the path 
by default.) Then the user can add/delete/modify pig.properties as he wishes, as 
well as override default properties.

  This might lead to some properties being overridden, which might actually be 
unacceptable. Also, in the long run, we might want to have a configuration file 
with properties that are not supposed to be changed (similar to what happened 
in the Hadoop project).

bq. A disadvantage of two property files is that it is sometimes confusing which 
property is getting picked up (the one in the default file or the one in the 
user-specified file). If there is only one property file, there is only one way 
to specify the properties to Pig, which I think is the better way of doing it.

  Since the property files are processed sequentially (i.e., one file after 
another), we can be sure that the last occurring value is taken for a given 
property. For example, we can load the default property file first, followed 
by the one in which users give their own set of properties. This way, 
we could give preference to users' settings as well.
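
A minimal sketch of that sequential loading, using only java.util.Properties. 
The class name, method name and property keys below are made up for 
illustration; this is not Pig's actual loading code, only a demonstration that 
a later load() replaces values for keys that were already present.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of Pig.
public class LayeredProperties {

    /**
     * Loads each source in the order given. Properties.load() stores every
     * key it reads, so a key that appears again in a later source replaces
     * the value read from an earlier one.
     */
    public static Properties loadInOrder(Reader... sources) throws IOException {
        Properties merged = new Properties();
        for (Reader source : sources) {
            merged.load(source);
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for a default file and a user file (keys are invented).
        Reader defaults = new StringReader("example.mode=local\nexample.verbose=false\n");
        Reader user = new StringReader("example.verbose=true\n");

        Properties props = loadInOrder(defaults, user);
        System.out.println(props.getProperty("example.mode"));
        System.out.println(props.getProperty("example.verbose"));
    }
}
```

With this ordering, a key set only in the defaults keeps its default value, 
while a key set in both sources ends up with the user's value. Properties that 
must not be changed would need a separate check, which this sketch does not 
attempt.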

Thoughts?

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
 Fix For: 0.8.0


 Currently, Pig reads the first pig.properties it finds in CLASSPATH. Pig has a 
 default pig.properties, and if the user has a different pig.properties, there 
 will be a conflict since we can only read one. There are a couple of ways to 
 solve it:
 1. Give a command line option for the user to pass an additional property file
 2. Change the name of the default pig.properties to pig-default.properties, so 
 the user can give a pig.properties to override it
 3. Further, can we consider using pig-default.xml/pig-site.xml, which seems 
 more natural for the Hadoop community? If so, we should provide backward 
 compatibility to also read pig.properties and pig-cluster-hadoop-site.xml. 




[jira] Commented: (PIG-1390) Provide a target to generate eclipse-related classpath and files

2010-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861423#action_12861423
 ] 

Hadoop QA commented on PIG-1390:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12442946/PIG-1390-3.patch
  against trunk revision 937570.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/304/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/304/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/304/console

This message is automatically generated.

 Provide a target to generate eclipse-related classpath and files
 

 Key: PIG-1390
 URL: https://issues.apache.org/jira/browse/PIG-1390
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.7.0, 0.8.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.8.0

 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, 
 PIG-eclipse_support.patch


 Currently, after checking out from the svn repository, there is no provision to 
 auto-generate Eclipse-related classpath and files, which would help in 
 importing the project into Eclipse directly.




[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Status: Open  (was: Patch Available)

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Attachments: PIG-1386-trunk.patch


 Based on this conversation:
 totally, go for it, it'd be pretty straightforward to add this
 functionality.
 On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:
  Hey, while we're on the subject, and I have your attention, can we
  refactor the UDF MaxTupleByFirstField to take constructor arguments?
 
  define customMaxTuple ExtremalTupleByNthField(n, 'min');
  G = group T by id;
  M = foreach G generate customMaxTuple(T);
 
  Where n is the nth field, and the second parameter allows us to specify
  min, max, median,  etc...
 
  Does this seem like something useful to everyone?
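
The selection that such a UDF would perform can be sketched in plain Java, 
without the Pig API. Everything here is an assumption made for illustration: 
the class and method names, the use of lists in place of tuples and bags, the 
0-based field index, and support for only 'min' and 'max' (median is left out).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the proposed UDF; rows play the role of tuples.
public class ExtremalByNthField {

    /**
     * Returns the row whose n-th field (0-based) is smallest ("min") or
     * largest ("max"), or null when there are no rows. On ties the first
     * row encountered wins.
     */
    public static <T extends Comparable<T>> List<T> extremal(
            List<List<T>> rows, int n, String mode) {
        boolean wantMax;
        if ("max".equals(mode)) {
            wantMax = true;
        } else if ("min".equals(mode)) {
            wantMax = false;
        } else {
            throw new IllegalArgumentException("unsupported mode: " + mode);
        }
        List<T> best = null;
        for (List<T> row : rows) {
            if (best == null) {
                best = row;
                continue;
            }
            int cmp = row.get(n).compareTo(best.get(n));
            if (wantMax ? cmp > 0 : cmp < 0) {
                best = row;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(
                Arrays.asList(3, 10),
                Arrays.asList(1, 20),
                Arrays.asList(2, 5));
        System.out.println(extremal(rows, 0, "min")); // prints [1, 20]
        System.out.println(extremal(rows, 1, "min")); // prints [2, 5]
    }
}
```

Passing the field index and the mode once, up front, mirrors the idea of 
configuring the UDF through its constructor in the define statement.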
 
 
 
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there a reason not to do the second or third, other than it being
   more complicated?
  
   Certainly I'd volunteer to put the top implementation into the util
   package and submit it for builtins, but the latter syntactic sugar
   seems more natural.
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig
   allowed users to define grouping functions (0.1).  Functions like these
   should go in evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
   These are things everyone asks for and they seem like a reasonable
   addition to the core engine.  This will be more of a burden to write (as
   we'll hold them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
   Sometimes I wonder... I mean, somebody went to the trouble of making a
   path called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belongs), but didn't check any Java code
   into that package.
  
  
   Any comments about where to put this kind of utility class?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
   That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDFs in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --sometimes you need it this way for some reason.
  
  
   OK. I'll place my current code here; maybe later I'll make a patch (if
   such an implementation is acceptable, of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import java.io.IOException;
  
   /**
    * Convert any sequence of fields to a bag of tuples with the specified
    * number of fields each.<br>
    * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
    * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
    *
    * @author astepachev
    */
   public class ToBag extends EvalFunc<DataBag> {
       public BagFactory bagFactory;
       public TupleFactory tupleFactory;
  
       public ToBag() {
           bagFactory = BagFactory.getInstance();
           tupleFactory = TupleFactory.getInstance();
       }
  
       @Override
       public DataBag exec(Tuple input) throws IOException {
           if (input.isNull())
               return null;
           final DataBag bag = bagFactory.newDefaultBag();
           // The first field is the number of fields per output tuple.
           final Integer counter = (Integer) input.get(0);
           if (counter == null)
               return null;
           Tuple tuple = tupleFactory.newTuple();
           // Start a new tuple every `counter` fields and add it to the bag.
           for (int i = 0; i < input.size() - 1; i++) {
               if (i % counter == 0) {
                   tuple = tupleFactory.newTuple();
                   bag.add(tuple);
               }
               tuple.append(input.get(i + 1));
           }
           return bag;
       }
   }
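
The grouping logic in exec() above can be tried outside Pig with a small 
stand-alone sketch. It is an illustration only: the class and method names are 
hypothetical, plain lists stand in for Tuple and DataBag, and the guard against 
a non-positive count is an addition that the posted UDF does not have.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone version of the chunking done by ToBag.exec().
public class ChunkFields {

    /**
     * Splits fields into consecutive groups of `count` elements. The last
     * group is shorter when the fields do not divide evenly.
     */
    public static List<List<Object>> chunk(int count, List<?> fields) {
        if (count <= 0) {
            // Without this check, i % count would fail for count == 0.
            throw new IllegalArgumentException("count must be positive");
        }
        List<List<Object>> bag = new ArrayList<>();
        List<Object> current = null;
        for (int i = 0; i < fields.size(); i++) {
            if (i % count == 0) {
                // Begin a new group and register it before filling it.
                current = new ArrayList<>();
                bag.add(current);
            }
            current.add(fields.get(i));
        }
        return bag;
    }

    public static void main(String[] args) {
        // prints [[a, b], [c, d], [e]]
        System.out.println(chunk(2, Arrays.asList("a", "b", "c", "d", "e")));
    }
}
```

As in the UDF, each group is added to the result as soon as it is created, so 
a trailing partial group is kept rather than dropped.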
  
   import org.apache.pig.ExecType;
   import org.apache.pig.PigServer;
   import org.junit.Before;
   import org.junit.Test;
  
   import java.io.IOException;
   import java.net.URISyntaxException;
   import java.net.URL;
  
   

[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

   Status: Patch Available  (was: Open)
Fix Version/s: 0.8.0

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch



[jira] Commented: (PIG-1303) unable to set outgoing format for org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor

2010-04-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861442#action_12861442
 ] 

Alan Gates commented on PIG-1303:
-

Dmitry, I'll try to get to reviewing this patch today.

 unable to set outgoing format for 
 org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
 

 Key: PIG-1303
 URL: https://issues.apache.org/jira/browse/PIG-1303
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
 Environment: pig 0.6.0 on a fedora linux machine, jdk 1.6 u11
Reporter: Johannes Rußek
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1303.patch, TypeCheckingVisitor.java.diff


 I'm unable to set the format of the outgoing date string in the constructor, 
 as it's supposed to work. 
 The only way I could change the format was to change the default in the Java 
 class and rebuild piggybank.
 Apparently this has something to do with the way Pig instantiates 
 DateExtractor, quoting a replier on the mailing list:
 David Vrensk said:
 I ran into the same problem a couple of weeks ago, and
 played around with the code, inserting some print/log statements.  It turns
 out that the arguments are only used in the initial constructor calls, when
 the Pig process is starting, but once Pig reaches the point where it would
 use the UDF, it creates new DateExtractors without passing the arguments.




[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Status: Open  (was: Patch Available)

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch



[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Attachment: (was: PIG-1386-trunk.patch)

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch



[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Attachment: PIG-1386-trunk.patch

e503949c4f5f2667657ee02872aff5ce

Additional documentation and examples.

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch


 Based on this conversation:
 totally, go for it, it'd be pretty straightforward to add this
 functionality.
 - Hide quoted text -
 On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:
  Hey, while we're on the subject, and I have your attention, can we
  re-factor
  the UDF MaxTupleByFirstField to take constructor?
 
  *define customMaxTuple ExtremalTupleByNthField(n, 'min');*
  *G = group T by id;*
  *M = foreach T generate customMaxTuple(T);
  *
 
  Where n is the nth field, and the second parameter allows us to specify
  min, max, median,  etc...
 
  Does this seem like something useful to everyone?
 
 
 
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there reason not to do the second or third other than being more
   complicated?
  
   Certainly I'd volunteer to put the top implementation in to the util
   package and submit them for builtin's, but the latter syntactic candies
   seems more natural..
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig
  allowed
   users to define grouping functions (0.1).  Functions like these should
  go in
   evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
These are things everyone asks for and they seem like a reasonable
  addition
   to the core engine.  This will be more of a burden to write (as we'll
  hold
   them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
Some times I wonder... I mean, somebody went to the trouble of making a
   path
   called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belong), but didn't check in any java
  code
   into that package.
  
  
   Any comment about where to put this kind of utility classes?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --some times you need it this way for some reason.
  
  
Ok. I place my current code here, may be later I make a patch (if
  such
   implementation is acceptable of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import java.io.IOException;
  
   /**
    * Convert any sequence of fields to a bag of tuples with the specified
    * number of fields each.<br>
    * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
    * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
    *
    * @author astepachev
    */
   public class ToBag extends EvalFunc<DataBag> {
       public BagFactory bagFactory;
       public TupleFactory tupleFactory;

       public ToBag() {
           bagFactory = BagFactory.getInstance();
           tupleFactory = TupleFactory.getInstance();
       }

       @Override
       public DataBag exec(Tuple input) throws IOException {
           if (input.isNull())
               return null;
           final DataBag bag = bagFactory.newDefaultBag();
           // The first field is the number of fields per output tuple.
           final Integer counter = (Integer) input.get(0);
           if (counter == null)
               return null;
           Tuple tuple = tupleFactory.newTuple();
           for (int i = 0; i < input.size() - 1; i++) {
               // Start a new tuple every `counter` fields.
               if (i % counter == 0) {
                   tuple = tupleFactory.newTuple();
                   bag.add(tuple);
               }
               tuple.append(input.get(i + 1));
           }
           return bag;
       }
   }
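
The grouping step in exec() above can be tried without a Pig installation. The following is only a sketch of that step under stated assumptions: plain java.util lists stand in for Pig's Tuple and DataBag, and the class and method names (ToBagSketch, chunk) are hypothetical, not part of Pig or of the posted patch. Unlike the posted code, it rejects a non-positive count instead of failing on the modulo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ToBag UDF: lists replace Pig's Tuple and DataBag.
final class ToBagSketch {

    // Splits `fields` into consecutive groups of `count` elements.
    // The last group may be shorter when the sizes do not divide evenly.
    static <T> List<List<T>> chunk(int count, List<T> fields) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        List<List<T>> bag = new ArrayList<>();
        List<T> tuple = null;
        for (int i = 0; i < fields.size(); i++) {
            // Start a new "tuple" every `count` fields, as the UDF does.
            if (i % count == 0) {
                tuple = new ArrayList<>();
                bag.add(tuple);
            }
            tuple.add(fields.get(i));
        }
        return bag;
    }
}
```

With count=2 and five fields this yields two full groups and one trailing group holding the single leftover field.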
  
   import org.apache.pig.ExecType;
   import org.apache.pig.PigServer;
   import org.junit.Before;
   import org.junit.Test;
  
   

[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Status: Patch Available  (was: Open)

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch


 Based on this conversation:
 totally, go for it, it'd be pretty straightforward to add this
 functionality.
 On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:
  Hey, while we're on the subject, and I have your attention, can we
  re-factor
  the UDF MaxTupleByFirstField to take constructor?
 
  define customMaxTuple ExtremalTupleByNthField(n, 'min');
  G = group T by id;
  M = foreach T generate customMaxTuple(T);
 
  Where n is the nth field, and the second parameter allows us to specify
  min, max, median,  etc...
 
  Does this seem like something useful to everyone?
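
  To make the proposal concrete, here is a rough sketch of the selection such a UDF would perform. It is not the code in the attached patch: plain java.util lists stand in for Pig tuples and bags, the names ExtremalByNthFieldSketch and extremal are hypothetical, n is assumed to be a 0-based index, and only 'min' and 'max' are handled.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the proposed UDF: a "bag" is a list of "tuples".
final class ExtremalByNthFieldSketch {

    // Returns the tuple whose n-th field (0-based, an assumption) is the
    // smallest or largest in the bag, or null for an empty bag.
    static <T extends Comparable<T>> List<T> extremal(List<List<T>> bag, int n, String mode) {
        if (bag.isEmpty()) {
            return null;
        }
        Comparator<List<T>> byNthField = Comparator.comparing((List<T> tuple) -> tuple.get(n));
        switch (mode) {
            case "min":
                return Collections.min(bag, byNthField);
            case "max":
                return Collections.max(bag, byNthField);
            default:
                throw new IllegalArgumentException("unsupported mode: " + mode);
        }
    }
}
```

  Passing the field index and the mode as arguments mirrors the constructor parameters in the define statement above; a mode such as median would need a sort rather than a single pass.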
 
 
 
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there reason not to do the second or third other than being more
   complicated?
  
   Certainly I'd volunteer to put the top implementation in to the util
   package and submit them for builtin's, but the latter syntactic candies
   seems more natural..
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig
  allowed
   users to define grouping functions (0.1).  Functions like these should
  go in
   evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
These are things everyone asks for and they seem like a reasonable
  addition
   to the core engine.  This will be more of a burden to write (as we'll
  hold
   them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
Some times I wonder... I mean, somebody went to the trouble of making a
   path
   called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belong), but didn't check in any java
  code
   into that package.
  
  
   Any comment about where to put this kind of utility classes?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --some times you need it this way for some reason.
  
  
Ok. I place my current code here, may be later I make a patch (if
  such
   implementation is acceptable of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import java.io.IOException;
  
   /**
    * Convert any sequence of fields to a bag of tuples with the specified
    * number of fields each.<br>
    * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
    * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
    *
    * @author astepachev
    */
   public class ToBag extends EvalFunc<DataBag> {
       public BagFactory bagFactory;
       public TupleFactory tupleFactory;

       public ToBag() {
           bagFactory = BagFactory.getInstance();
           tupleFactory = TupleFactory.getInstance();
       }

       @Override
       public DataBag exec(Tuple input) throws IOException {
           if (input.isNull())
               return null;
           final DataBag bag = bagFactory.newDefaultBag();
           // The first field is the number of fields per output tuple.
           final Integer counter = (Integer) input.get(0);
           if (counter == null)
               return null;
           Tuple tuple = tupleFactory.newTuple();
           for (int i = 0; i < input.size() - 1; i++) {
               // Start a new tuple every `counter` fields.
               if (i % counter == 0) {
                   tuple = tupleFactory.newTuple();
                   bag.add(tuple);
               }
               tuple.append(input.get(i + 1));
           }
           return bag;
       }
   }
  
   import org.apache.pig.ExecType;
   import org.apache.pig.PigServer;
   import org.junit.Before;
   import org.junit.Test;
  
   import java.io.IOException;
   import java.net.URISyntaxException;
   

[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Attachment: PIG-1386-trunk.patch

da673ab2d584faf903e8b49b63a03ade
 
spell check the documentation

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch


 Based on this conversation:
 totally, go for it, it'd be pretty straightforward to add this
 functionality.
 On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:
  Hey, while we're on the subject, and I have your attention, can we
  re-factor
  the UDF MaxTupleByFirstField to take constructor?
 
  define customMaxTuple ExtremalTupleByNthField(n, 'min');
  G = group T by id;
  M = foreach T generate customMaxTuple(T);
 
  Where n is the nth field, and the second parameter allows us to specify
  min, max, median,  etc...
 
  Does this seem like something useful to everyone?
 
 
 
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there reason not to do the second or third other than being more
   complicated?
  
   Certainly I'd volunteer to put the top implementation in to the util
   package and submit them for builtin's, but the latter syntactic candies
   seems more natural..
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig
  allowed
   users to define grouping functions (0.1).  Functions like these should
  go in
   evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
These are things everyone asks for and they seem like a reasonable
  addition
   to the core engine.  This will be more of a burden to write (as we'll
  hold
   them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
Some times I wonder... I mean, somebody went to the trouble of making a
   path
   called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belong), but didn't check in any java
  code
   into that package.
  
  
   Any comment about where to put this kind of utility classes?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --some times you need it this way for some reason.
  
  
Ok. I place my current code here, may be later I make a patch (if
  such
   implementation is acceptable of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import java.io.IOException;
  
   /**
    * Convert any sequence of fields to a bag of tuples with the specified
    * number of fields each.<br>
    * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
    * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
    *
    * @author astepachev
    */
   public class ToBag extends EvalFunc<DataBag> {
       public BagFactory bagFactory;
       public TupleFactory tupleFactory;

       public ToBag() {
           bagFactory = BagFactory.getInstance();
           tupleFactory = TupleFactory.getInstance();
       }

       @Override
       public DataBag exec(Tuple input) throws IOException {
           if (input.isNull())
               return null;
           final DataBag bag = bagFactory.newDefaultBag();
           // The first field is the number of fields per output tuple.
           final Integer counter = (Integer) input.get(0);
           if (counter == null)
               return null;
           Tuple tuple = tupleFactory.newTuple();
           for (int i = 0; i < input.size() - 1; i++) {
               // Start a new tuple every `counter` fields.
               if (i % counter == 0) {
                   tuple = tupleFactory.newTuple();
                   bag.add(tuple);
               }
               tuple.append(input.get(i + 1));
           }
           return bag;
       }
   }
  
   import org.apache.pig.ExecType;
   import org.apache.pig.PigServer;
   import org.junit.Before;
   import org.junit.Test;
  
   import 

[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1386:
-

Attachment: (was: PIG-1386-trunk.patch)

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch


 Based on this conversation:
 totally, go for it, it'd be pretty straightforward to add this
 functionality.
 On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:
  Hey, while we're on the subject, and I have your attention, can we
  re-factor
  the UDF MaxTupleByFirstField to take constructor?
 
  define customMaxTuple ExtremalTupleByNthField(n, 'min');
  G = group T by id;
  M = foreach T generate customMaxTuple(T);
 
  Where n is the nth field, and the second parameter allows us to specify
  min, max, median,  etc...
 
  Does this seem like something useful to everyone?
 
 
 
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there reason not to do the second or third other than being more
   complicated?
  
   Certainly I'd volunteer to put the top implementation in to the util
   package and submit them for builtin's, but the latter syntactic candies
   seems more natural..
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig
  allowed
   users to define grouping functions (0.1).  Functions like these should
  go in
   evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
These are things everyone asks for and they seem like a reasonable
  addition
   to the core engine.  This will be more of a burden to write (as we'll
  hold
   them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
Some times I wonder... I mean, somebody went to the trouble of making a
   path
   called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belong), but didn't check in any java
  code
   into that package.
  
  
   Any comment about where to put this kind of utility classes?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --some times you need it this way for some reason.
  
  
Ok. I place my current code here, may be later I make a patch (if
  such
   implementation is acceptable of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import java.io.IOException;
  
   /**
    * Convert any sequence of fields to a bag of tuples with the specified
    * number of fields each.<br>
    * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
    * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
    *
    * @author astepachev
    */
   public class ToBag extends EvalFunc<DataBag> {
       public BagFactory bagFactory;
       public TupleFactory tupleFactory;

       public ToBag() {
           bagFactory = BagFactory.getInstance();
           tupleFactory = TupleFactory.getInstance();
       }

       @Override
       public DataBag exec(Tuple input) throws IOException {
           if (input.isNull())
               return null;
           final DataBag bag = bagFactory.newDefaultBag();
           // The first field is the number of fields per output tuple.
           final Integer counter = (Integer) input.get(0);
           if (counter == null)
               return null;
           Tuple tuple = tupleFactory.newTuple();
           for (int i = 0; i < input.size() - 1; i++) {
               // Start a new tuple every `counter` fields.
               if (i % counter == 0) {
                   tuple = tupleFactory.newTuple();
                   bag.add(tuple);
               }
               tuple.append(input.get(i + 1));
           }
           return bag;
       }
   }
  
   import org.apache.pig.ExecType;
   import org.apache.pig.PigServer;
   import org.junit.Before;
   import org.junit.Test;
  
   import java.io.IOException;
   import 

[jira] Updated: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted

2010-04-27 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1391:
---

Attachment: PIG-1391.06.patch

Patch for 0.6 branch. It reduces the number of temp files being left behind 
from around 1767 to 135 .  It changes only contents of test/ dir. 

As the patch does not apply to trunk, I have manually run the unit tests and 
test-patch . All unit tests succeeded, pasting result of test-patch - 
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 208 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


 pig unit tests leave behind files in temp directory because MiniCluster files 
 don't get deleted
 ---

 Key: PIG-1391
 URL: https://issues.apache.org/jira/browse/PIG-1391
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.7.0

 Attachments: minicluster.patch, PIG-1391.06.patch


 Pig unit test runs leave behind files in temp dir (/tmp) and there are too 
 many files in the directory over time.
 Most of the files are left behind by MiniCluster . It closes/shutsdown 
 MiniDFSCluster, MiniMRCluster and the FileSystem that it has created when the 
 constructor is called, only in finalize(). And java does not guarantee that 
 finalize() will be called. 
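
 The general remedy for cleanup that lives only in finalize() is an explicit shutdown that the test harness calls deterministically. The sketch below illustrates that pattern only; it is not the change made in the attached patches. TempClusterSketch and its methods are hypothetical names, and a temp directory with one file stands in for the MiniDFSCluster, MiniMRCluster and FileSystem resources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical stand-in for a test cluster that owns files under the temp dir.
final class TempClusterSketch implements AutoCloseable {
    private final Path workDir;

    TempClusterSketch() throws IOException {
        workDir = Files.createTempDirectory("minicluster-sketch");
        Files.createFile(workDir.resolve("block.tmp"));
    }

    Path workDir() {
        return workDir;
    }

    // Explicit, deterministic cleanup; nothing is left for finalize().
    @Override
    public void close() throws IOException {
        try (Stream<Path> paths = Files.walk(workDir)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    // Uses the resource in try-with-resources and reports whether
    // its working directory is gone afterwards.
    static boolean leavesNothingBehind() {
        Path dir;
        try (TempClusterSketch cluster = new TempClusterSketch()) {
            dir = cluster.workDir();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return !Files.exists(dir);
    }
}
```

 In a JUnit suite the same effect is usually obtained by calling the shutdown method from a tear-down hook rather than from try-with-resources.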

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861544#action_12861544
 ] 

Hadoop QA commented on PIG-1386:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12442926/PIG-1386-trunk.patch
  against trunk revision 937570.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 537 release audit warnings 
(more than the trunk's current 535 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/305/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/305/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/305/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/305/console

This message is automatically generated.

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch


 Based on this conversation:
 totally, go for it, it'd be pretty straightforward to add this
 functionality.
 On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:
  Hey, while we're on the subject, and I have your attention, can we
  re-factor
  the UDF MaxTupleByFirstField to take constructor?
 
  define customMaxTuple ExtremalTupleByNthField(n, 'min');
  G = group T by id;
  M = foreach T generate customMaxTuple(T);
 
  Where n is the nth field, and the second parameter allows us to specify
  min, max, median,  etc...
 
  Does this seem like something useful to everyone?
 
 
 
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there reason not to do the second or third other than being more
   complicated?
  
   Certainly I'd volunteer to put the top implementation in to the util
   package and submit them for builtin's, but the latter syntactic candies
   seems more natural..
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig
  allowed
   users to define grouping functions (0.1).  Functions like these should
  go in
   evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
These are things everyone asks for and they seem like a reasonable
  addition
   to the core engine.  This will be more of a burden to write (as we'll
  hold
   them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
Some times I wonder... I mean, somebody went to the trouble of making a
   path
   called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belong), but didn't check in any java
  code
   into that package.
  
  
   Any comment about where to put this kind of utility classes?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --some times you need it this way for some reason.
  
  
Ok. I place my current code here, may be later I make a patch (if
  such
   implementation is acceptable of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import 

[jira] Updated: (PIG-1378) har url not usable in Pig scripts

2010-04-27 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1378:


Attachment: PIG-1378.patch

Attached patch addresses the issue in the description by changing 
LoadFunc.relativeToAbsolutePath() implementation to only convert input 
locations if the location does not have a scheme or the path in the location is 
not absolute.
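
The rule stated above can be sketched with java.net.URI. This is an illustration under assumptions, not the patched LoadFunc code: LoadLocationSketch and needsConversion are hypothetical names, and the real implementation works with Hadoop Path objects rather than plain strings.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper illustrating the rule described in the comment above.
final class LoadLocationSketch {

    // A location is converted only if it lacks a scheme
    // or its path component is not absolute.
    static boolean needsConversion(String location) {
        URI uri;
        try {
            uri = new URI(location);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid location: " + location, e);
        }
        boolean hasScheme = uri.getScheme() != null;
        String path = uri.getPath(); // null for opaque URIs such as "hdfs:relative"
        boolean absolutePath = path != null && path.startsWith("/");
        return !hasScheme || !absolutePath;
    }
}
```

Under this rule the location 'har:///user/viraj/project/subproject/files/size/data' from the description has both a scheme and an absolute path, so it would be passed through unchanged, while a bare relative path would still be resolved.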

 har url not usable in Pig scripts
 -

 Key: PIG-1378
 URL: https://issues.apache.org/jira/browse/PIG-1378
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
 Fix For: 0.8.0

 Attachments: PIG-1378.patch


 I am trying to use har (Hadoop Archives) in my Pig script.
 I can use them through the HDFS shell
 {noformat}
 $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
 Found 1 items
 -rw---   5 viraj users1537234 2010-04-14 09:49 
 user/viraj/project/subproject/files/size/data/part-1
 {noformat}
 Using similar URL's in grunt yields
 {noformat}
 grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2998: Unhandled internal error. 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible 
 file URI scheme: har : hdfs
 2010-04-14 22:08:48,814 [main] WARN  org.apache.pig.tools.grunt.Grunt - There 
 is no log file to write to.
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
 at org.apache.pig.Main.main(Main.java:357)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
 at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
 ... 13 more
 {noformat}
 According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the 
 following as stated in the original description
 {noformat}
 grunt> a = load 
 'har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: 
 Unable to create input splits for: 
 har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 ... 8 more
 Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
 at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
 at 
 

[jira] Updated: (PIG-1378) har url not usable in Pig scripts

2010-04-27 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1378:


  Status: Patch Available  (was: Open)
Assignee: Pradeep Kamath

 har url not usable in Pig scripts
 -

 Key: PIG-1378
 URL: https://issues.apache.org/jira/browse/PIG-1378
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1378.patch


 I am trying to use har (Hadoop Archives) in my Pig script.
 I can use them through the HDFS shell
 {noformat}
 $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
 Found 1 items
 -rw---   5 viraj users1537234 2010-04-14 09:49 
 user/viraj/project/subproject/files/size/data/part-1
 {noformat}
 Using similar URL's in grunt yields
 {noformat}
 grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2998: Unhandled internal error. 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible 
 file URI scheme: har : hdfs
 2010-04-14 22:08:48,814 [main] WARN  org.apache.pig.tools.grunt.Grunt - There 
 is no log file to write to.
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
 at org.apache.pig.Main.main(Main.java:357)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
 at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
 ... 13 more
 {noformat}
 According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the 
 following as stated in the original description
 {noformat}
 grunt> a = load 
 'har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: 
 Unable to create input splits for: 
 har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 ... 8 more
 Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
 at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245)
 {noformat}
 Viraj


[jira] Commented: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861588#action_12861588
 ] 

Hadoop QA commented on PIG-1386:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12442973/PIG-1386-trunk.patch
  against trunk revision 937570.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/303/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/303/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/303/console

This message is automatically generated.

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch


 Based on this conversation:
 totally, go for it, it'd be pretty straightforward to add this
 functionality.
 - Hide quoted text -
 On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:
  Hey, while we're on the subject, and I have your attention, can we
  re-factor the UDF MaxTupleByFirstField to take constructor arguments?
 
  define customMaxTuple ExtremalTupleByNthField(n, 'min');
  G = group T by id;
  M = foreach G generate customMaxTuple(T);
 
  Where n is the nth field, and the second parameter allows us to specify
  min, max, median,  etc...
 
  Does this seem like something useful to everyone?
 
 
 
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
    Is there a reason not to do the second or third, other than their being
    more complicated?
   
    Certainly I'd volunteer to put the top implementation into the util
    package and submit it for builtins, but the latter syntactic sugar
    seems more natural.
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig allowed
   users to define grouping functions (0.1).  Functions like these should go
   in evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
   These are things everyone asks for and they seem like a reasonable
   addition to the core engine.  This will be more of a burden to write (as
   we'll hold them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
   Sometimes I wonder... I mean, somebody went to the trouble of making a
   path called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belongs), but didn't check in any Java
   code into that package.
  
  
   Any comment about where to put this kind of utility class?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --some times you need it this way for some reason.
  
  
   OK. I'll place my current code here; maybe later I'll make a patch (if
   such an implementation is acceptable, of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import 

[jira] Commented: (PIG-1395) Mapside cogroup runs out of memory

2010-04-27 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861611#action_12861611
 ] 

Pradeep Kamath commented on PIG-1395:
-

+1. The comment can be updated to reflect the nature of the comparison in the 
code; currently the comment and the code seem to differ. Otherwise the 
change looks good.

 Mapside cogroup runs out of memory
 --

 Key: PIG-1395
 URL: https://issues.apache.org/jira/browse/PIG-1395
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: cogrp_mem.patch


 In a particular scenario, when there aren't a lot of tuples with the same key in a 
 relation (i.e. there aren't many repeating keys), map tasks doing cogroup 
 fail with a GC overhead exception.




[jira] Updated: (PIG-1395) Mapside cogroup runs out of memory

2010-04-27 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1395:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch checked-in with updated comment.

 Mapside cogroup runs out of memory
 --

 Key: PIG-1395
 URL: https://issues.apache.org/jira/browse/PIG-1395
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: cogrp_mem.patch


 In a particular scenario, when there aren't a lot of tuples with the same key in a 
 relation (i.e. there aren't many repeating keys), map tasks doing cogroup 
 fail with a GC overhead exception.




[jira] Commented: (PIG-1390) Provide a target to generate eclipse-related classpath and files

2010-04-27 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861638#action_12861638
 ] 

Thejas M Nair commented on PIG-1390:


I have updated the instructions in 
http://wiki.apache.org/pig/Eclipse_Environment

 Provide a target to generate eclipse-related classpath and files
 

 Key: PIG-1390
 URL: https://issues.apache.org/jira/browse/PIG-1390
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.7.0, 0.8.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.8.0

 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, 
 PIG-eclipse_support.patch


 Currently, after checking out from svn repository, there is no provision to 
 auto-generate eclipse-related classpath and files , which could help in 
 import into eclipse directly.




[jira] Commented: (PIG-1303) unable to set outgoing format for org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor

2010-04-27 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861639#action_12861639
 ] 

Alan Gates commented on PIG-1303:
-

Sorry, I didn't make it to reviewing this today.  I'll put it at the top of 
tomorrow's list.  

On the 0.7 question, I'm open to that as long as we test it really well.

 unable to set outgoing format for 
 org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
 

 Key: PIG-1303
 URL: https://issues.apache.org/jira/browse/PIG-1303
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
 Environment: pig 0.6.0 on a fedora linux machine, jdk 1.6 u11
Reporter: Johannes Rußek
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1303.patch, TypeCheckingVisitor.java.diff


 I'm unable to set the format of the outgoing date string in the constructor, 
 as it's supposed to work. 
 The only way I could change the format was to change the default in the Java 
 class and rebuild piggybank.
 Apparently this has something to do with the way Pig instantiates 
 DateExtractor. Quoting a reply on the mailing list:
 David Vrensk said:
 I ran into the same problem a couple of weeks ago, and
 played around with the code inserting some print/log statements.  It turns
 out that the arguments are only used in the initial constructor calls, when
 the pig process is starting, but once pig reaches the point where it would
 use the udf, it creates new DateExtractors without passing the arguments.
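
The symptom described above can be reproduced in plain Java. The sketch below is only an illustration: FormatHolder is a hypothetical stand-in, not Pig's DateExtractor, and the reflection calls are not Pig's actual instantiation code. It shows that re-creating an object from its class alone, through the no-argument constructor, silently falls back to the default format, while forwarding the original argument keeps it.

```java
import java.lang.reflect.Constructor;

public class ConstructorArgsSketch {
    // Hypothetical stand-in for a UDF with a configurable output format.
    public static class FormatHolder {
        public final String format;

        public FormatHolder() {
            this("yyyy-MM-dd"); // default used when no argument is passed
        }

        public FormatHolder(String format) {
            this.format = format;
        }
    }

    // Re-creates an instance from the class alone; constructor arguments are lost.
    public static FormatHolder recreateWithoutArgs(Class<? extends FormatHolder> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-creates an instance and forwards the original constructor argument.
    public static FormatHolder recreateWithArgs(Class<? extends FormatHolder> cls, String arg) {
        try {
            Constructor<? extends FormatHolder> ctor = cls.getDeclaredConstructor(String.class);
            return ctor.newInstance(arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FormatHolder configured = new FormatHolder("dd/MM/yyyy");
        FormatHolder lost = recreateWithoutArgs(configured.getClass());
        FormatHolder kept = recreateWithArgs(configured.getClass(), configured.format);
        System.out.println("configured=" + configured.format
                + " withoutArgs=" + lost.format
                + " withArgs=" + kept.format);
    }
}
```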




[jira] Commented: (PIG-1390) Provide a target to generate eclipse-related classpath and files

2010-04-27 Thread V.V.Chaitanya Krishna (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861649#action_12861649
 ] 

V.V.Chaitanya Krishna commented on PIG-1390:


@Thejas: The shortcut for adding Pig to Eclipse doesn't require running ant 
clean jar. The src-gen sources and the required jars are obtained while 
running ant eclipse-files itself.

These steps import the project into Eclipse more quickly:
1. Check out the trunk code.
2. Run ant eclipse-files.
3. Open Eclipse and import the existing project.

If the checkout was done using Subclipse in Eclipse, refresh the project after 
running ant eclipse-files.

 Provide a target to generate eclipse-related classpath and files
 

 Key: PIG-1390
 URL: https://issues.apache.org/jira/browse/PIG-1390
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.7.0, 0.8.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.8.0

 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, 
 PIG-eclipse_support.patch


 Currently, after checking out from svn repository, there is no provision to 
 auto-generate eclipse-related classpath and files , which could help in 
 import into eclipse directly.




[jira] Commented: (PIG-1390) Provide a target to generate eclipse-related classpath and files

2010-04-27 Thread V.V.Chaitanya Krishna (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861650#action_12861650
 ] 

V.V.Chaitanya Krishna commented on PIG-1390:


Thejas, can you please change the wiki accordingly?
Thanks.

 Provide a target to generate eclipse-related classpath and files
 

 Key: PIG-1390
 URL: https://issues.apache.org/jira/browse/PIG-1390
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.7.0, 0.8.0
Reporter: V.V.Chaitanya Krishna
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.8.0

 Attachments: PIG-1390-2.patch, PIG-1390-3.patch, 
 PIG-eclipse_support.patch


 Currently, after checking out from svn repository, there is no provision to 
 auto-generate eclipse-related classpath and files , which could help in 
 import into eclipse directly.




[jira] Commented: (PIG-1378) har url not usable in Pig scripts

2010-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861659#action_12861659
 ] 

Hadoop QA commented on PIG-1378:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443013/PIG-1378.patch
  against trunk revision 937570.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 42 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/306/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/306/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/306/console

This message is automatically generated.

 har url not usable in Pig scripts
 -

 Key: PIG-1378
 URL: https://issues.apache.org/jira/browse/PIG-1378
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1378.patch


 I am trying to use har (Hadoop Archives) in my Pig script.
 I can use them through the HDFS shell
 {noformat}
 $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
 Found 1 items
 -rw---   5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1
 {noformat}
 Using similar URLs in grunt yields
 {noformat}
 grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
 2010-04-14 22:08:48,814 [main] WARN  org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
 at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
 at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
 at org.apache.pig.Main.main(Main.java:357)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
 at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
 at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
 ... 13 more
 {noformat}
 According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the 
 following as stated in the original description
 {noformat}
 grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: 
 Unable to create input splits for: 
 har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 ... 8 more
 Caused by: java.io.IOException: No FileSystem for scheme: 

how to compare?

2010-04-27 Thread hc busy
guys, I'm implementing that ExtremalTupleByNthField and I have a question
about comparison...


So, once I have parsed out the two objects that I want to compare, how do I
perform that comparison? My current implementation assumes the data are
Comparable (which they invariably are within Pig), so I do


int c = ((Comparable)o1).compareTo((Comparable)o2);


now I also see that there's another compare that compares the two objects
by:


int c = DataType.compare(o1, o2, DataType.findType(o1),
DataType.findType(o2));



The first method works for all types I've tried (int, string, etc.), but
the latter is used by another UDF already in SVN.

What are your suggestions?

(PIG-1386 is the ticket where I've attached the patch.)
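
One practical difference between the two approaches shows up when the two values have different runtime classes. The plain-Java sketch below illustrates this under stated assumptions: typeAwareCompare is a hypothetical helper written for this example and is not Pig's DataType.compare, whose exact ordering rules are not reproduced here. The raw Comparable cast works when both values share a class, but throws ClassCastException for, say, an Integer against a Long.

```java
public class CompareSketch {
    // The raw-cast approach from the message: fine when both values have
    // the same runtime class, fails when they do not.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static int rawCompare(Object o1, Object o2) {
        return ((Comparable) o1).compareTo(o2);
    }

    // Hypothetical type-aware variant (NOT Pig's DataType.compare): widens
    // mixed numeric values to double before comparing. Very large long
    // values can lose precision when widened this way.
    public static int typeAwareCompare(Object o1, Object o2) {
        if (o1 instanceof Number && o2 instanceof Number) {
            return Double.compare(((Number) o1).doubleValue(),
                                  ((Number) o2).doubleValue());
        }
        return rawCompare(o1, o2);
    }

    public static void main(String[] args) {
        System.out.println(rawCompare(1, 2) < 0);       // same class: works
        System.out.println(rawCompare("a", "b") < 0);   // same class: works
        try {
            rawCompare(1, 2L);                          // Integer vs Long
        } catch (ClassCastException e) {
            System.out.println("raw cast fails on mixed types");
        }
        System.out.println(typeAwareCompare(1, 2L) < 0);
    }
}
```

If a UDF can only ever see values of one type in the compared field, the raw cast is enough; a type-aware comparison matters when the field's type can vary between tuples.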