Re: how to compare?

2010-04-28 Thread Gianmarco
Basically, DataType.compare() just calls the compareTo() method of the two
objects after checking that the two types are the same.
However, DataType.compare() does two things beyond a simple compareTo().

First, it is specialized for Maps, for which sizes are taken into account
and keys are sorted.

Second, it imposes an (arbitrary) order on different data types. In this way
the types are not dependent on each other and there is a single point of
control.

So I think you should use DataType.compare() unless you are sure you do not
need these features.
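
To make the two extra behaviours concrete, here is a minimal sketch of a comparison that first orders by a type tag and then special-cases maps (size first, then sorted keys). This is not Pig's actual code: the class name, the tag values and the helper methods are all hypothetical, and only Integer, String and Map are handled.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

final class MixedTypeCompareSketch {
    // Hypothetical type tags; these are NOT Pig's DataType constants.
    static final byte INTEGER = 10, STRING = 20, MAP = 30;

    static byte findType(Object o) {
        if (o instanceof Integer) return INTEGER;
        if (o instanceof String) return STRING;
        if (o instanceof Map) return MAP;
        throw new IllegalArgumentException("unsupported type: " + o.getClass());
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compare(Object o1, Object o2) {
        byte t1 = findType(o1), t2 = findType(o2);
        // Different types: fall back to an arbitrary but fixed order on the tags.
        if (t1 != t2) return t1 < t2 ? -1 : 1;
        if (t1 == MAP) return compareMaps((Map<String, Object>) o1, (Map<String, Object>) o2);
        // Same type: plain polymorphic compareTo.
        return ((Comparable) o1).compareTo(o2);
    }

    static int compareMaps(Map<String, Object> m1, Map<String, Object> m2) {
        // Smaller map sorts first.
        if (m1.size() != m2.size()) return m1.size() < m2.size() ? -1 : 1;
        // Same size: walk both maps in sorted key order.
        Iterator<Map.Entry<String, Object>> i1 = new TreeMap<>(m1).entrySet().iterator();
        Iterator<Map.Entry<String, Object>> i2 = new TreeMap<>(m2).entrySet().iterator();
        while (i1.hasNext()) {
            Map.Entry<String, Object> e1 = i1.next(), e2 = i2.next();
            int c = e1.getKey().compareTo(e2.getKey());
            if (c != 0) return c;
            c = compare(e1.getValue(), e2.getValue());
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        Map<String, Object> small = new HashMap<>();
        small.put("z", 1);
        Map<String, Object> big = new HashMap<>();
        big.put("a", 1);
        big.put("b", 2);
        System.out.println(compare(1, "a"));     // mixed types, ordered by tag
        System.out.println(compare(small, big)); // maps, ordered by size first
    }
}
```

With this shape, the ordering across types lives in one place (the tags), which is the "single point of control" mentioned above.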

Anyway, there is something that I do not understand: why does the function
need to switch on the datatype byte and cast the objects before calling
compareTo() on them? Just casting them to Comparable and letting Java run
the proper polymorphic method should work as well, right?
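
As a small illustration of the polymorphic dispatch I mean, the sketch below casts to the raw Comparable type and lets the runtime pick the compareTo of the actual class. It uses only JDK types (Integer, String); the class and method names are made up for the example, and Pig's own types may behave differently when the two types differ.

```java
final class ComparableCastSketch {
    // Cast to raw Comparable; the JVM dispatches to the concrete compareTo.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int rawCompare(Object o1, Object o2) {
        return ((Comparable) o1).compareTo(o2);
    }

    public static void main(String[] args) {
        System.out.println(Integer.signum(rawCompare(1, 2)));     // Integer.compareTo
        System.out.println(Integer.signum(rawCompare("b", "a"))); // String.compareTo
        try {
            // JDK types reject a mismatched argument with a ClassCastException.
            rawCompare(1, "a");
        } catch (ClassCastException e) {
            System.out.println("mixed types rejected");
        }
    }
}
```

No switch on a type byte is needed for the same-type case; the only thing the cast does not give you is a defined order across different types.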




On Wed, Apr 28, 2010 at 07:12, hc busy hc.b...@gmail.com wrote:

 guys, I'm implementing that ExtremalTupleByNthField and I have a question
 about comparison...


 So, when I have parsed out the two objects that I want to compare, how do I
 perform that comparison? My current implementation assumes the data are
 Comparable (which they invariably are within Pig), so I do


 int c = ((Comparable)o1).compareTo((Comparable)o2);


 now I also see that there's another compare that compares the two objects
 by:


 int c = DataType.compare(o1, o2, DataType.findType(o1),
 DataType.findType(o2));



 The initial method works for all types I've tried (int, string, etc.), but
 the latter is used by another UDF already in SVN.

 What are your suggestions?

 (PIG-1386 is the ticket where I've checked in the patch).



Re: how to compare?

2010-04-28 Thread hc busy
I'm not sure. If the types of the two things that I am comparing (typically
the same field of tuples inside a bag) differ, I expect it to throw an error
instead of ordering the results by datatype.

Because if it doesn't, it will either error out later on in the Pig script,
or it will be serialized out and some program will read in the offending
field and crash. I'd prefer it fail early rather than late.
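
A minimal sketch of the fail-early behaviour I'd like, assuming a plain class check is enough to decide that two values have the same type. The names here are hypothetical and this is not what Pig does today:

```java
final class StrictCompareSketch {
    // Refuse to order values of different classes instead of inventing an order.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int strictCompare(Object o1, Object o2) {
        if (o1.getClass() != o2.getClass()) {
            throw new IllegalArgumentException("cannot compare "
                    + o1.getClass().getName() + " with " + o2.getClass().getName());
        }
        return ((Comparable) o1).compareTo(o2);
    }

    public static void main(String[] args) {
        System.out.println(Integer.signum(strictCompare(3, 5)));
        try {
            strictCompare(3, "x");
        } catch (IllegalArgumentException e) {
            // The error names both types, so the bad field is easy to track down.
            System.out.println(e.getMessage());
        }
    }
}
```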

Which is why I'm just casting to Comparable and calling compareTo. The
problem with that is that it depends on each Comparable's compareTo
method handling errors in a similar way. And I see that it does, by
calling into DataType.compare (circa line 166 in DataByteArray for
BYTEARRAYs...). Ahh, I see, so casting to Comparable does the same as
DataType.compare when the types are different.

Hmm, I guess I want to stick to casting to Comparable, since the two ways
of calling them are identical. Unless people have other comments.


