[ https://issues.apache.org/jira/browse/PIG-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883557#action_12883557 ]
Gianmarco De Francisci Morales commented on PIG-1468:
-----------------------------------------------------
I ran some tests and see a ~1% decrease in performance overall.
I also searched the codebase for references to the method, and no place seems to
rely on the specific ordering.
Here is the code I used:
{code}
import java.util.Random;

public class TestSpeed {
    private static final int TIMES = (int) 10e6;       // 10e6 == 10,000,000 passes
    private static final int NUM_ARRAYS = (int) 10e5;  // 10e5 == 1,000,000 arrays per batch
    private static final int ARRAY_LENGTH = 50;        // bytes per array

    // Compares bytes as signed values (-128..127), so 0xff sorts before 0x00.
    private static int compareSigned(byte[] b1, byte[] b2) {
        int i;
        for (i = 0; i < b1.length; i++) {
            if (i >= b2.length)
                return 1;
            int a = b1[i];
            int b = b2[i];
            if (a < b)
                return -1;
            else if (a > b)
                return 1;
        }
        if (i < b2.length)
            return -1;
        return 0;
    }

    // Compares bytes as unsigned values (0..255), i.e. in lexicographic order.
    private static int compareUnsigned(byte[] b1, byte[] b2) {
        int i;
        for (i = 0; i < b1.length; i++) {
            if (i >= b2.length)
                return 1;
            int a = b1[i] & 0xff;
            int b = b2[i] & 0xff;
            if (a < b)
                return -1;
            else if (a > b)
                return 1;
        }
        if (i < b2.length)
            return -1;
        return 0;
    }

    public static void main(String[] args) {
        long before, after;
        Random rand = new Random(123456789);

        // Fill both batches with random bytes from a fixed seed.
        byte[][] batch1 = new byte[NUM_ARRAYS][];
        byte[][] batch2 = new byte[NUM_ARRAYS][];
        for (int i = 0; i < NUM_ARRAYS; i++) {
            batch1[i] = new byte[ARRAY_LENGTH];
            batch2[i] = new byte[ARRAY_LENGTH];
            rand.nextBytes(batch1[i]);
            rand.nextBytes(batch2[i]);
        }

        // NOTE: the inner loops are bounded by ARRAY_LENGTH, not NUM_ARRAYS,
        // so only the first 50 pairs of arrays are actually compared.
        before = System.currentTimeMillis();
        for (int i = 0; i < TIMES; i++)
            for (int j = 0; j < ARRAY_LENGTH; j++)
                compareSigned(batch1[j], batch2[j]);
        after = System.currentTimeMillis();
        System.out.println("Time for signed comparison (ms): " + (after - before));

        before = System.currentTimeMillis();
        for (int i = 0; i < TIMES; i++)
            for (int j = 0; j < ARRAY_LENGTH; j++)
                compareUnsigned(batch1[j], batch2[j]);
        after = System.currentTimeMillis();
        System.out.println("Time for UNsigned comparison (ms): " + (after - before));
    }
}
{code}
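The two methods in the benchmark differ only in the {{& 0xff}} mask, and that mask is what changes the ordering. The following is a minimal sketch of the difference, not part of the patch: the class name ByteOrderDemo and the compact loop style are assumptions made for illustration only.

```java
class ByteOrderDemo {
    // Signed comparison: each byte is widened to an int in -128..127.
    static int compareSigned(byte[] b1, byte[] b2) {
        int n = Math.min(b1.length, b2.length);
        for (int i = 0; i < n; i++) {
            if (b1[i] != b2[i])
                return b1[i] < b2[i] ? -1 : 1;
        }
        // All shared positions are equal: the shorter array sorts first.
        return Integer.compare(b1.length, b2.length);
    }

    // Unsigned comparison: masking with 0xff maps each byte to 0..255.
    static int compareUnsigned(byte[] b1, byte[] b2) {
        int n = Math.min(b1.length, b2.length);
        for (int i = 0; i < n; i++) {
            int a = b1[i] & 0xff;
            int b = b2[i] & 0xff;
            if (a != b)
                return a < b ? -1 : 1;
        }
        return Integer.compare(b1.length, b2.length);
    }

    public static void main(String[] args) {
        byte[] high = {(byte) 0xff}; // -1 as a signed byte, 255 unsigned
        byte[] low = {0x00};
        System.out.println("signed:   " + compareSigned(high, low));   // prints -1
        System.out.println("unsigned: " + compareUnsigned(high, low)); // prints 1
    }
}
```

With the signed version 0xff sorts before 0x00, while the unsigned version gives the lexicographic order described in the issue.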
> DataByteArray.compareTo() does not compare in lexicographic order
> -----------------------------------------------------------------
>
> Key: PIG-1468
> URL: https://issues.apache.org/jira/browse/PIG-1468
> Project: Pig
> Issue Type: Bug
> Reporter: Gianmarco De Francisci Morales
> Assignee: Gianmarco De Francisci Morales
> Attachments: PIG-1468.patch
>
>
> The compareTo() method of org.apache.pig.data.DataByteArray does not compare
> items in lexicographic order.
> Instead, it takes into account the sign of the bytes that compose the
> DataByteArray.
> So, for example, 0xff compares as less than 0x00.