Hello John,

Are you running HBase and Pig with IBM Java?
We found an error in one Pig unit test when building with IBM Java, and it
looks like the same problem you are reporting. Please check the JIRA [1],
which explains the problem in Pig and the solution there.

[1] https://issues.apache.org/jira/browse/PIG-3309

If the error is the same and you are using IBM Java, the cause is that IBM's
HashMap implementation iterates over its entries in a different order than
Oracle's (Sun) implementation.

Best regards,
Paulo Vital

On Fri, 2013-09-13 at 16:38 +0200, John wrote:
> Hi, I already asked this on the Pig mailing list. But because I'm not sure if
> it is a Pig or HBase issue, I will ask here too, since the Pig function is
> using an HBase scan operation. Here is my question:
>
> I have created an HBase table in the HBase shell and added some data. In
> http://hbase.apache.org/book/dm.sort.html it is written that the datasets are
> first sorted by the rowkey and then by the column. So I tried something in the
> HBase shell: http://pastebin.com/gLVAX0rJ
>
> Everything looks fine. I got the right order a -> c -> d, as expected.
>
> Now I tried the same with Apache Pig in Java: http://pastebin.com/jdTpj4Fu
>
> I got this result:
>
> (key1,[c#val,d#val,a#val])
>
> So, now the order is c -> d -> a. That seems a little odd to me; shouldn't
> it be the same as in HBase? It's important for me to get the right order
> because I transform the map afterwards into a bag and then join it with
> other tables. If both inputs are sorted, I could use a merge join without
> sorting these two datasets. So does anyone know how it is possible to get
> the sorted map (or bag) of the columns?
>
>
> thanks

--
Paulo Ricardo Paz Vital <[email protected]>
IBM Linux Technology Center
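
To illustrate the HashMap point above: java.util.HashMap gives no guarantee about iteration order, so the order seen in the Pig map can change between JVM vendors and versions. The following is only a minimal sketch in plain Java, not the fix from PIG-3309. It assumes the columns arrive as a Map<String, String>, and the class name SortedColumns and the helper sortedByKey are hypothetical names made up for this example. Copying the entries into a TreeMap gives a key order that does not depend on the JVM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedColumns {

    // Hypothetical helper: copies any map into a TreeMap so that iteration
    // follows the natural (lexicographic) order of the String keys.
    static Map<String, String> sortedByKey(Map<String, String> columns) {
        return new TreeMap<>(columns);
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified and may differ between
        // JVM implementations (for example IBM vs. Oracle).
        Map<String, String> columns = new HashMap<>();
        columns.put("c", "val");
        columns.put("d", "val");
        columns.put("a", "val");

        // Collect the keys after sorting; this order is deterministic.
        List<String> ordered = new ArrayList<>(sortedByKey(columns).keySet());
        System.out.println(ordered); // prints [a, c, d]
    }
}
```

Note that String ordering matches HBase's byte-wise column ordering only for plain ASCII qualifiers like the ones in this example; other qualifiers may need a byte-based comparator.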
