John,

Yeah, the first option looks better. Glad you solved the problem.

Best regards,
Paulo Vital
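
P.S. For anyone reading this in the archive, here is a minimal sketch of why the first option helps. It uses plain Java collections only and is not the actual HBaseStorage code; the class name SortedColumnsSketch and the helper sortByQualifier are made up for illustration. It also assumes String keys, where natural ordering matches HBase's byte ordering only for plain ASCII qualifiers like the ones in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class SortedColumnsSketch {

    // Hypothetical helper: copies any map into a TreeMap, which always
    // iterates its keys in their natural (lexicographic) String order.
    static SortedMap<String, String> sortByQualifier(Map<String, String> columns) {
        return new TreeMap<>(columns);
    }

    public static void main(String[] args) {
        // HashMap gives no guarantee about iteration order, and the order
        // may differ between JVM implementations.
        Map<String, String> columns = new HashMap<>();
        columns.put("c", "val");
        columns.put("d", "val");
        columns.put("a", "val");

        System.out.println("HashMap: " + columns);

        // TreeMap iterates a, c, d regardless of insertion order or JVM.
        System.out.println("TreeMap: " + sortByQualifier(columns));
    }
}
```

The HashMap line may or may not come out sorted on a given JVM, which is exactly the problem: the result depends on the hash implementation. The TreeMap line is ordered by contract.
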
On Fri, 2013-09-13 at 19:11 +0200, John wrote:
> Hi, thanks for your answer. I solved the problem. Here is the answer from
> another mailing list:
>
> The problem is that HBaseStorage maps column families into a HashMap, so
> the sort ordering is completely lost.
>
> You have two options:
>
> 1. Modify HBaseStorage to use a SortedMap data structure (e.g. TreeMap) and
>    use the modified HBaseStorage (or make it configurable).
> 2. Since you convert the map to a bag, you can sort the bag in a nested
>    foreach statement.
>
> I prefer option 1 myself because it would be more performant than option 2.
>
> Thanks anyway!
>
>
> 2013/9/13 Paulo Ricardo Paz Vital <[email protected]>
>
> > Hello John,
> >
> > Are you running HBase and Pig with IBM Java?
> >
> > We found an error in one Pig unit test when building with IBM Java, and
> > it looks like the problem is the same one you are reporting. Please check
> > the JIRA [1], which explains the problem in Pig and the solution.
> >
> > [1] https://issues.apache.org/jira/browse/PIG-3309
> >
> > If the error is the same and you are using IBM Java, the problem is how
> > IBM's HashMap implementation orders the map - it's different from
> > Oracle's (Sun) implementation.
> >
> > Best regards,
> > Paulo Vital
> >
> > On Fri, 2013-09-13 at 16:38 +0200, John wrote:
> > > Hi, I already asked this on the Pig mailing list. But because I'm not
> > > sure if it is a Pig or HBase issue, I will ask here too, since the Pig
> > > function is using an HBase scan operation. Here is my question:
> > >
> > > I have created an HBase table in the HBase shell and added some data.
> > > According to http://hbase.apache.org/book/dm.sort.html, the datasets are
> > > first sorted by the rowkey and then the column. So I tried something in
> > > the HBase shell: http://pastebin.com/gLVAX0rJ
> > >
> > > Everything looks fine. I got the right order a -> c -> d, as expected.
> > >
> > > Now I tried the same with Apache Pig in Java:
> > > http://pastebin.com/jdTpj4Fu
> > >
> > > I got this result:
> > >
> > > (key1,[c#val,d#val,a#val])
> > >
> > > So, now the order is c -> d -> a. That seems a little odd to me;
> > > shouldn't it be the same as in HBase? It's important for me to get the
> > > right order because I transform the map afterwards into a bag and then
> > > join it with other tables. If both inputs are sorted I could use a merge
> > > join without sorting these two datasets. So does anyone know how it is
> > > possible to get the sorted map (or bag) of the columns?
> > >
> > >
> > > thanks
> >
> > --
> > Paulo Ricardo Paz Vital <[email protected]>
> > IBM Linux Technology Center
> >
> >
