Hi Stas, Don't forget that you should always try to keep the number of columns families lower than 3, else you might face some performances issues.
JM 2013/2/19, Stas Maksimov <maksi...@gmail.com>: > Hi Jean-Marc, > > I've validated this, it works perfectly. Very easy to implement and it's > very fast! > > Thankfully in this project there isn't a lot of lists in each table, so I > won't have to create too many column families. In other scenarios it could > be a problem. > > Many thanks, > Stas > > > On 16 February 2013 02:29, Jean-Marc Spaggiari > <jean-m...@spaggiari.org>wrote: > >> Hi Stas, >> >> Few options are coming into my mind. >> >> Quickly: >> 1) Why not storing the products in specif columns instead of in the >> same one? Like: >> table, rowid1, cf:list, c:aa, value:true >> table, rowid1, cf:list, c:bb, value:true >> table, rowid1, cf:list, c:cc, value:true >> table, rowid2, cf:list, c:aabb, value:true >> table, rowid2, cf:list, c:cc, value:true >> That way when you do a search you query directly the right column for >> the right row. And using "exist" call with also reduce the size of the >> data transfered. >> >> 2) You can store the data in the oposite way. Like: >> table, aa, cf:products, c:rowid1, value:true >> table, aabb, cf:products, c:rowid2, value:true >> table, bb, cf:products, c:rowid1, value:true >> table, cc, cf:products, c:rowid1, value:true >> table, cc, cf:products, c:rowid2, value:true >> Here, you query by your product ID, and you search the column based on >> your previous rowid. >> >> >> I will say the 2 solutions are equivalent, but it will really depend >> on your data pattern and you query pattern. >> >> JM >> >> 2013/2/15, Stas Maksimov <maksi...@gmail.com>: >> > Hi all, >> > >> > I have a requirement to store lists in HBase columns like this: >> > "table", "rowid1", "f:list", "aa, bb, cc" >> > "table", "rowid2", "f:list", "aabb, cc" >> > >> > There is a further requirement to be able to find rows where f:list >> > contains a particular item, e.g. when I need to find rows having item >> "aa" >> > only "rowid1" should match, and for item "cc" both "rowid1" and >> > "rowid2" >> > should match. >> > >> > For now I decided to use SingleColumnValueFilter with substring >> > matching. >> > As using comma-separated list proved difficult to search through, I'm >> using >> > pipe symbols to separate items like this: "|aa|bb|cc|", so that I could >> > pass the search item surrounded by pipes into the filter: >> > SingleColumnValueFilter ('f', 'list', =, 'substring:|aa|') >> > >> > This proved to work effectively enough, however I would prefer to use >> > something more standard for my list storage (e.g. serialised JSON), or >> > perhaps something even more optimised for a search - performance really >> > does matter here. >> > >> > Any opinions on this solution and possible enhancements are much >> > appreciated. >> > >> > Many thanks, >> > Stas >> > >> >