I understand that there shouldn't be an unlimited number of column families. I am using this example on purpose, to see how they come into play.

On Fri, Jul 5, 2013 at 12:07 PM, Michael Segel <[email protected]> wrote:

> Why do you have so many column families (CFs)?
>
> It's not a question of the physical limitations, but more an issue of
> data design.
>
> There aren't that many really good examples of where you would have
> multiple column families that would require more than a handful of CFs.
>
> When I teach or lecture, the example I use is an order entry system,
> where you would have the same key on order entry, pick slips, shipping,
> and invoice.
>
> That's probably the best example of where CFs come into play.
>
> I'd suggest that you go back and rethink the design if you're having
> more than a handful.
>
> On Jul 5, 2013, at 8:53 AM, Aji Janis <[email protected]> wrote:
>
> > Asaf,
> >
> > I am using the Genre/Author stuff as an example, but yes, at the
> > moment I only have 5 column families. However, over time I may have
> > more (no upper limit decided at this point). See below for more
> > responses.
> >
> > On Wed, Jul 3, 2013 at 3:42 PM, Asaf Mesika <[email protected]> wrote:
> >
> >> Do you have only 5 static author names?
> >> Keep in mind the column family name is defined when creating the
> >> table.
> >>
> >> Regarding the tall vs wide debate:
> >> HBase is first and foremost a key-value database, thus it reads and
> >> writes at the column-value level. So it doesn't really care about
> >> rows. But that's not entirely true. Rows come into play in the
> >> following situations:
> >>
> >> Splitting a region is per row and not per column, thus a row will be
> >> saved as a whole on a region. If you have a really large row, the
> >> region size granularity is dependent on it. It doesn't seem to be the
> >> case here.
> >>
> >> Put/Delete creates a lock until finished. If you are intensive on
> >> inserts to the same row at the same time, this might be bad for you;
> >> keeping your rows slimmer can reduce contention, but again, only if
> >> you make a lot of concurrent modifications to the same row.
> >>
> >
> > I expect batches of Put/Delete to the same row to happen by at most
> > one thread at a time, based on users' current behavior. So locking
> > shouldn't be an issue. However, I'm not sure whether the topic of
> > saving a row to a region with enough space is really an issue I need
> > to worry about (probably because I just don't know much about it yet).
> >
> >> Filtering - if you need a filter which needs the whole row (there is
> >> a method you override in Filter to mark that), then a fat row will be
> >> more memory intensive. If you needed only 1/5 of your row, then maybe
> >> splitting it into 5 rows to begin with would have made a better
> >> schema design in terms of memory and I/O.
> >>
> >
> > Currently, my access pattern is to get all data for a given row. It's
> > possible that in the future we may want to apply (family/qualifier)
> > filters. There is a lot of uncertainty about use cases (client side)
> > at this point, which is why I am not entirely sure how things will
> > look months from now. I am not sure I follow this statement:
> >
> > "if you need a filter which needs the whole row (there is a method you
> > override in Filter to mark that), then a fat row will be more memory
> > intensive."
> >
> > Can you please explain? Thank you for these suggestions btw, good food
> > for thought!
> >
> >> On Wednesday, July 3, 2013, Aji Janis wrote:
> >>
> >>> I have a major typo in the question, so I apologize. I meant to say
> >>> 5 families with 1000+ qualifiers each.
> >>>
> >>> Let's work with an example (not the greatest example here, but
> >>> still). Let's say we have a Genre class like this:
> >>>
> >>> class HistoryBooks {
> >>>
> >>>     ArrayList<Books> author1;
> >>>     ArrayList<Books> author2;
> >>>     ArrayList<Books> author3;
> >>>     ArrayList<Books> author4;
> >>>     ArrayList<Books> author5;
> >>>
> >>>     ...
> >>> }
> >>>
> >>> Each author is a column family (let's say we only allow 5 authors
> >>> per <T>Book class). Each book per author ends up being a qualifier.
> >>> In this case, I know I have a max family count, but my qualifiers
> >>> have no upper limit. So is this scenario a case for a tall or a wide
> >>> table? Why? Thank you.
> >>>
> >>> On Tue, Jul 2, 2013 at 9:56 AM, Bryan Beaudreault
> >>> <[email protected]> wrote:
> >>>
> >>>> If they are accessed mostly together, they should all be a single
> >>>> column family. The key with tall or wide is based on the total byte
> >>>> size of each KeyValue. Your cells would need to be quite large for
> >>>> 50 to become a problem. I still would recommend using a single CF
> >>>> though.
> >>>> --
> >>>> Sent from iPhone
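
To make the tall vs wide question in this thread concrete, here is a minimal sketch in plain Java. It does not use the HBase client API; it only models cells as (row key, column family, qualifier) triples so the two layouts of the Genre/Author example can be compared. The class name WideVsTallSketch, the family name "d", the qualifier "title", and the "|" row-key separator are all assumptions made for illustration, not anything proposed by the posters.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WideVsTallSketch {

    /** One logical cell, addressed by (row key, column family, qualifier). */
    public static final class Cell {
        public final String row, family, qualifier, value;

        public Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Wide layout: one row per genre, one family per author, one qualifier per book. */
    public static List<Cell> wideLayout(String genre, int authors, int booksPerAuthor) {
        List<Cell> cells = new ArrayList<>();
        for (int a = 1; a <= authors; a++) {
            for (int b = 1; b <= booksPerAuthor; b++) {
                cells.add(new Cell(genre, "author" + a, "book" + b, "title-" + a + "-" + b));
            }
        }
        return cells;
    }

    /** Tall layout: author and book move into the row key; one family "d" (hypothetical name). */
    public static List<Cell> tallLayout(String genre, int authors, int booksPerAuthor) {
        List<Cell> cells = new ArrayList<>();
        for (int a = 1; a <= authors; a++) {
            for (int b = 1; b <= booksPerAuthor; b++) {
                String rowKey = genre + "|author" + a + "|book" + b;
                cells.add(new Cell(rowKey, "d", "title", "title-" + a + "-" + b));
            }
        }
        return cells;
    }

    /** Number of distinct row keys; a region split can only fall between these. */
    public static int distinctRows(List<Cell> cells) {
        Set<String> rows = new HashSet<>();
        for (Cell c : cells) {
            rows.add(c.row);
        }
        return rows.size();
    }

    /** Number of distinct column families, which must be declared at table creation. */
    public static int distinctFamilies(List<Cell> cells) {
        Set<String> families = new HashSet<>();
        for (Cell c : cells) {
            families.add(c.family);
        }
        return families.size();
    }

    public static void main(String[] args) {
        List<Cell> wide = wideLayout("history", 5, 1000);
        List<Cell> tall = tallLayout("history", 5, 1000);
        // prints: wide: rows=1 families=5 cells=5000
        System.out.println("wide: rows=" + distinctRows(wide)
                + " families=" + distinctFamilies(wide) + " cells=" + wide.size());
        // prints: tall: rows=5000 families=1 cells=5000
        System.out.println("tall: rows=" + distinctRows(tall)
                + " families=" + distinctFamilies(tall) + " cells=" + tall.size());
    }
}
```

Both layouts hold the same 5000 cells. The wide one packs them into a single row, which, per Asaf's point above, is stored whole on one region and contends on one row lock. The tall one needs only one family and lets splits fall between books, at the cost of a scan instead of a single-row get when all data for a genre is wanted.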
