Thanks, everyone, for the excellent ideas. Ryan, I understand your suggestion up to a point. If time permits, please explain further.
What you are suggesting is to create a table with 99 rows with keys 'c_1', 'c_2', ... through 'c_99'. Row c_1 would generate IDs 1, 101, 201, and so on, and row c_99 would generate 99, 199, and so on. I follow it this far.

But hypothetically speaking, let's say I am running a MapReduce job to process a huge log file. Each line of the log would be passed to a Map function. I am trying to figure out how I would distribute the load evenly amongst c_1 through c_99. Please explain.

On Sun, Feb 13, 2011 at 10:18 PM, Ryan Rawson <[email protected]> wrote:

> you can also stripe, eg:
>
> c_1 starts at 1, skip=100
> c_2 starts at 2, skip=100
> c_$i starts at $i, skip=100 for 3..99
>
> now you have 100x speed/parallelism. If single regionserver
> assignment becomes a problem, use multiple tables.
>
> On Sun, Feb 13, 2011 at 10:12 PM, Lars George <[email protected]> wrote:
> > Hi SS,
> >
> > Some people that do not need strict contiguous IDs also use block
> > increments of say 100. Each app server then gets 100 IDs to hand out
> > and in case it dies it gets its next assigned 100 IDs and leaves a
> > small gap behind. That way you can take the pressure off the counter if
> > that is going to be an issue for you. Depends on your insert frequency
> > obviously.
> >
> > Lars
> >
> > On Sun, Feb 13, 2011 at 7:10 PM, Something Something
> > <[email protected]> wrote:
> >> Hello,
> >>
> >> Can you please tell me if this is the proper way of designing a table
> >> that's got an auto increment key? If there's a better way please let
> >> me know that as well.
> >>
> >> After reading the mail archives, I learned that the best way is to use
> >> the 'incrementColumnValue' method of HTable.
> >>
> >> So hypothetically speaking let's say I have to create a "User -> Orders"
> >> relationship. Every time user creates an order we will assign a system
> >> generated (auto increment) id as primary key for the order.
> >>
> >> I am thinking I could do this:
> >>
> >> 1) Create a table of Ids for various objects such as "Order". It will
> >> have just a single row with key "1" and column families for various
> >> objects. When it's time to add a new order for a user I can do
> >> something like this:
> >>
> >> HTable tableIds = new HTable("IDs");
> >> Get get = new Get(Bytes.toBytes("1"));
> >> Result result = tableIds.get(get);
> >> // incrementColumnValue takes byte[] family and qualifier, not Strings
> >> long newOrderId = tableIds.incrementColumnValue(result.getRow(),
> >>     Bytes.toBytes("orders"), Bytes.toBytes("orderId"), 1);
> >>
> >> // In future I could use the same table for other objects as follows
> >> // long newInvoiceId = tableIds.incrementColumnValue(result.getRow(),
> >> //     Bytes.toBytes("invoices"), Bytes.toBytes("invoiceId"), 1);
> >>
> >> 2) Once the newOrderId is retrieved I can add the info about order to
> >> UserOrder table with a key of format: userId + "*" + newOrderId. The
> >> "info" family of this table will have columns such as "orderAmount",
> >> "orderDate" etc.
> >>
> >> As per the documentation, the 'incrementColumnValue' is done in
> >> exclusive and serial fashion for each row with a rowlock. In other
> >> words, even in multi-threading environment we are guaranteed to get a
> >> unique key per thread, correct?
> >>
> >> Is this a correct/good design for a table that needs auto increment key?
> >> Please let me know. Thanks.
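
To make the striping arithmetic in Ryan's message concrete, here is a minimal in-memory sketch. It is not HBase code: the class StripedIds, the method stripeForTask, and the idea of choosing a stripe from the map task number are all assumptions made for illustration, and each AtomicLong merely stands in for one counter row (c_1 through c_99) that would really be bumped with incrementColumnValue. Picking the stripe as "task number modulo 99" is only one possible way to spread load; nobody in the thread has said this is the intended answer.

```java
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for the striped counter rows c_1 .. c_99 (hypothetical class). */
final class StripedIds {
    static final int STRIPES = 99;  // rows c_1 .. c_99, as described in the thread
    static final int SKIP = 100;    // every row advances in steps of 100

    // counters[i] records how many IDs row c_(i+1) has handed out so far.
    // Assumption: in HBase each slot would be a separate counter row.
    private final AtomicLong[] counters = new AtomicLong[STRIPES];

    public StripedIds() {
        for (int i = 0; i < STRIPES; i++) {
            counters[i] = new AtomicLong(0);
        }
    }

    /** Assumption: a map task derives its stripe (1..99) from its task number. */
    public static int stripeForTask(int taskNumber) {
        return Math.floorMod(taskNumber, STRIPES) + 1;
    }

    /** Next ID for a stripe: stripe, stripe + 100, stripe + 200, ... */
    public long nextId(int stripe) {
        long handedOut = counters[stripe - 1].getAndIncrement();
        return stripe + handedOut * SKIP;
    }

    public static void main(String[] args) {
        StripedIds ids = new StripedIds();
        int stripe = stripeForTask(0);  // task 0 maps to row c_1
        for (int i = 0; i < 3; i++) {
            System.out.println("c_" + stripe + " -> " + ids.nextId(stripe));
        }
    }
}
```

Because every stripe has a distinct remainder modulo 100, two stripes can never produce the same ID, so tasks on different stripes do not contend on the same row. With 99 stripes and skip=100, multiples of 100 are simply never issued.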

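
The block-increment approach Lars describes in the quoted thread can be sketched in a few lines. Again this is an in-memory illustration under stated assumptions, not HBase code: BlockIdAllocator is a hypothetical class, and the shared AtomicLong stands in for the single counter cell that each app server would advance by 100 in one incrementColumnValue call.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hands out IDs from locally reserved blocks of 100 (hypothetical class). */
final class BlockIdAllocator {
    static final int BLOCK_SIZE = 100;

    // Assumption: stands in for the one shared counter cell in HBase.
    private final AtomicLong sharedCounter;
    private long next = 0;   // next ID to hand out from the current block
    private long limit = 0;  // first ID past the current block (exclusive)

    public BlockIdAllocator(AtomicLong sharedCounter) {
        this.sharedCounter = sharedCounter;
    }

    public synchronized long nextId() {
        if (next >= limit) {
            // One trip to the shared counter reserves the whole block.
            long end = sharedCounter.addAndGet(BLOCK_SIZE);
            next = end - BLOCK_SIZE + 1;
            limit = end + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong shared = new AtomicLong(0);
        BlockIdAllocator serverA = new BlockIdAllocator(shared);
        BlockIdAllocator serverB = new BlockIdAllocator(shared);
        System.out.println("A: " + serverA.nextId());
        System.out.println("B: " + serverB.nextId());
    }
}
```

Each allocator touches the shared counter once per 100 IDs. If a server dies partway through its block, the unused remainder of that block is never issued, which is the small gap Lars mentions.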