Fairly small -- row keys 32-48 bytes, column keys about the same, and values 50-100 bytes (with a few outliers that probably go up to 1k).
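
For a rough sense of scale, here is a back-of-the-envelope sketch of the per-datanode write volume these sizes imply. It assumes the figures quoted in the thread below (about 22-23k writes per second, each edit written to the FS twice, 3x replication, 6 data nodes) and deliberately ignores per-cell storage overhead and compaction rewrites, so treat the result as a lower bound rather than a measurement. All variable names are illustrative.

```python
# Rough estimate only; every input is an approximate figure from this thread.
WRITES_PER_SEC = 22_500      # midpoint of the observed 22-23k writes/s
FS_WRITES_PER_EDIT = 2       # once to the WAL, once when flushed to an HFile
HDFS_REPLICATION = 3
DATA_NODES = 6


def per_node_bytes_per_sec(payload_bytes):
    """Bytes/s landing on each datanode, assuming an even spread of writes."""
    cluster_total = (WRITES_PER_SEC * payload_bytes
                     * FS_WRITES_PER_EDIT * HDFS_REPLICATION)
    return cluster_total / DATA_NODES


# Payload = row key + column key + value, using the low and high ends above.
small = 32 + 32 + 50     # 114 bytes
large = 48 + 48 + 100    # 196 bytes

for label, size in (("small", small), ("large", large)):
    mb = per_node_bytes_per_sec(size) / 1e6
    print(f"{label}: {size} B/write -> ~{mb:.2f} MB/s per datanode")
```

Under these assumptions the raw volume comes out to a few MB/s per datanode, which would be consistent with not seeing any obvious disk contention.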

On Mar 3, 2013, at 6:08 AM, Varun Sharma wrote:

> What is the size of your writes ?
>
> On Sat, Mar 2, 2013 at 2:29 PM, Dan Crosta <[email protected]> wrote:
>
>> Hm. This could be part of the problem in our case. Unfortunately we don't
>> have very good control over which rowkeys will come from which workers
>> (we're not using map-reduce or anything like it where we have that sort of
>> control, at least not without some changes). But this is valuable
>> information for future developments, thanks for mentioning it.
>>
>> On Mar 2, 2013, at 2:56 PM, Asaf Mesika wrote:
>>
>>> Make sure you are not sending a lot of put of the same rowkey. This can
>>> cause contention in the region server side. We fixed that in our project by
>>> aggregating all the columns for the same rowkey into the same Put object
>>> thus when sending List of Put we made sure each Put has a unique rowkey.
>>>
>>> On Saturday, March 2, 2013, Dan Crosta wrote:
>>>
>>>> On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
>>>>> "That's only true from the HDFS perspective, right? Any given region is
>>>>> "owned" by 1 of the 6 regionservers at any given time, and writes are
>>>>> buffered to memory before being persisted to HDFS, right?"
>>>>>
>>>>> Only if you disabled the WAL, otherwise each change is written to the
>>>>> WAL first, and then committed to the memstore.
>>>>> So in the sense it's even worse. Each edit is written twice to the FS,
>>>>> replicated 3 times, and all that only 6 data nodes.
>>>>
>>>> Are these writes synchronized somehow? Could there be a locking problem
>>>> somewhere that wouldn't show up as utilization of disk or cpu?
>>>>
>>>> What is the upshot of disabling WAL -- I assume it means that if a
>>>> RegionServer crashes, you lose any writes that it has in memory but not
>>>> committed to HFiles?
>>>>
>>>>> 20k writes does seem a bit low.
>>>>
>>>> I adjusted dfs.datanode.handler.count from 3 to 10 and now we're up to
>>>> about 22-23k writes per second, but still no apparent contention for any of
>>>> the basic system resources.
>>>>
>>>> Any other suggestions on things to try?
>>>>
>>>> Thanks,
>>>> - Dan
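
The aggregation Asaf describes (one Put per unique rowkey in each batch) can be sketched independently of the HBase client. The snippet below is a minimal illustration, not code from anyone's project: `group_by_rowkey` and the `(rowkey, column, value)` tuple shape are hypothetical, and it assumes that a later write to the same column within a batch may simply replace the earlier one. In a real client, each resulting entry would become a single Put object carrying all of that row's columns.

```python
def group_by_rowkey(cells):
    """Collapse (rowkey, column, value) tuples into one entry per rowkey.

    Returns a list of (rowkey, {column: value}) pairs in first-seen order.
    A later value for the same rowkey and column replaces the earlier one
    (an assumption of this sketch).
    """
    grouped = {}
    for rowkey, column, value in cells:
        grouped.setdefault(rowkey, {})[column] = value
    return list(grouped.items())


# Hypothetical pending writes from a worker: two of them target row "r1".
pending = [
    (b"r1", b"c1", b"a"),
    (b"r2", b"c1", b"b"),
    (b"r1", b"c2", b"c"),
]

batch = group_by_rowkey(pending)
for rowkey, columns in batch:
    # Each iteration corresponds to building one Put for this rowkey.
    print(rowkey, columns)
```

This only helps within a single client's batch; it does not address several workers writing the same rowkey at the same time.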
