Thanks! A few more doubts:

1. Say the requirement is to count the distinct values of F1. If the field
is part of the key, can't HBase just scan the keys, skip value
deserialization, and return the keys to the client, which will calculate
the distinct count? In the second approach, HBase would deserialize the
value of the column containing F1 and return it to the client, which will
calculate the distinct count.

2. For bulk load, when LoadIncrementalHFiles runs and the regionserver
moves the HFiles from HDFS to the region directory, does the regionserver
localize each HFile by downloading it to local disk and then uploading it
again into the region directory? Or does it just move the file to the
region directory and wait for the next compaction to localize it, as in
the regionserver failure case?

On Mon, Aug 17, 2015 at 11:00 PM, Ted Yu <[email protected]> wrote:

> For both scenarios you mentioned, field is not leading part of row key.
> You would need to specify timerange or start row / stop row to narrow the
> key range being scanned.
>
> I am leaning toward using second approach.
>
> Cheers
>
> On Mon, Aug 17, 2015 at 9:41 AM, Shushant Arora <[email protected]>
> wrote:
>
> > ~8-10 fields of size (5 of 20 bytes each) and 3 fields of size 200 bytes
> > each.
> >
> > On Mon, Aug 17, 2015 at 9:55 PM, Ted Yu <[email protected]> wrote:
> >
> > > How many fields such as F1 are you considering for embedding in row key?
> > >
> > > Suggested reading:
> > > http://hbase.apache.org/book.html#rowkey.design
> > > http://hbase.apache.org/book.html#client.filter.kvm (see
> > > ColumnPrefixFilter)
> > >
> > > Cheers
> > >
> > > On Mon, Aug 17, 2015 at 8:13 AM, Shushant Arora <[email protected]>
> > > wrote:
> > >
> > > > 1. So the size limit is per cell's identifier + value?
> > > >
> > > > Which is more optimal: to have the field in the key or in a column
> > > > of a column family, if the pattern is that every row has that field?
> > > >
> > > > Say I have a field F1 in all rows, so:
> > > >
> > > > Situation-1
> > > > key1#F1 (as composite key), and the rest of the fields in columns.
> > > >
> > > > Situation-2
> > > > key1 as key and F1 part of column family.
> > > >
> > > > This is the main reason I asked about the key size limit.
> > > > If I ask for the number of rows where F1 = 'someval', will it be
> > > > faster in situation-1 than in situation-2, since in 1 it can return
> > > > the result just by traversing keys, with no need to read columns?
> > > >
> > > > On Mon, Aug 17, 2015 at 8:27 PM, Ted Yu <[email protected]> wrote:
> > > >
> > > > > For #1, it is the limit on a single keyvalue, not row, not key.
> > > > >
> > > > > For #2, please see the following:
> > > > >
> > > > > http://hbase.apache.org/book.html#store.memstore
> > > > > http://hbase.apache.org/book.html#regionserver_splitting_implementation
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Mon, Aug 17, 2015 at 7:36 AM, Shushant Arora <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > 1. Is hbase.client.keyvalue.maxsize the max size of the row or of
> > > > > > the key only? Is there any limit on key size only?
> > > > > > 2. The access pattern is mostly key based. Are memstores and
> > > > > > regions on a regionserver per table? That is, if I have multiple
> > > > > > tables, will there be multiple memstores instead of the few there
> > > > > > would have been with one large table?
> > > > > >
> > > > > > On Mon, Aug 17, 2015 at 7:29 PM, Ted Yu <[email protected]> wrote:
> > > > > >
> > > > > > > For #1, take a look at the following in hbase-default.xml :
> > > > > > >
> > > > > > > <name>hbase.client.keyvalue.maxsize</name>
> > > > > > > <value>10485760</value>
> > > > > > >
> > > > > > > For #2, it would be easier to answer if you can outline access
> > > > > > > patterns in your app.
> > > > > > >
> > > > > > > For #3, adjustment according to current region boundaries is done
> > > > > > > client side.
> > > > > > > Take a look at the javadoc for LoadQueueItem
> > > > > > > in LoadIncrementalHFiles.java
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > 1. Is there any max limit on the key size of an HBase table?
> > > > > > > > 2. Multiple small tables vs. one large table: which one is
> > > > > > > > preferred?
> > > > > > > > 3. For bulk load, when LoadIncrementalHFiles is run it again
> > > > > > > > recalculates the region splits based on region boundaries. Does
> > > > > > > > this division happen on the client side, or on the server side
> > > > > > > > (at the regionserver or the HBase master), which then assigns
> > > > > > > > the splits that cross a target region boundary to the desired
> > > > > > > > regionserver?
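
To make the situation-1 idea from the thread concrete (counting distinct F1 values by looking only at composite row keys of the form key1#F1), here is a minimal sketch in plain Java. It is a model, not HBase client code: a sorted map stands in for the table, and the class name, the sample data and the assumption that '#' never occurs inside the key or F1 are all hypothetical. In HBase itself the closest equivalent would presumably be a Scan with a key-only filter (KeyOnlyFilter or FirstKeyOnlyFilter), which is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model only: a sorted map stands in for an HBase table.
// Row keys use the hypothetical layout "<key>#<F1>" (situation 1 in the thread).
final class CompositeKeyDistinct {

    // Assumption: the separator never occurs inside <key> or inside F1.
    static final char SEPARATOR = '#';

    // Extracts F1 from a composite row key; the cell value is never touched.
    static String f1FromRowKey(String rowKey) {
        int idx = rowKey.lastIndexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("not a composite key: " + rowKey);
        }
        return rowKey.substring(idx + 1);
    }

    // Walks the row keys in [startRow, stopRow) and collects distinct F1 values.
    // F1 is not the leading part of the key, so every key in the range is
    // visited; the start/stop rows only narrow the range, as noted in the thread.
    static Set<String> distinctF1(NavigableMap<String, byte[]> table,
                                  String startRow, String stopRow) {
        Set<String> distinct = new HashSet<>();
        for (String rowKey : table.subMap(startRow, true, stopRow, false).keySet()) {
            distinct.add(f1FromRowKey(rowKey));
        }
        return distinct;
    }

    // Builds a tiny made-up table; the payload models "rest of the fields".
    static NavigableMap<String, byte[]> sampleTable() {
        NavigableMap<String, byte[]> table = new TreeMap<>();
        byte[] payload = "rest of the fields".getBytes(StandardCharsets.UTF_8);
        table.put("key1#red", payload);
        table.put("key2#blue", payload);
        table.put("key3#red", payload);
        table.put("key4#green", payload);
        return table;
    }
}
```

Calling CompositeKeyDistinct.distinctF1(CompositeKeyDistinct.sampleTable(), "key1", "key9") visits all four keys and yields the three values red, blue and green. Note that the map orders keys as Java strings, which matches HBase's byte ordering only for ASCII keys like these.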
