Good question. Method #2 works for now. Please watch HBASE-11608, which proposes adding a synchronous split.
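
For reference, method #2 could be sketched roughly as below. This is only a sketch under assumptions: SplitWaiter and waitForMoreRegions are names made up for illustration, and the IntSupplier is a stand-in for whatever returns the current region count (for example the size of the map returned by table.getRegionLocations()), so no HBase classes appear here. It polls until the count exceeds the value captured before the split, or gives up after a timeout.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: polls a region-count source until it grows or a timeout expires. */
final class SplitWaiter {
    private SplitWaiter() {}

    /**
     * @param regionCount stand-in (assumption) for something like
     *                    table.getRegionLocations().size()
     * @param before      region count captured before the split was requested
     * @param timeoutMs   give up after this many milliseconds
     * @param pollMs      pause between two checks, in milliseconds
     * @return true if the count exceeded 'before' in time, false on timeout or interrupt
     */
    static boolean waitForMoreRegions(IntSupplier regionCount, int before,
                                      long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            // Success as soon as more regions are reported than before the split.
            if (regionCount.getAsInt() > before) {
                return true;
            }
            // Overflow-safe check that the deadline has passed.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt status for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In the split loop, one option is to capture the count before each admin.split() call and then call waitForMoreRegions with a supplier that re-reads the region locations. A false return only means the new region had not shown up within the timeout, so the caller decides whether to retry or abort the load.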

Cheers

On Wed, Aug 20, 2014 at 1:45 PM, Shahab Yunus wrote:

> Thanks Ted.
>
> I also wanted to know whether, from a recommendation perspective, this
> approach is even safe or desirable, or whether it is some kind of HBase
> anti-pattern (splitting the same table before each bulk import).
>
> So I did try this and it works with and without existing data.
>
> Now one follow-up question. As we know, the split method is asynchronous.
> So how do we know that the split has happened? I don't see any callback
> mechanism.
>
> In the POC I used 2 approaches (just to see if it works or not):
>
> 1- sleeping for a while, which is of course not workable.
> 2- using the table.getRegionLocations method and checking the size of the
> returned map; if it becomes greater than the value captured before the
> split, I assume we are all set.
>
> #1 is of course not something that should be done. Does #2 make sense? Or
> is there some other, better way to do this?
>
> Thanks again.
>
> Regards,
> Shahab
>
>
> On Tue, Aug 19, 2014 at 7:00 PM, Ted Yu wrote:
>
> > My suggestion wasn't about pre-splitting.
> >
> > You can insert dummy values as part of your proof-of-concept code -
> > before admin.split() is called.
> >
> >
> > On Tue, Aug 19, 2014 at 3:50 PM, Shahab Yunus wrote:
> >
> > > Ted,
> > >
> > > Hmmm. So basically, if I understand you correctly, what you are
> > > proposing is to insert dummy values corresponding to the new region
> > > boundaries that we want, as a pre-processing step instead of
> > > presplitting?
> > >
> > > But inserting *only a few rows* per desired region range won't
> > > guarantee a new region, right? We then have to insert *enough* dummy
> > > rows to force a region split? Then there is the question of how to
> > > perform this and how long one should wait for the region split to
> > > actually occur.
> > >
> > > Or am I totally misunderstanding your question/suggestion?
> > >
> > > Right now the application does not expect empty values, but I find the
> > > idea interesting enough to add logic for it as long as it is workable.
> > >
> > > Regards,
> > > Shahab
> > >
> > >
> > > On Tue, Aug 19, 2014 at 4:49 PM, Ted Yu wrote:
> > >
> > > > Shahab:
> > > > How does your application deal with a KeyValue whose value is empty?
> > > >
> > > > Can you insert rows with empty values whose keys correspond to the
> > > > splits?
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Tue, Aug 19, 2014 at 1:29 PM, Shahab Yunus wrote:
> > > >
> > > > > So the situation here is that we are trying to bulk load data into
> > > > > a table. But each load of data has such a range of keys that it
> > > > > will go to a specific continuous chunk of the region servers.
> > > > >
> > > > > In other words, at each bulk load we face hot-spotting, not at the
> > > > > end as in the conventional case, but anywhere in the row-key range
> > > > > of our table.
> > > > >
> > > > > Please note that the split point that I am trying to split on does
> > > > > not exist in the table yet. I am trying to prepare the existing
> > > > > table, which already has data, by splitting it into regions into
> > > > > which I will then bulk import my new data, to avoid hotspotting on
> > > > > one region server.
> > > > >
> > > > > The proof-of-concept code is below. Trying to split data into 16
> > > > > regions ('0' to 'f' of the guid, since each row in this current
> > > > > load shares the same value for the first 2 fields of the row key).
> > > > >
> > > > > Key is:
> > > > > data_source + time-in-long + 32-bytes-random-guid
> > > > >
> > > > > /*****/
> > > > >
> > > > > byte[][] splits = new byte[16][];
> > > > > byte[] dataSourceId = Bytes.toBytes(dataSource.getDataSourceID());
> > > > > byte[] loadTime = Bytes.toBytes(batchLoadTime);
> > > > > byte[] guidPrefix = null;
> > > > >
> > > > > for(int i=0; i<splitPointsPrefixes.length; i++) {
> > > > >
> > > > >   guidPrefix = Bytes.toBytes(splitPointsPrefixes[i]);
> > > > >   splits[i] = new byte[dataSourceId.length + loadTime.length + guidPrefix.length];
> > > > >   ByteBuffer splitBuffer = ByteBuffer.wrap(splits[i]);
> > > > >   splitBuffer.put(dataSourceId);
> > > > >   splitBuffer.put(loadTime);
> > > > >   splitBuffer.put(guidPrefix);
> > > > > }
> > > > >
> > > > > byte[] tableNameInBytes = Bytes.toBytes(tableName);
> > > > > HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create(getConf()));
> > > > >
> > > > > for(byte[] split : splits) {
> > > > >   //This is asynchronous. Should I wait here after each split to move onto next one?
> > > > >   admin.split(tableNameInBytes, split);
> > > > > }
> > > > > /*****/
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > >
> > > > > On Tue, Aug 19, 2014 at 4:13 PM, Jean-Marc Spaggiari wrote:
> > > > >
> > > > > > Hi Shahab,
> > > > > >
> > > > > > Can you share your code? It seems that the RS you reached did
> > > > > > not have the expected region. How is your table status in the
> > > > > > web interface?
> > > > > >
> > > > > > JM
> > > > > >
> > > > > >
> > > > > > 2014-08-19 16:11 GMT-04:00 Shahab Yunus:
> > > > > >
> > > > > > > I have a table already created and with some data. I want to
> > > > > > > split it through code, using the HBaseAdmin API, into multiple
> > > > > > > regions, while specifying keys that do not exist in the table.
> > > > > > >
> > > > > > > I am getting the exception below, which makes sense because
> > > > > > > the key doesn't exist yet. But at the time of creation of the
> > > > > > > table we can indeed pre-split it using keys that don't exist.
> > > > > > >
> > > > > > > Is it possible to do it for a table that already exists and
> > > > > > > has data?
> > > > > > >
> > > > > > > *Caused by:
> > > > > > > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
> > > > > > > org.apache.hadoop.hbase.NotServingRegionException: *
> > > > > > >
> > > > > > > Using HBase: 0.98.1-cdh5.1.0
> > > > > > >
> > > > > > > Thanks a lot.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Shahab
