Re: Splitting an existing table with new keys.

Shahab Yunus Tue, 19 Aug 2014 15:51:13 -0700

Ted,

Hmmm. So basically, if I understand you correctly, what you are proposing
is to insert dummy values corresponding to the new region boundaries that
we want as a pre-processing step instead of presplitting?


But inserting *only a few rows* per desired region range won't guarantee a
new region, right? We would then have to insert *enough* dummy rows to force a
region split? Then there is the question of how to perform this and how long
one should wait for the region split to actually occur.

Or am I totally misunderstanding your question/suggestion?

Right now the application does not expect empty values but I find the idea
interesting enough to add logic for it as long as it is workable.

Regards,
Shahab
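
A note on the waiting question above: since the split request is described
in this thread as asynchronous, one option is to poll until the expected
number of regions shows up or a timeout passes. The following is only a
minimal sketch in plain Java. SplitWaiter, waitForRegionCount and the
regionCount supplier are hypothetical names, not part of HBase; in a real
client the supplier would have to wrap whatever call reports the table's
current region count.

```java
import java.util.function.IntSupplier;

final class SplitWaiter {

    /**
     * Polls regionCount until it reports at least `expected` regions or the
     * timeout elapses. Returns true if the expected count was observed.
     * The supplier is a stand-in (assumption) for a real region-count lookup.
     */
    static boolean waitForRegionCount(IntSupplier regionCount, int expected,
                                      long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (regionCount.getAsInt() >= expected) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the split did not show up in time
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated table whose region count grows by one on every poll.
        int[] simulated = {1};
        boolean done = waitForRegionCount(() -> simulated[0]++, 4, 5_000, 10);
        System.out.println("split observed: " + done);
    }
}
```

The idea would be to call such a helper after each split request, with the
expected count increased by one each time, rather than sleeping for a fixed
interval.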


On Tue, Aug 19, 2014 at 4:49 PM, Ted Yu <[email protected]> wrote:

> Shahab:
> How does your application deal with KeyValue whose value is empty ?
>
> Can you insert rows with empty value whose keys correspond to the splits ?
>
> Cheers
>
>
> On Tue, Aug 19, 2014 at 1:29 PM, Shahab Yunus <[email protected]>
> wrote:
>
> > So the situation here is that we are trying to bulk load data into a
> > table. But each load of data has such a range of keys that it will go to a
> > specific contiguous chunk of the region servers.
> >
> > In other words, at each bulk load we face hot-spotting, not at
> > the end as in the conventional case, but anywhere within the
> > row-key range of our table.
> >
> > Please note that the split point that I am trying to split on does not
> > exist in the table yet. I am trying to prepare the existing table with
> > data, by splitting into regions into which I will then bulk import my new
> > data, to avoid hotspotting on one region server.
> >
> > The proof-of-concept code is below. Trying to split data into 16 regions
> > ('0' to 'f' of the guid since each row in this current load shares the
> > same value for the first 2 fields of the row key).
> >
> > Key is:
> > data_source + time-in-long + 32-bytes-random-guid
> >
> > /*****/
> >
> > byte[][] splits = new byte[splitPointsPrefixes.length][];
> > byte[] dataSourceId = Bytes.toBytes(dataSource.getDataSourceID());
> > byte[] loadTime = Bytes.toBytes(batchLoadTime);
> > byte[] guidPrefix = null;
> >
> > for (int i = 0; i < splitPointsPrefixes.length; i++) {
> >    guidPrefix = Bytes.toBytes(splitPointsPrefixes[i]);
> >    splits[i] = new byte[dataSourceId.length + loadTime.length + guidPrefix.length];
> >    ByteBuffer splitBuffer = ByteBuffer.wrap(splits[i]);
> >    splitBuffer.put(dataSourceId);
> >    splitBuffer.put(loadTime);
> >    splitBuffer.put(guidPrefix);
> > }
> >
> > byte[] tableNameInBytes = Bytes.toBytes(tableName);
> > HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create(getConf()));
> >
> > for (byte[] split : splits) {
> >    // This is asynchronous. Should I wait here after each split before
> >    // moving on to the next one?
> >    admin.split(tableNameInBytes, split);
> > }
> > /*****/
> >
> > Regards,
> > Shahab
> >
> >
> > On Tue, Aug 19, 2014 at 4:13 PM, Jean-Marc Spaggiari <
> > [email protected]> wrote:
> >
> > > Hi Shahab,
> > >
> > > can you share your code? It seems that the RS you reached did not have the
> > > expected region. What is your table's status in the web interface?
> > >
> > > JM
> > >
> > >
> > > 2014-08-19 16:11 GMT-04:00 Shahab Yunus <[email protected]>:
> > >
> > > > I have a table already created and with some data. I want to split it
> > > > through code using the HBaseAdmin API into multiple regions, while
> > > > specifying keys that do not exist in the table.
> > > >
> > > > I am getting the exception below, which makes sense because the key
> > > > doesn't exist yet. But at the time of creation of the table we can indeed
> > > > pre-split it using keys that don't exist.
> > > >
> > > > Is it possible to do it for a table that already exists and has data?
> > > >
> > > > Caused by:
> > > > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
> > > > org.apache.hadoop.hbase.NotServingRegionException:
> > > >
> > > >
> > > > Using HBase: 0.98.1-cdh5.1.0
> > > >
> > > > Thanks a lot.
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > >
> >
>
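
For reference, the split-key layout described in this thread (data_source +
time-in-long + first character of the guid) can be built without any HBase
classes. This is a minimal sketch under stated assumptions, not the poster's
actual code: SplitKeys and buildSplitKeys are hypothetical names, the load
time is assumed to be stored as an 8-byte big-endian long (the layout HBase's
Bytes.toBytes(long) produces), and it assumes that 15 boundaries ('1' to 'f')
are enough to cut one load's key range into 16 slices.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SplitKeys {

    // Hex digits that can appear as the first character of the guid.
    private static final String HEX = "0123456789abcdef";

    /**
     * Builds one split key per hex digit from '1' to 'f':
     * dataSourceId bytes + 8-byte load time + one guid-prefix byte.
     */
    static byte[][] buildSplitKeys(byte[] dataSourceId, long loadTime) {
        byte[][] splits = new byte[HEX.length() - 1][];
        for (int i = 1; i < HEX.length(); i++) {
            byte[] guidPrefix =
                HEX.substring(i, i + 1).getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(
                dataSourceId.length + Long.BYTES + guidPrefix.length);
            buf.put(dataSourceId);
            buf.putLong(loadTime); // big-endian by default
            buf.put(guidPrefix);
            splits[i - 1] = buf.array();
        }
        return splits;
    }

    public static void main(String[] args) {
        // "ds1" and the timestamp are made-up sample values.
        byte[] dataSourceId = "ds1".getBytes(StandardCharsets.UTF_8);
        byte[][] splits = buildSplitKeys(dataSourceId, 1408488673000L);
        System.out.println("split keys: " + splits.length);
        System.out.println("bytes per key: " + splits[0].length);
    }
}
```

Each resulting key could then be handed to the split call one at a time, as
in the loop quoted above.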
