Re: Provide an option to infinite retry when updating index failed

2017-08-09 Thread Sergey Soldatov
Personally, I don't like the idea. If one of the index regions gets stuck (in
transition, for example), the whole cluster will be down after a while. It's
better to have the index disabled than to have the cluster stop working
(until someone notices that the ingestion rate has dropped to zero) and then
pay the downtime of restarting the cluster (which may be quite expensive for
several hundred nodes). For a healthy HBase cluster, most of the problems
with indexes are related to RPC configuration and to HBase fixes in this area:
   a. Previously, the controller factory and scheduler had to be configured
correctly. For example, clients should not use the controller factory
(PHOENIX-3994 and related). Otherwise table updates will go to IndexRPC, and
that may saturate all index handlers and cause a distributed deadlock. Also,
a number of HBase fixes are required to make handler priorities work as
expected: HBASE-15295, HBASE-15177.
   b. With a limited number of index handlers, the queue for regular RPC may
grow continuously, and it may get stuck if HBase doesn't have HBASE-15146.
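
To make the handler-saturation scenario in (a) concrete, here is a minimal Java sketch. It is not HBase or Phoenix code: the class `HandlerStarvationDemo`, its method names, and the pool sizes are all made up for illustration. A fixed thread pool stands in for a set of RPC handlers, and each simulated data-table write occupies a handler while it waits for an index write that also needs a handler. When both kinds of work share one pool, the index writes can never be scheduled; with a dedicated pool for index writes, they complete.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: thread pools stand in for RPC handler pools.
public class HandlerStarvationDemo {

    /**
     * Runs one simulated data-table write per handler on dataPool. Each write
     * submits an index write to indexPool and blocks until it finishes.
     * Returns true if every write completed within timeoutMs.
     */
    static boolean runWrites(ExecutorService dataPool, ExecutorService indexPool,
                             int handlers, long timeoutMs) {
        CountDownLatch allBusy = new CountDownLatch(handlers);
        CountDownLatch done = new CountDownLatch(handlers);
        for (int i = 0; i < handlers; i++) {
            dataPool.submit(() -> {
                allBusy.countDown();
                allBusy.await();                    // every data handler is now occupied
                indexPool.submit(() -> { }).get();  // block on the index write
                done.countDown();
                return null;
            });
        }
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Data writes and index writes compete for the same handlers. */
    static boolean sharedPool(int handlers, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        try {
            return runWrites(pool, pool, handlers, timeoutMs);
        } finally {
            pool.shutdownNow();  // interrupt the stuck handlers so the JVM can exit
        }
    }

    /** Index writes get their own handlers, so they can always make progress. */
    static boolean separatePools(int handlers, long timeoutMs) {
        ExecutorService dataPool = Executors.newFixedThreadPool(handlers);
        ExecutorService indexPool = Executors.newFixedThreadPool(handlers);
        try {
            return runWrites(dataPool, indexPool, handlers, timeoutMs);
        } finally {
            dataPool.shutdownNow();
            indexPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool completed:    " + sharedPool(4, 500));
        System.out.println("separate pools completed: " + separatePools(4, 2000));
    }
}
```

In this toy model the shared-pool run times out and the separate-pool run completes. The real system differs in many ways (priorities, queues, multiple servers), so treat it only as an illustration of why index traffic needs handlers that regular table updates cannot exhaust.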

I should also mention that some hardware/OS limitations may cause problems
with indexes as well. If an OOM such as "unable to fork" happens during an
index region update, it leads to a distributed deadlock as well, because the
AsyncProcess that is waiting for the index update to complete will wait
forever, and the thread that started that process will keep a lock on the
region, blocking the rest of the work. Ideally we need to kill HBase on OOM,
and it is usually configured that way, but since the JVM is unable to fork,
it is unable to execute kill either.

Issues with replaying the WAL during startup usually happen because of
another distributed deadlock: all open-region threads are trying to open
data table regions, but the index regions are still offline. This can be
handled by increasing the number of threads for opening regions, as well as
by applying HBASE-16095.
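
The startup deadlock described above can be sketched with a toy model. Everything here is hypothetical (the class `RegionOpenOrderDemo`, the region counts, the latch-based "index online" signal) and none of it is HBase code. It assumes that opening a data region blocks until the index regions are online, and that data regions are queued ahead of index regions on a fixed pool of opener threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: a fixed pool stands in for the open-region threads.
public class RegionOpenOrderDemo {

    /** Returns true if every region opened within timeoutMs using openThreads opener threads. */
    static boolean openAll(int openThreads, int dataRegions, int indexRegions, long timeoutMs) {
        ExecutorService openers = Executors.newFixedThreadPool(openThreads);
        CountDownLatch indexOnline = new CountDownLatch(indexRegions);
        CountDownLatch allOpen = new CountDownLatch(dataRegions + indexRegions);
        try {
            // Data regions are queued first. Opening one waits for the index
            // regions (assumption: WAL replay has to write to the index).
            for (int i = 0; i < dataRegions; i++) {
                openers.submit(() -> {
                    indexOnline.await();
                    allOpen.countDown();
                    return null;
                });
            }
            // Index regions open without waiting on anything else.
            for (int i = 0; i < indexRegions; i++) {
                openers.submit(() -> {
                    indexOnline.countDown();
                    allOpen.countDown();
                });
            }
            return allOpen.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            openers.shutdownNow();  // release any opener threads that are still blocked
        }
    }

    public static void main(String[] args) {
        System.out.println("3 threads, 3 data regions: " + openAll(3, 3, 2, 500));
        System.out.println("4 threads, 3 data regions: " + openAll(4, 3, 2, 2000));
    }
}
```

With three opener threads and three data regions every thread is blocked and the index regions never get a turn; one extra thread is enough for this toy case to finish. That is the intuition behind raising the open-region thread count, though in a real cluster the safe value depends on how many data regions can be queued at once.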

Those should cover most of the problems that Phoenix may experience with
unrecoverable index failures in heavy-write scenarios (in most cases they
lead to a cluster-wide problem even without disabling the index). There are
some other issues with creating indexes (upsert select stuff), but they are
covered by separate JIRAs as well.

Thanks,
Sergey


Re: Provide an option to infinite retry when updating index failed

2017-08-09 Thread James Taylor
We've been doing a ton of work to stabilize mutable secondary indexing
recently (see recently resolved JIRAs). This will appear in 4.12.0 or
4.11.1. Also, a lot of work was done for local indexing in 4.11. We're
still testing these at scale, so there may be more to come in 4.12 for
local indexing too.

One other potentially viable option if you're ok with eventual consistency
is to leave the index active even when data table writes fail (and let the
partial index rebuilder catch it up for you). We likely need PHOENIX-3949
to make that option viable.

So my thinking would be to wait until 4.12 and to check out local indexing
too.

Thanks,
James



Provide an option to infinite retry when updating index failed

2017-08-08 Thread William
Hi all,
To maintain consistency between a data table and its index tables, we have to
do a transactional update across regions on different region servers. For
non-transactional tables, we cannot guarantee this consistency for mutable
global secondary indexes. Here are the problems with the existing solutions:
1. disable index write
  a) updating system.catalog to change the index status and set the timestamp
may lead to chain failures
  b) partially rebuilding the index may not be a good solution for a
production env, because:
 b1) it may run for a long time on a large table (several TBs)
 b2) there may be only a few inconsistent rows that need to be caught up,
but we have to do a full-table time-ranged scan over the data table
 b3) if there are deletes/updates and a major compaction took place, it'll
leave dirty data in the index tables
  c) selects that hit the disabled index will degenerate to a full table scan
against the data table, which may quickly exhaust the read capacity of the
whole cluster


2. disable data table write
  a) selects that hit the index still work
  b) data table writes are not actually disabled; they raise an exception
instead. So the index tables still need to be rebuilt when the index regions
are back online, which has the same issues as in 1.b
  c) as an index rebuild is needed, system.catalog still needs to be updated,
so chain failures may still happen.


What should be guaranteed:
1. absolutely no chain failure
2. absolutely no inconsistency no matter what happens
3. selects that hit the index will not degenerate


New solution:
1. When an index update fails, retry forever until it succeeds
2. Do the same retry when replaying the WAL
3. No need to update the catalog table, which avoids potential chain failures
4. This index failure policy is an option that can be switched on/off


About this solution:
1. Simple
2. When an index update fails, we give up write availability to maintain
consistency and read availability. This is acceptable for a mutable global
index, as its read availability is more important.
3. No need to rebuild the index afterwards; as long as the pending retries
complete, the indexes will be in sync.
4. In the worst case, some or all of the RS will not be able to accept writes.
5. We cannot handle index update failures elegantly because we are not doing
real transactions. So this solution is a simple but effective way to achieve
consistency without transactions, though there is a price.
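
As a rough illustration of the "retry forever" behaviour proposed above, here is a minimal Java sketch. It is not the Phoenix index failure policy API; the class `RetryForever`, the method `retryUntilSuccess`, and the backoff parameters are assumptions introduced only for this example. The capped exponential backoff is also an added assumption, since the proposal does not say how retries should be spaced.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a generic retry loop, not Phoenix's actual failure policy.
public class RetryForever {

    /**
     * Calls write until it succeeds, sleeping between attempts with a capped
     * exponential backoff. Returns the number of attempts that were made.
     * Only thread interruption during the backoff stops the loop.
     */
    static int retryUntilSuccess(Callable<Void> write, long initialBackoffMs, long maxBackoffMs) {
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                write.call();
                return attempt;
            } catch (Exception e) {
                // Assumption: every failure is treated as retriable.
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
                backoff = Math.min(backoff * 2, maxBackoffMs);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated index write that fails three times before succeeding.
        int attempts = retryUntilSuccess(() -> {
            if (calls.incrementAndGet() < 4) {
                throw new java.io.IOException("index region unavailable");
            }
            return null;
        }, 10, 100);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

Note that the calling thread stays occupied for as long as the loop spins, along with any locks it holds. That is the price mentioned in points 4 and 5: while the index write keeps failing, the writer makes no other progress.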


What does everybody think?


Thanks
William