Re: Speedup initial index creation

Kadir Ozdemir Thu, 01 Apr 2021 21:08:41 -0700

If you have only one version for the rows, then you should not see much
difference between old and new.  Can you provide some info on your
performance testing? For example, number of rows in your data table, number
of indexes, number of data table regions, number of MapReduce mappers,
IndexTool MapReduce job duration, number of data nodes in your cluster, any
IndexTool config parameter settings, etc.


On Thu, Apr 1, 2021 at 6:12 PM Alexander Batyrshin <0x62...@gmail.com>
wrote:

>
>
> On 2 Apr 2021, at 03:55, Kadir Ozdemir <ka...@apache.org> wrote:
>
> 
> 1) I was thinking about the bulk load tool (
> https://phoenix.apache.org/bulk_dataload.html). However, in this case,
> you are not interested in bulk loading into the data table and its index
> but just the index table. Now, I see that it would not work for you. You
> are supposed to build a strongly consistent index once when you create the
> index. I am curious why you are so concerned about its performance.
>
>
> I need to keep the maintenance window on our cluster as short as possible.
>
> 2)  I thought you wanted to disable WAL only during index rebuild for the
> index table, not all the time. You should be able to still use the ALTER
> TABLE command with the new index design. Please note that in this case you
> would disable WAL for the main table too.  Is that what you are looking
> for? If you are willing to disable WAL, then there is no point in using
> strongly consistent indexes because you would lose recently written data if
> region servers crash. By the way, you can use IndexUpgradeTool to downgrade
> your tables to the old design (to replace IndexRegionObserver with
> Indexer), see https://phoenix.apache.org/secondary_indexing.html
>
>
> I know about the possibility of data loss. But it’s not a problem if the main
> table does not receive mutations during index creation (maintenance window).
>
> Old indexes go inconsistent too often, so that is not an option.
>
>
> 3) Delete markers will be added each time you run the index create command
> whenever the data table rows have multiple versions and the versions of a
> row have different values for indexed columns.
>
>
> My table has 1 version per row after major compaction. Also, the main table
> receives no mutations during index creation.
>
> On Thu, Apr 1, 2021 at 3:28 PM Alexander Batyrshin <0x62...@gmail.com>
> wrote:
>
>> 1) How can I create an index the old way, via intermediate HFiles?
>>
>> I see a "direct" option for IndexTool, but the description says it's disabled:
>>
>> private static final Option DIRECT_API_OPTION = new Option("direct", "direct", false,
>>     "This parameter is deprecated. Direct mode will be used whether it is set or not. "
>>         + "Keeping it for backwards compatibility.");
>>
>>
>> 2) On phoenix-4.14.2 (old indexes), disabling the WAL for the index table was
>> possible with "ALTER TABLE main_table SET DISABLE_WAL=true".
>> Maybe we can add this feature to 4.16+?
>>
>>
>> 3) My main table has VERSIONS=>1. Anyway, I decided to major-compact it
>> before the next run and still got Delete mutations.
>>
>> According to the table metrics, ~10% of mutations are Deletes.
>> <PastedGraphic-1.png>
>>
>> I checked my main table; it has IndexRegionObserver loaded:
>>
>> coprocessor$1 =>
>> '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
>> coprocessor$2 =>
>> '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
>> coprocessor$3 =>
>> '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
>> coprocessor$4 =>
>> '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
>> coprocessor$5 =>
>> '|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'
>>
>>
>> By the way, I split the index table into more regions and increased
>> hbase.hregion.memstore.flush.size and hbase.hstore.blockingStoreFiles, and got
>> a ~30% speedup.
>> This is still very slow compared to old index creation.
>>
>> On 31 Mar 2021, at 02:55, Kadir Ozdemir <ka...@gsuite.cloud.apache.org>
>> wrote:
>>
>> I assume that your base table has several versions for a given row. If
>> so, creating a consistent index on this base table can be slower
>> than creating an old design index. This is because the new design creates
>> an index row for every data table row version.  It simply replays the
>> mutations on a row without updating the data table but makes necessary
>> mutations on the index table. It does this to make sure that if you use SCN
>> connections to do point-in-time queries, the index will return correct
>> results. During these replays, index rows will be deleted if index columns
>> are modified. This is the reason I think you see delete mutations on the
>> index table.
>>
>> 1) Yes
>> 2) No
>> 3) No
>>
>> It will be a good improvement to have an option to support (3) by just
>> creating indexes using the last data row versions. Please feel free to
>> create an improvement Jira for this.
>>
>> Did you create your base table using 4.16? If not, have you upgraded it
>> to the new index design using IndexUpgradeTool? I am asking this to make
>> sure that your index actually uses the new index design. You can verify
>> this using the HBase shell by describing the data table and checking if the
>> IndexRegionObserver coproc is loaded on your  base table.
>>
>>
>> On Tue, Mar 30, 2021 at 3:10 PM Alexander Batyrshin <0x62...@gmail.com>
>> wrote:
>>
>>> I tried on phoenix-4.16.0
>>>
>>> > On 31 Mar 2021, at 00:54, Alexander Batyrshin <0x62...@gmail.com>
>>> wrote:
>>> >
>>> > Hello,
>>> > I tried to create a new consistent index on a mutable table and found out
>>> that the IndexTool MapReduce job runs 3-5 times slower compared to old indexes on
>>> 4.14.2.
>>> > So I have some questions:
>>> >
>>> > 1) Is it possible to create an index the old way, via intermediate HFiles and
>>> bulk-loading?
>>> > 2) Is it possible to disable the WAL on the HBase index table during index
>>> creation?
>>> > 3) My main table has no updates, but I observe Delete mutations on the
>>> index table. Is it possible to disable this during initial index creation?
>>> >
>>>
>>>
>>
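
The replay behaviour described in the thread (one index row per data row version, plus a delete of the old index row whenever an indexed column changes between versions) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Phoenix's actual implementation: the function name `replay_row`, the `value|pk` index key layout, and the tuple shapes are all hypothetical.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, str]        # (timestamp, value of the indexed column)
Mutation = Tuple[str, str, int]  # (kind, index row key, timestamp)


def replay_row(pk: str, versions: List[Version]) -> List[Mutation]:
    """Replay one data row's versions, oldest first, into index mutations.

    Hypothetical model: the index row key is "<indexed value>|<data pk>".
    """
    mutations: List[Mutation] = []
    previous_value: Optional[str] = None
    for ts, value in sorted(versions):
        # If the indexed column changed, the old index row must be deleted
        # so that a point-in-time read after ts no longer sees it.
        if previous_value is not None and previous_value != value:
            mutations.append(("Delete", f"{previous_value}|{pk}", ts))
        # Every data row version gets its own index row version.
        mutations.append(("Put", f"{value}|{pk}", ts))
        previous_value = value
    return mutations


# One version per row: a single Put, no Delete.
single = replay_row("row1", [(10, "a")])

# Three versions, indexed value changes once: one Delete is emitted.
multi = replay_row("row2", [(10, "a"), (20, "b"), (30, "b")])
```

In this model a row with a single version never produces a Delete, which is consistent with the statement in the thread that delete markers are expected only when a row has several versions with different indexed values.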
