Implementation of SUBSTRING for multi-byte character sets is inefficient 
-------------------------------------------------------------------------

                 Key: CORE-6542
                 URL: http://tracker.firebirdsql.org/browse/CORE-6542
             Project: Firebird Core
          Issue Type: Improvement
          Components: Engine
    Affects Versions: 4.0 RC 1, 3.0.7
            Reporter: Vlad Khorsun


The case below shows bad performance of SUBSTRING for UTF8 comparing with 
(legacy) UNICODE_FSS 

a) UNICODE_FSS

execute block
as
declare str1 varchar(8000) character set unicode_fss;
declare str2 varchar(10)   character set unicode_fss;
declare n int = 100000;
begin
  str1 = LPAD('abcd', 8000, '--');
  while (n > 0) do
  begin
    str2 = SUBSTRING(str1 from 1 FOR 10);
    n = n - 1;
  end
end

Execute time = 62ms


b) UTF8

execute block
as
declare str1 varchar(8000) character set utf8;
declare str2 varchar(10)   character set utf8;
declare n int = 100000;
begin
  str1 = LPAD('abcd', 8000, '--');
  while (n > 0) do
  begin
    str2 = SUBSTRING(str1 from 1 FOR 10);
    n = n - 1;
  end
end

Execute time = 983ms

The case is simplified and based on end-user report. In user case the same 
query on the system tables run much longer with FB4 than with FB3
(test database was restored from the same backup). Origin of the problem is 
that FB4 uses UTF8 for metadata while FB3 uses UNICODE_FSS.

The SUBSTRING implementation for UNICODE_FSS (internal_fss_substring()) is 
straigthforward and logical - it skips POSITION characters
from the start of the source string first and then copy LENGTH chars into dest 
string.

The UTF8 implementation (MultiByteCharSet::substring()) convert whole source 
string into UTF16 and only then get substring of UTF16 string. 
This is simple but very inefficient especially for a long strings and small 
POSITION values.


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to