[jira] [Commented] (TRAFODION-2576) Incremental UPDATE STATS fails on long varchar values
[ https://issues.apache.org/jira/browse/TRAFODION-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959950#comment-15959950 ]

ASF GitHub Bot commented on TRAFODION-2576:
-------------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-trafodion/pull/1046

> Incremental UPDATE STATS fails on long varchar values
> -----------------------------------------------------
>
>                 Key: TRAFODION-2576
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2576
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>         Environment: All
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> The following test script demonstrates the problem:
>
> -- This script creates a table with varchar values longer than 256 characters.
> -- It then demonstrates issues with incremental stats on such values.
> ?section setup
> drop table if exists testbigcol;
> create table testbigcol
>   ( a int not null,
>     b varchar(500),
>     primary key (a) )
>   salt using 4 partitions;
> insert into testbigcol values (1,'axq'),(2,'bxw');
> insert into testbigcol select x.a + 2 * y.a, x.b || y.b from testbigcol x
>   cross join testbigcol y;
> insert into testbigcol select x.a + 6 * y.a, x.b || y.b from testbigcol x
>   cross join testbigcol y;
> insert into testbigcol select x.a + 42 * y.a, x.b || y.b from testbigcol x
>   cross join testbigcol y;
> update testbigcol set b = b || ' and now for something completely different 0123456789'
>   where mod(a,2) = 1;
> update testbigcol set b = b || ' that was fun, and now for something completely different 0123456789'
>   where mod(a,3) = 1;
> update testbigcol set b = b || ' and then there was a great hue and cry and now for something completely different 0123456789'
>   where mod(a,5) = 1;
> update testbigcol set b = b || ' and we can do all sorts of things to make these strings longer, and now for something completely different 0123456789'
>   where mod(a,7) = 1;
> ?section testit
> update statistics for table testbigcol on every column sample random 100 percent persistent;
> showstats for table testbigcol on existing columns detail;
> update testbigcol set b = 'c' || b where a > 1700;
> update statistics for table testbigcol on existing columns incremental where a > 1700;
> showstats for table testbigcol on existing columns detail;
>
> When run, the last UPDATE STATISTICS command fails as follows:
>
> >>update statistics for table testbigcol on existing columns incremental
> >>where a > 1700;
> *** ERROR[9200] UPDATE STATISTICS for table TRAFODION.SEABASE.TESTBIGCOL encountered an error (8402) from statement IUS data set I creation.
> *** ERROR[8402] A string overflow occurred during the evaluation of a character expression. Conversion of Source Type:VARCHAR(REC_BYTE_V_ASCII,290 BYTES,ISO88591) Source Value:caxqbxwbxwaxqaxqbxwbxwbxw and now for something completely different 0123456789 and then there was a great hue and cry and now for something completely different 0123456789 and we can do all sorts of to Target Type:VARCHAR(REC_BYTE_V_ASCII,256 BYTES,ISO88591).
> --- SQL operation failed with errors.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (TRAFODION-2576) Incremental UPDATE STATS fails on long varchar values
[ https://issues.apache.org/jira/browse/TRAFODION-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959182#comment-15959182 ]

ASF GitHub Bot commented on TRAFODION-2576:
-------------------------------------------

Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/1046#discussion_r110201855

--- Diff: core/sql/ustat/hs_globals.cpp ---
@@ -1882,6 +1882,55 @@ NABoolean HSColumnStruct::operator==(const HSColumnStruct& other) const
   return ( colnum == other.colnum );
 }
 
+//
+// METHOD:  addTruncatedColumnReference()
+//
+// PURPOSE: Generates a column reference or a SUBSTRING
+//          on a column reference which truncates the
+//          column to the maximum length allowed in
+//          UPDATE STATISTICS.
+//
+// INPUT:   'qry' - the SQL query string to append the
+//          reference to.
+//          'colInfo' - struct containing datatype info
+//          about the column.
+//
+void HSColumnStruct::addTruncatedColumnReference(NAString & qry)
+  {
+    HSGlobalsClass *hs_globals = GetHSContext();
+    Lng32 maxLengthInBytes = hs_globals->maxCharColumnLengthInBytes;
+    bool isOverSized = DFS2REC::isAnyCharacter(datatype) &&
+      (length > maxLengthInBytes);
+    if (isOverSized)
+      {
+        // Note: The result data type of SUBSTRING is VARCHAR, always.
+        // But if the column is CHAR, many places in the ustat code are not
+        // expecting a VARCHAR. So, we stick a CAST around it to convert
+        // it back to a CHAR in these cases.
+
+        NABoolean isFixedChar = DFS2REC::isSQLFixedChar(datatype);
+        if (isFixedChar)
+          qry += "CAST(";
+        qry += "SUBSTRING(";
+        qry += externalColumnName->data();
+        qry += " FOR ";
+
+        char temp[20];  // big enough for "nn)"
+        sprintf(temp, "%d)", maxLengthInBytes / CharInfo::maxBytesPerChar(charset));
+        qry += temp;
+        if (isFixedChar)
+          {
+            qry += " AS CHAR(";
+            qry += temp;
+            qry += ")";
--- End diff --

Thanks for these comments. Your analysis is correct. The code is limiting strings to 256 bytes, which for UTF-8 means 64 characters since the longest UTF-8 character is 4 bytes long.
By the way, the 256-byte limit is controllable by a CQD, so a customer or user can increase it if they wish. The CQD is USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES.
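The truncation logic being discussed can be restated as a standalone sketch. This is a simplified, hypothetical re-statement in plain C++ (std::string in place of NAString, plain parameters in place of the HSColumnStruct members and globals), not the actual Trafodion code:

```cpp
#include <cstdio>
#include <string>

// Simplified sketch (NOT the actual Trafodion code) of the truncation logic
// in HSColumnStruct::addTruncatedColumnReference(). Parameters stand in for
// the HSColumnStruct members and the ustat globals.
std::string truncatedColumnReference(const std::string &colName,
                                     int declaredLengthBytes, // column's declared length in bytes
                                     int maxLengthInBytes,    // CQD USTAT_MAX_CHAR_COL_LENGTH_IN_BYTES (default 256)
                                     int maxBytesPerChar,     // 4 for UTF-8, 1 for ISO88591
                                     bool isFixedChar)        // true for CHAR, false for VARCHAR
{
  if (declaredLengthBytes <= maxLengthInBytes)
    return colName;  // short enough: plain column reference

  // Worst-case character budget, e.g. 256 bytes / 4 bytes-per-char = 64 for UTF-8.
  int maxChars = maxLengthInBytes / maxBytesPerChar;
  char num[20];
  std::snprintf(num, sizeof(num), "%d", maxChars);

  // SUBSTRING always yields a VARCHAR, so fixed CHAR columns get a CAST back.
  std::string qry = "SUBSTRING(" + colName + " FOR " + num + ")";
  if (isFixedChar)
    qry = "CAST(" + qry + " AS CHAR(" + num + "))";
  return qry;
}
```

For the test table's B VARCHAR(500) column in ISO88591 this yields SUBSTRING(B FOR 256); for a UTF-8 column it divides by the 4-byte worst case, giving the 64-character truncation discussed above.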
[jira] [Commented] (TRAFODION-2576) Incremental UPDATE STATS fails on long varchar values
[ https://issues.apache.org/jira/browse/TRAFODION-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958120#comment-15958120 ]

ASF GitHub Bot commented on TRAFODION-2576:
-------------------------------------------

Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/1046#discussion_r110060012

--- Diff: core/sql/ustat/hs_globals.cpp ---
+            qry += ")";
--- End diff --

... thinking about this again, we don't have a SUBSTRING method that will give the first n _bytes_ of a UTF-8 string, truncated to the next UTF-8 character.
So, without such a function we can't really do what I suggested above. If we have a long UTF-8 column c and maxLengthInBytes is 256, then we will take SUBSTRING(c for 64) here. In many situations and languages, this will be about 64 bytes of information, not 256.
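The missing primitive described here — keep the first n bytes of a UTF-8 string without splitting a multi-byte character — can be sketched in host-side C++. This is an illustrative, hypothetical helper, not an existing Trafodion function; a real fix would need an equivalent inside the SQL expression evaluator. It relies on the fact that UTF-8 continuation bytes always match the bit pattern 10xxxxxx:

```cpp
#include <string>

// Hypothetical helper: truncate a UTF-8 string to at most maxBytes bytes,
// backing up so that no multi-byte character is split in half.
std::string truncateUtf8ToBytes(const std::string &s, std::size_t maxBytes)
{
  if (s.size() <= maxBytes)
    return s;
  std::size_t cut = maxBytes;
  // A continuation byte has the form 10xxxxxx; back up to a character start.
  while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80)
    --cut;
  return s.substr(0, cut);
}
```

With such a function available in SQL form, the ustat code could truncate to maxLengthInBytes directly instead of to the worst-case character count.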
[jira] [Commented] (TRAFODION-2576) Incremental UPDATE STATS fails on long varchar values
[ https://issues.apache.org/jira/browse/TRAFODION-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958105#comment-15958105 ]

ASF GitHub Bot commented on TRAFODION-2576:
-------------------------------------------

Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/1046#discussion_r110058006

--- Diff: core/sql/ustat/hs_globals.cpp ---
+            qry += ")";
--- End diff --

... or, actually, make it universally BYTES when the charset is UTF-8. That probably makes more sense, as the information content is then roughly the same.
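The suggestion above — truncate UTF-8 columns by bytes so the retained information is roughly the same across charsets — would change only the generated SQL text. A hypothetical sketch of the query-building side; note that the "FOR n BYTES" form is the syntax being proposed in this review, not confirmed Trafodion syntax:

```cpp
#include <string>

// Hypothetical sketch of generating the proposed byte-based truncation.
// "FOR n BYTES" is the syntax suggested in the review, not verified syntax.
std::string truncatedRefByBytes(const std::string &colName,
                                int maxLengthInBytes,
                                bool charsetIsUtf8)
{
  std::string n = std::to_string(maxLengthInBytes);
  if (charsetIsUtf8)
    // Keep up to maxLengthInBytes bytes regardless of character width.
    return "SUBSTRING(" + colName + " FOR " + n + " BYTES)";
  // Single-byte charsets: characters and bytes coincide.
  return "SUBSTRING(" + colName + " FOR " + n + ")";
}
```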
[jira] [Commented] (TRAFODION-2576) Incremental UPDATE STATS fails on long varchar values
[ https://issues.apache.org/jira/browse/TRAFODION-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958050#comment-15958050 ]

ASF GitHub Bot commented on TRAFODION-2576:
-------------------------------------------

Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/1046#discussion_r110052947

--- Diff: core/sql/ustat/hs_globals.cpp ---
+            qry += ")";
--- End diff --

This is a question to this pre-existing code: We added character and byte semantics, like Oracle has it, to UTF-8 character columns. Do we need something similar here?
Maybe as simple as adding the keyword "BYTES" when we deal with UTF-8 character columns with a length specified in bytes?
[jira] [Commented] (TRAFODION-2576) Incremental UPDATE STATS fails on long varchar values
[ https://issues.apache.org/jira/browse/TRAFODION-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957383#comment-15957383 ]

ASF GitHub Bot commented on TRAFODION-2576:
-------------------------------------------

GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/1046

    [TRAFODION-2576] Fix long varchar bug in incremental UPDATE STATISTICS

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2376

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/1046.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1046

commit ebf98de2ba8ffd361773df3e42b8813b98ce6691
Author: Dave Birdsall
Date:   2017-04-05T18:23:38Z

    [TRAFODION-2576] Fix long varchar bug in incremental UPDATE STATISTICS