Re: [sqlite] Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

2016-12-12 Thread Scott Hess
On Mon, Dec 12, 2016 at 9:30 PM, Bradford Larsen  wrote:
> An alternative possibility would be to revert to the pre-3.11 tokenizer on
> EBCDIC systems.  If I recall, the old tokenizer used a big switch statement
> with character literals instead of the 'aiClass' table.  I believe this
> would avoid the EBCDIC tokenizing troubles, at the expense of lower
> performance on those systems, and with the maintenance cost of keeping 2
> code paths around.

Or the build pass could compile a switch-based generator.  It would
make cross-compiling more complicated.

Perhaps easier would be to have a set of unrolled table initializers
which used the raw characters as indices, called as part of
sqlite3_init().

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

2016-12-12 Thread Bradford Larsen
>
> On Sun, Dec 11, 2016 at 20:42 Richard Hipp  wrote:
>
> SQLite only uses the "[" character as a compatibility quoting
>
> mechanism for SQL Server.  Maybe the solution is for [...] quoting to
>
> simply not work on EBCDIC systems?
>
>
>
> --
>
> D. Richard Hipp
>
> d...@sqlite.org
>
> This would be one possibility—you then might be able to simultaneously
support several EBCDIC code pages with a single 'aiClass' definition (cp37
and cp1047 for example).

(For what it's worth, I'm sure my application doesn't use this
compatibility quoting mechanism on the EBCDIC system.)

However, maybe it would be better to pick a specific EBCDIC code page to
provide an 'aiClass' definition for in the SQLite sources, and make a note
about this in the docs—perhaps something like this to start:

"SQLite is designed to compile and run on EBCDIC-based systems that use
code page $CHOSEN_CODEPAGE.  If you wish to build SQLite for a system that
uses a different EBCDIC codepage, you will need to modify the definition of
the 'aiClass' table used by the SQL tokenizer."

An alternative possibility would be to revert to the pre-3.11 tokenizer on
EBCDIC systems.  If I recall, the old tokenizer used a big switch statement
with character literals instead of the 'aiClass' table.  I believe this
would avoid the EBCDIC tokenizing troubles, at the expense of lower
performance on those systems, and with the maintenance cost of keeping 2
code paths around.

Brad Larsen
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

2016-12-11 Thread Richard Hipp
SQLite only uses the "[" character as a compatibility quoting
mechanism for SQL Server.  Maybe the solution is for [...] quoting to
simply not work on EBCDIC systems?

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

2016-12-11 Thread Guy Harris
On Dec 11, 2016, at 4:57 PM, Richard Hipp  wrote:

> I agree with most of your changes.  But I wonder about moving the
> QUOTE2 (the '[' character) value from code 0xba over to 0xad.
> According to EBCDIC chart at https://en.wikipedia.org/wiki/EBCDIC the
> '[' character should be at 0xba.  Is wikipedia wrong?

If so, then some obscure company called the International Business Machines 
Corporation is also wrong:

http://publibfp.dhe.ibm.com/epubs/pdf/dz9zr010.pdf

The EBCDIC/ISO-8 chart in appendix I of version 10 of the z/Architecture 
Principles of Operation has [ at 0xba and ] at 0xbb for EBCDIC; it has an 
accented capital Y at 0xad.

However, it's apparently a bit more complicated; different EBCDIC code pages 
apparently give square brackets different code points:

http://www-01.ibm.com/support/docview.wss?uid=swg21515307

Apparently EBCDIC code page IBM 1047 has [ as 0xad and EBCDIC code page 037 has 
[ as 0xba.  Here's 037:

https://en.wikipedia.org/wiki/EBCDIC_037


ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00037.pdf

and here's 1047:

https://en.wikipedia.org/wiki/EBCDIC_1047


ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP01047.pdf

The Wikipedia page on EBCDIC code pages:

https://en.wikipedia.org/wiki/EBCDIC_code_pages

says of 1047 "Open Systems (MVS C compiler)", so the issue may be that Brad 
Larsen is using that compiler and it's insisting that '[' is 0xad.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

2016-12-11 Thread Bradford Larsen
> On Dec 11, 2016, at 7:57 PM, Richard Hipp  wrote:
> 
> On 12/11/16, Bradford Larsen  wrote:
> 
>> #endif
>> #ifdef SQLITE_EBCDIC
>> /* x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
>> /* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
>> /* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* 4x */7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
>> /* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
>> /* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  6,
>> /* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
>> /* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Ax */   27, 25,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27,  9, 27, 27,
>> /* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
>> /* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
>> /* Fx */3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
>> #endif
>> };
>> 
>> These changes fixed the SQL tokenizer problems I was seeing my EBCDIC-based
>> system, and resulted in a functioning SQLite there.
>> 
>> Please let me know if more is needed to fix this bug.
> 
> Thanks for the suggested fix.  As you probably infer, we do not have
> access to an EBCDIC system for testing.
> 
> I agree with most of your changes.  But I wonder about moving the
> QUOTE2 (the '[' character) value from code 0xba over to 0xad.
> According to EBCDIC chart at https://en.wikipedia.org/wiki/EBCDIC the
> '[' character should be at 0xba.  Is wikipedia wrong?  Is there
> something else about the '[' character in EBCDIC that we should know
> about?


There are many flavors of EBCDIC; the details depend on which code page is 
used.  The mainframes I’ve used have all been configured for code page 1047 
(the “Latin 1/Open System” code page).

The table in the Wikipedia page for EBCDIC 
(https://en.wikipedia.org/wiki/EBCDIC ) 
is for code page 37.  You’re right that the ‘[‘ character has value 0xba in 
code page 37, but in code page 1047, the ‘[‘ character has value 0xad 
(https://en.wikipedia.org/wiki/EBCDIC_1047 
).  See also 
https://en.wikipedia.org/wiki/Code_page#EBCDIC-based_code_pages 
.

I’m afraid the entire EBCDIC situation is complicated, and there isn’t a single 
definition of EBCDIC like there is ASCII — in different EBCDIC code pages, 
characters will have different values.  Code page 1047 may be the most common 
native code page these days — it’s the code page used by all the mainframes 
I’ve touched.  (It’s the default code page for C source code as understood by 
the IBM XL C/C++ compiler, though this does not preclude *running* with a 
different code page than was used for building.)

Either way, it looks like (a) the code page used by SQLite on EBCDIC systems is 
ambiguous, and (b) whatever code page you pick, there seem to be typos in the 
relevant tables in the SQLite source.  The result is a broken SQL tokenizer on 
EBCDIC-based systems.

I think a particular code page should be chosen, and the SQLite sources & docs 
should be clear about which code page is used.  Maybe there's someone else out 
there who has expertise with EBCDIC who will comment.

Best,
Brad Larsen
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

2016-12-11 Thread Richard Hipp
On 12/11/16, Bradford Larsen  wrote:

> #endif
> #ifdef SQLITE_EBCDIC
> /* x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  xa  xb  xc  xd  xe  xf */
> /* 0x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27,  7,  7, 27, 27,
> /* 1x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* 2x */   27, 27, 27, 27, 27,  7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* 3x */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* 4x */7, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 26, 12, 17, 20, 10,
> /* 5x */   24, 27, 27, 27, 27, 27, 27, 27, 27, 27, 15,  4, 21, 18, 19, 27,
> /* 6x */   11, 16, 27, 27, 27, 27, 27, 27, 27, 27, 27, 23, 22,  1, 13,  6,
> /* 7x */   27, 27, 27, 27, 27, 27, 27, 27, 27,  8,  5,  5,  5,  8, 14,  8,
> /* 8x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
> /* 9x */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
> /* Ax */   27, 25,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27,  9, 27, 27,
> /* Bx */   27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
> /* Cx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
> /* Dx */   27,  1,  1,  1,  1,  1,  1,  1,  1,  1, 27, 27, 27, 27, 27, 27,
> /* Ex */   27, 27,  1,  1,  1,  1,  1,  0,  1,  1, 27, 27, 27, 27, 27, 27,
> /* Fx */3,  3,  3,  3,  3,  3,  3,  3,  3,  3, 27, 27, 27, 27, 27, 27,
> #endif
> };
>
> These changes fixed the SQL tokenizer problems I was seeing my EBCDIC-based
> system, and resulted in a functioning SQLite there.
>
> Please let me know if more is needed to fix this bug.

Thanks for the suggested fix.  As you probably infer, we do not have
access to an EBCDIC system for testing.

I agree with most of your changes.  But I wonder about moving the
QUOTE2 (the '[' character) value from code 0xba over to 0xad.
According to EBCDIC chart at https://en.wikipedia.org/wiki/EBCDIC the
'[' character should be at 0xba.  Is wikipedia wrong?  Is there
something else about the '[' character in EBCDIC that we should know
about?

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users