Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

Ian Wagner Sun, 31 Mar 2019 19:43:52 -0700

Thanks. I’ll pursue those avenues.

Cheers,
Ian


2019년 4월 1일 (월) 11:30, Mike Bayer <[email protected]>님이 작성:

> On Sun, Mar 31, 2019 at 10:12 PM Ian Wagner <[email protected]>
> wrote:
> >
> >
> > My suggestion would be a pymssql dialect-level patch to send bytestrings
> for String columns, and of course continue passing str/unicode for Unicode
> columns. I'm on the mailing list looking for help with why my solution
> doesn't work as intended with enums (see my GitHub repo). For the moment,
> I've actually just implemented this column myself as shown, and replaced
> all String columns with it (rather than "replacing" String using colspecs),
> and banned all devs on our project from using the regular String column
> type until it's fixed.
>
> Please just use a TypeDecorator for all your String columns where this
> issue is apparent:
>
> class UTF8String(TypeDecorator):
>     impl = String
>
>     def process_bind_param(self, value, dialect):
>         if value is not None:
>             value = value.encode(dialect.encoding)
>         return value
>
> next, the issue should be reported to pymssql, where they should
> likely include options to modify this behavior:
>
> https://github.com/pymssql/pymssql
>
> next, I need this to be confirmed as an issue for pyodbc.    As my
> understanding is that Microsoft is funding Pyodbc's development I'd
> like to see what their approach to this issue is.
>
> *if* it is truly the canonical solution that applications that code
> against these DBAPIs *must* send byte arrays in order to avoid
> crushing performance costs, the drivers need to make that clear.
> then we can look into introducing either flags or permanent behavior
> such that we encode all Python unicode objects for a *non unicode*
> String datatype under the MSSQL dialects.
>
>
>
> >
> >> Also note pymssql is not well maintained right now due to lack of
> >> funding
> >
> >
> > Noted. We'll look into switching drivers yet again, but the official
> driver lacked several features the last time we evaluated it (I think
> stored proc output parameters were not supported, and it would require
> quite a few syntax changes in areas where we have to write raw SQL).
> >
> > Cheers,
> > Ian
> >
> > On Friday, March 29, 2019 at 11:10:21 PM UTC+9, Mike Bayer wrote:
> >>
> >> Also note pymssql is not well maintained right now due to lack of
> >> funding, please confirm you reproduce your performance concerns using
> >> PyODBC with Microsofts ODBC drivers ?  That should be considered to be
> >> the canonically supported driver right now, works on all platforms
> >> very well now.
> >>
> >> On Fri, Mar 29, 2019 at 10:07 AM Mike Bayer <[email protected]>
> wrote:
> >> >
> >> > OK so I saw that the "N" prefix is not generated with your test case
> >> > either, so I re-read your email.  Can you clarify what you mean by
> >> > "always encoded as NVARCHAR"?   are you referring to the simple fact
> >> > that a Python string object is passed to the driver, and that the
> >> > driver is behind the scenes applying the "N" in any case or is
> >> > otherwise binding it as unicode in such a way that performance is
> >> > impacted?   SQLAlchemy for many years passed bytestrings to drivers
> >> > like pyodbc because they would simply crash if you passed them a
> >> > unicode object, but once they supported it, SQLAlchemy was eager to
> >> > get out of the business of doing this encoding.    In 1.3 we've just
> >> > deprecated all the flags that allow it to do this
> >> > (convert_unicode=True).    Using that flag would be your quickest way
> >> > to get it back for now but we'd have to develop an all new behavior
> >> > for 1.4 if we are to start encoding these binds again, however current
> >> > behavior has been this way for many years and this is the first it's
> >> > being reported in this way.  I would want to look into driver
> >> > configuration options for this as well.
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Mar 29, 2019 at 9:56 AM Mike Bayer <[email protected]>
> wrote:
> >> > >
> >> > > On Fri, Mar 29, 2019 at 6:20 AM Ian Wagner <[email protected]>
> wrote:
> >> > > >
> >> > > > Hello all,
> >> > > >
> >> > > > I'm trying to get to the bottom of an issue in which Python 3
> (unicode by definition) strings are always encoded as NVARCHAR for at least
> two backends (pymssql and pyodbc). Using bytstrings as comparison arguments
> (for example Table.column == value.encode('utf-8')) sends a regular string
> literal as expected, but regular strings are encoded as NVARCHAR literals.
> >> > > >
> >> > > > This behavior is fairly logical at the underlying driver (pymssql
> or pyodbc) level, which is why I'm posting here. I believe the the use of a
> String column (as opposed to a Unicode column) type should not pass an
> NVARCHAR literal. Doing so has disastrous performance implications, as SQL
> Server ends up casting the whole column up. This will wreak havoc when
> regularly dealing with large-ish tables (1.7 million rows or so in our
> case).
> >> > > >
> >> > > > I have previously posted with a LOT more details on StackOverflow
> (
> https://stackoverflow.com/questions/55098426/strings-used-in-query-always-sent-with-nvarchar-syntax-even-if-the-underlying-c).
> I also have an MCVE over on GitHub (
> https://github.com/ianthetechie/pymssql_sqlalchemy_55098426).
> >> > > >
> >> > > > In my MCVE, I outline a possible approach for fixing this, but it
> appears to have some problems. I'm posting here asking for feedback on
> what's wrong with my approach, and what would be the best way to go about
> getting this fixed.
> >> > >
> >> > > seems like we will need some documentation for this as it is
> confusing
> >> > > a lot of people.   The issue that introduced this behavior is
> >> > > https://github.com/sqlalchemy/sqlalchemy/issues/4442 and then that
> >> > > same user felt it was happening too often in
> >> > > https://github.com/sqlalchemy/sqlalchemy/issues/4561, however I
> >> > > clarified that the N prefix only generates if there is no other
> >> > > context to determine that this is not a non-unicode context.
> >> > >
> >> > > the NVARCHAR thing should not happen if you are comparing to a
> >> > > non-NVARCHAR column.   it only occurs when there is no other context
> >> > > that SQLAlchemy can determine the correct datatype for the Unicode
> >> > > object being passed.
> >> > >
> >> > > However, the example case you have on github there seems to be
> using a
> >> > > unicode in a VARCHAR comparison so should not see the N.  If it
> does,
> >> > > it's a bug.  I will try your test case now.
> >> > >
> >> > > In the future, please go straight to the SQLAlchemy github issue
> >> > > tracker with a succinct test case, as this N thing is obviously
> still
> >> > > ongoing.
> >> > >
> >> > >
> >> > >
> >> > > >
> >> > > > Thanks!
> >> > > > Ian
> >> > > >
> >> > > > --
> >> > > > SQLAlchemy -
> >> > > > The Python SQL Toolkit and Object Relational Mapper
> >> > > >
> >> > > > http://www.sqlalchemy.org/
> >> > > >
> >> > > > To post example code, please provide an MCVE: Minimal, Complete,
> and Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> >> > > > ---
> >> > > > You received this message because you are subscribed to the
> Google Groups "sqlalchemy" group.
> >> > > > To unsubscribe from this group and stop receiving emails from it,
> send an email to [email protected].
> >> > > > To post to this group, send email to [email protected].
> >> > > > Visit this group at https://groups.google.com/group/sqlalchemy.
> >> > > > For more options, visit https://groups.google.com/d/optout.
> >
> > --
> > SQLAlchemy -
> > The Python SQL Toolkit and Object Relational Mapper
> >
> > http://www.sqlalchemy.org/
> >
> > To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> > ---
> > You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> > To post to this group, send email to [email protected].
> > Visit this group at https://groups.google.com/group/sqlalchemy.
> > For more options, visit https://groups.google.com/d/optout.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/sqlalchemy/eCpCRzuZxFs/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>
-- 
Like Sudoku? Check out my free app "Endless Sudoku" on the App Store
<http://itunes.apple.com/app/endless-sudoku/id492773309> or Google Play
<https://play.google.com/store/apps/details?id=com.ianthetechie.endless_sudoku>
!
Games aren't your thing? How about an awesome GPS that you can wear on your
wrist? Check out Pebble Pilot GPS <http://www.pebblegps.com/>. It's legit.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

Reply via email to