Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

Mike Bayer Fri, 29 Mar 2019 07:10:29 -0700

Also note pymssql is not well maintained right now due to lack of
funding, please confirm you reproduce your performance concerns using
PyODBC with Microsofts ODBC drivers ?  That should be considered to be
the canonically supported driver right now, works on all platforms
very well now.


On Fri, Mar 29, 2019 at 10:07 AM Mike Bayer <mike...@zzzcomputing.com> wrote:
>
> OK so I saw that the "N" prefix is not generated with your test case
> either, so I re-read your email.  Can you clarify what you mean by
> "always encoded as NVARCHAR"?   are you referring to the simple fact
> that a Python string object is passed to the driver, and that the
> driver is behind the scenes applying the "N" in any case or is
> otherwise binding it as unicode in such a way that performance is
> impacted?   SQLAlchemy for many years passed bytestrings to drivers
> like pyodbc because they would simply crash if you passed them a
> unicode object, but once they supported it, SQLAlchemy was eager to
> get out of the business of doing this encoding.    In 1.3 we've just
> deprecated all the flags that allow it to do this
> (convert_unicode=True).    Using that flag would be your quickest way
> to get it back for now but we'd have to develop an all new behavior
> for 1.4 if we are to start encoding these binds again, however current
> behavior has been this way for many years and this is the first it's
> being reported in this way.  I would want to look into driver
> configuration options for this as well.
>
>
>
>
> On Fri, Mar 29, 2019 at 9:56 AM Mike Bayer <mike...@zzzcomputing.com> wrote:
> >
> > On Fri, Mar 29, 2019 at 6:20 AM Ian Wagner <ianthetec...@gmail.com> wrote:
> > >
> > > Hello all,
> > >
> > > I'm trying to get to the bottom of an issue in which Python 3 (unicode by 
> > > definition) strings are always encoded as NVARCHAR for at least two 
> > > backends (pymssql and pyodbc). Using bytstrings as comparison arguments 
> > > (for example Table.column == value.encode('utf-8')) sends a regular 
> > > string literal as expected, but regular strings are encoded as NVARCHAR 
> > > literals.
> > >
> > > This behavior is fairly logical at the underlying driver (pymssql or 
> > > pyodbc) level, which is why I'm posting here. I believe the the use of a 
> > > String column (as opposed to a Unicode column) type should not pass an 
> > > NVARCHAR literal. Doing so has disastrous performance implications, as 
> > > SQL Server ends up casting the whole column up. This will wreak havoc 
> > > when regularly dealing with large-ish tables (1.7 million rows or so in 
> > > our case).
> > >
> > > I have previously posted with a LOT more details on StackOverflow 
> > > (https://stackoverflow.com/questions/55098426/strings-used-in-query-always-sent-with-nvarchar-syntax-even-if-the-underlying-c).
> > >  I also have an MCVE over on GitHub 
> > > (https://github.com/ianthetechie/pymssql_sqlalchemy_55098426).
> > >
> > > In my MCVE, I outline a possible approach for fixing this, but it appears 
> > > to have some problems. I'm posting here asking for feedback on what's 
> > > wrong with my approach, and what would be the best way to go about 
> > > getting this fixed.
> >
> > seems like we will need some documentation for this as it is confusing
> > a lot of people.   The issue that introduced this behavior is
> > https://github.com/sqlalchemy/sqlalchemy/issues/4442 and then that
> > same user felt it was happening too often in
> > https://github.com/sqlalchemy/sqlalchemy/issues/4561, however I
> > clarified that the N prefix only generates if there is no other
> > context to determine that this is not a non-unicode context.
> >
> > the NVARCHAR thing should not happen if you are comparing to a
> > non-NVARCHAR column.   it only occurs when there is no other context
> > that SQLAlchemy can determine the correct datatype for the Unicode
> > object being passed.
> >
> > However, the example case you have on github there seems to be using a
> > unicode in a VARCHAR comparison so should not see the N.  If it does,
> > it's a bug.  I will try your test case now.
> >
> > In the future, please go straight to the SQLAlchemy github issue
> > tracker with a succinct test case, as this N thing is obviously still
> > ongoing.
> >
> >
> >
> > >
> > > Thanks!
> > > Ian
> > >
> > > --
> > > SQLAlchemy -
> > > The Python SQL Toolkit and Object Relational Mapper
> > >
> > > http://www.sqlalchemy.org/
> > >
> > > To post example code, please provide an MCVE: Minimal, Complete, and 
> > > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> > > description.
> > > ---
> > > You received this message because you are subscribed to the Google Groups 
> > > "sqlalchemy" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an 
> > > email to sqlalchemy+unsubscr...@googlegroups.com.
> > > To post to this group, send email to sqlalchemy@googlegroups.com.
> > > Visit this group at https://groups.google.com/group/sqlalchemy.
> > > For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] MSSQL String columns often incorrectly encoded as NVARCHAR

Reply via email to