[sqlalchemy] Re: Differentiate ANSIIdentifierPreparer.format_column select clause from where clause??

Michael Bayer Wed, 21 Feb 2007 17:26:13 -0800

I put up ticket #487 for this.  theres a whole batch of sql-related  
tickets im going to try to line up for a marathon session at some  
future date in the next few weeks.


On Feb 21, 2007, at 10:25 AM, Troy wrote:

>
> Mike,
>
> Based on your excellent feedback and trying a couple different
> approaches I came up with the following.  I have a diff file, but I'm
> new to contributing to  open source so I'm not sure the best method to
> get this to you (so I'll copy paste below).  The overview:
>
> * added visit_case_insensitive and overrode in PGCompiler to use
> "lower"
> * sub-classed ClauseList with _OrderByClause and _GroupByClause.
>
> This was done to add some context as to where in the statement the
> clauses belong, because we do not want to mess with the case of the
> results being returned, only when the character data is compared at
> the server.  I need to do some more work to peg HAVING, but this works
> well otherwise.
>
> * added _CaseInsensitiveCompare like you mentioned.
> * modified _compare_self instead of __eq__, and added case_compare
> param to the String type
>
> I think this is better then added case_compare at the column level,
> because this only affects character data.  Dates, Int's, etc do not
> have the same collation issues and the issues with Date that do exist
> are already addressed.
>
> Let me know if you need me to run in a different direction with this,
> or what I need to do differently to get this considered to be included
> in sqlalchemy.
>
> Troy
>
>
> ====== Diff File ======
>
> Index: sqlalchemy/sql.py
> ===================================================================
> --- sqlalchemy/sql.py   (revision 2346)
> +++ sqlalchemy/sql.py   (working copy)
> @@ -351,6 +351,7 @@
>      def visit_clauselist(self, list):pass
>      def visit_calculatedclause(self, calcclause):pass
>      def visit_function(self, func):pass
> +    def visit_case_insensitive(self, func):pass
>      def visit_cast(self, cast):pass
>      def visit_label(self, label):pass
>      def visit_typeclause(self, typeclause):pass
> @@ -947,6 +948,17 @@
>          else:
>              return False
>
> +class _OrderByClause(ClauseList):
> +    def append(self, clause):
> +        if _is_literal(clause):
> +            clause = _TextClause(str(clause))
> +        elif isinstance(clause,ColumnElement) and not
> getattr(clause.type,'compare_case',True):
> +            clause = _CaseInsensitiveCompare(clause)
> +        self.clauses.append(clause)
> +
> +class _GroupByClause(_OrderByClause):
> +    pass
> +
>  class _CompoundClause(ClauseList):
>      """represents a list of clauses joined by an operator, such as
> AND or OR.
>      extends ClauseList to add the operator as well as a from_objects
> accessor to
> @@ -1037,6 +1049,16 @@
>              c.accept_visitor(visitor)
>          visitor.visit_function(self)
>
> +class _CaseInsensitiveCompare(ColumnElement):
> +    def __init__(self, clause):
> +        self.clause = clause
> +        self.type = clause.type
> +    def accept_visitor(self, visitor):
> +        self.clause.accept_visitor(visitor)
> +        visitor.visit_case_insensitive(self)
> +    def _get_from_objects(self):
> +        return self.clause._get_from_objects()
> +
>  class _Cast(ColumnElement):
>      def __init__(self, clause, totype, **kwargs):
>          if not hasattr(clause, 'label'):
> @@ -1364,18 +1386,18 @@
>          return True
>      def order_by(self, *clauses):
>          if len(clauses) == 1 and clauses[0] is None:
> -            self.order_by_clause = ClauseList()
> +            self.order_by_clause = _OrderByClause()
>          elif getattr(self, 'order_by_clause', None):
> -            self.order_by_clause =
> ClauseList(*(list(self.order_by_clause.clauses) + list(clauses)))
> +            self.order_by_clause =
> _OrderByClause(*(list(self.order_by_clause.clauses) + list(clauses)))
>          else:
> -            self.order_by_clause = ClauseList(*clauses)
> +            self.order_by_clause = _OrderByClause(*clauses)
>      def group_by(self, *clauses):
>          if len(clauses) == 1 and clauses[0] is None:
> -            self.group_by_clause = ClauseList()
> +            self.group_by_clause = _GroupByClause()
>          elif getattr(self, 'group_by_clause', None):
> -            self.group_by_clause = ClauseList(*(list(clauses)
> +list(self.group_by_clause.clauses)))
> +            self.group_by_clause = _GroupByClause(*(list(clauses)
> +list(self.group_by_clause.clauses)))
>          else:
> -            self.group_by_clause = ClauseList(*clauses)
> +            self.group_by_clause = _GroupByClause(*clauses)
>      def select(self, whereclauses = None, **params):
>          return select([self], whereclauses, **params)
>      def _get_from_objects(self):
> Index: sqlalchemy/databases/postgres.py
> ===================================================================
> --- sqlalchemy/databases/postgres.py    (revision 2346)
> +++ sqlalchemy/databases/postgres.py    (working copy)
> @@ -463,7 +463,7 @@
>
> table.append_constraint(ForeignKeyConstraint(constrained_columns,
> refspec, conname))
>
>  class PGCompiler(ansisql.ANSICompiler):
> -
> +
>      def visit_insert_column(self, column, parameters):
>          # all column primary key inserts must be explicitly present
>          if column.primary_key:
> @@ -502,7 +502,11 @@
>          if isinstance(binary.type, sqltypes.String) and
> binary.operator == '+':
>              return '||'
>          else:
> -            return ansisql.ANSICompiler.binary_operator_string(self,
> binary)
> +            return ansisql.ANSICompiler.binary_operator_string(self,
> binary)
> +
> +    def visit_case_insensitive(self, object):
> +        self.strings[object] = "lower(%s)" %
> self.strings[object.clause]
> +
>
>  class PGSchemaGenerator(ansisql.ANSISchemaGenerator):
>
> Index: sqlalchemy/schema.py
> ===================================================================
> --- sqlalchemy/schema.py        (revision 2346)
> +++ sqlalchemy/schema.py        (working copy)
> @@ -497,6 +497,12 @@
>              for constraint in self.constraints:
>                  constraint.accept_schema_visitor(visitor,
> traverse=True)
>          visitor.visit_column(self)
> +
> +    def _compare_self(self):
> +        if isinstance(self.type,types.String) and not
> self.type.compare_case:
> +            return sql._CaseInsensitiveCompare(self)
> +        else:
> +            return self
>
>
>  class ForeignKey(SchemaItem):
> Index: sqlalchemy/types.py
> ===================================================================
> --- sqlalchemy/types.py (revision 2346)
> +++ sqlalchemy/types.py (working copy)
> @@ -169,11 +169,18 @@
>              return super(String, cls).__new__(cls, *args, **kwargs)
>          else:
>              return super(String, TEXT).__new__(TEXT, *args, **kwargs)
> -    def __init__(self, length = None):
> +    def __init__(self, length = None, compare_case = False):
> +        """Initializes new string instance
> +
> +        compare_case - determines if case affects comparisons,
> sorting, etc
> +        """
>          self.length = length
> +        self.compare_case = compare_case
>      def adapt(self, impltype):
>          return impltype(length=self.length)
>      def convert_bind_param(self, value, dialect):
> +        if not self.compare_case and hasattr(value,'lower'):
> +            value = value.lower()
>          if not dialect.convert_unicode or value is None or not
> isinstance(value, unicode):
>              return value
>          else:
> @@ -200,7 +207,7 @@
>               return value.decode(dialect.encoding)
>           else:
>               return value
> -
> +
>  class Integer(TypeEngine):
>      """integer datatype"""
>      def get_dbapi_type(self, dbapi):
> Index: sqlalchemy/ansisql.py
> ===================================================================
> --- sqlalchemy/ansisql.py       (revision 2346)
> +++ sqlalchemy/ansisql.py       (working copy)
> @@ -225,6 +225,10 @@
>              else:
>                  self.strings[column] =
> self.preparer.format_column_with_table(column)
>
> +    def visit_case_insensitive(self, object):
> +        #override this in databases that default to case insesitive
> collation
> +        self.strings[object] =
> self.strings[object.clause]
> +
>      def visit_fromclause(self, fromclause):
>          self.froms[fromclause] = fromclause.name
>
> @@ -257,7 +261,7 @@
>              self.strings[compound] = "(" + s + ")"
>          else:
>              self.strings[compound] = s
> -
> +
>      def visit_clauselist(self, list):
>          if list.parens:
>              self.strings[list] = "(" + string.join([s for s in
> [self.get_str(c) for c in list.clauses] if s is not None], ', ') + ")"
>
>
>
>
>
>
>
>
>
> On Feb 19, 12:17 pm, "Michael Bayer" <[EMAIL PROTECTED]> wrote:
>> On Feb 19, 2:32 pm, "Troy" <[EMAIL PROTECTED]> wrote:
>>
>>> Even though MySQL allows the setting of COLLATE, it does not support
>>> functional indexes, so if your code explicitly calls lower you
>>> technically now have code that will work for both MySQL and  
>>> Postgres,
>>> but MySQL is going to take a big performance hit and perform table
>>> scans regardless of any created indexes.  Realistically, I can't see
>>> anyone setting MySQL collate rules to case-sensitive without the
>>> support of functional indexes, which MySQL does not have.
>>
>> right...so some explicitness is required, somewhere (since SA cant
>> just put lower() across the board).
>>
>>
>>
>>> In simple terms, if the Postgres dialect supported
>>> compare_insensitive=True|False|[upper|lower]?
>>
>> on a Column-by-Column or even per-operation context is probably more
>> useful.
>>
>>> sqlalchemy easily support Postgres and MySQL with the same code  
>>> in out-
>>> of-the-box configurations.  But even if it didn't, if there was a  
>>> way
>>> to override the default postgres dialect I'd be a happy camper.
>>> Infact, that is exactly what I have done.  I added a dialect called
>>> "lowergres", but I'm stuck because I can not seem to find the
>>> appropriate hook to alter the column to a func.lower when the column
>>> is not part of the selected column list.
>>
>> format_column looks to me
>>
>>> like the right place to do it without converting a column to a
>>> function, but the column object in that function has no context  
>>> as to
>>> where it lies in the sql statement.
>>
>> for this approach, id advise setting "state" on the compiler when you
>> are processing the column clause of a SELECT (or dealing with insert/
>> update column lists too), vs. when you are processing WHERE, ORDER  
>> BY,
>> etc.  that flag can be used to determine where youre traversing.  the
>> compiler knows the context since its the one generating the  
>> statement.
>>
>> a more accurate way to do is to wrap the Column itself, as it goes
>> into a comparison operation or ORDER BY, in a new construct such as
>> "CaseInsensitive"...heres some pseudo-ish code based on the __eq__()
>> method you see in sqlalchemy/sql.py _CompareMixin (assume "self" is a
>> Column):
>>
>> def __eq__(self, other):
>>     return _BooleanExpression(CaseInsensitive(self), other, '==')
>>
>> Better yet its configurable on Column:
>>
>> def __eq__(self, other):
>>     if self.case_insensitive_compare:
>>         return _BooleanExpression(CaseInsensitive(self), other, '==')
>>     else:
>>         return _BooleanExpression(self, other, '==')
>>
>> CaseInsensitive looks a lot like _Function and is just:
>>
>> class CaseInsensitive(_CalculatedClause):
>>     def __init__(self, target):
>>         self.target = target
>>     def accept_visitor(self, visitor):
>>         self.target.accept_visitor(visitor)
>>         visitor.visit_case_insensitive(self)
>>
>> ansicompiler provides the string representation of the underlying
>> target with no modification:
>>
>> def visit_case_insensitive(self, object):
>>     self.strings[object] = self.strings[object.target]
>>
>> postgres compiler with case_insensitive provides it as:
>>
>> def visit_case_insensitive(self, object):
>>     self.strings[object] = "lower(%s)" % self.strings[object.target]
>>
>> other dialects can have whatever flags to turn on/off the "lower()"
>> wrapping as well.
>>
>> what we're really talking about here is a func.lower() that has
>> special meaning to the compiler, i.e. that it should be conditionally
>> applied based on dialect flags.  i think the flag on Column to have
>> the wrapper applied might be pretty slick.
>>
>>
>>
>>> I'm curious as to others experiences with writing an app that  
>>> supports
>>> both Postgres and MySQL with sqlalchemy, because if someone else is
>>> doing this then I must be missing something, or maybe not.  At  
>>> first,
>>> our unit tests all passed, then when we added real world data with
>>> mixed case, tests started to fail on everything doing sorts and
>>> where's on character data.
>>
>> yeah i dont think anyone has gotten too far with this issue, also ive
>> no idea what the giant universe of Hibernate users do either (i think
>> there just arent db-platform-neutral J2EE apps).
>>
>>
>>
>>> How about a Pepsi (at PyCon)?
>>
>> sure thing !
>
>
> >


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] Re: Differentiate ANSIIdentifierPreparer.format_column select clause from where clause??

Reply via email to