Re: [PyGreSQL] sql array notation parsing bug in pygresql-5+

raf Sun, 11 Sep 2016 23:23:17 -0700

Christoph Zwerschke wrote:

Sorry for the delay in responding.


> Am 23.08.2016 um 06:57 schrieb raf:
> >Here's a little program to demonstrate:
> >
> >  #!/usr/bin/env python
> >  import pgdb
> >  connection = pgdb.connect(host='XXX', database='XXX', user='XXX', 
> > password='XXX')
> >  cursor = connection.cursor()
> >  cursor.execute("select cast('[2:2]={1}' as integer[])")
> >  rows = cursor.fetchall()
> >  print('%r' % rows[0][0])
> >  # pygresql-4.2.2 outputs '[2:2]={1}' (which then needs to be parsed)
> >  # pygresql-5.0.1 outputs [1]         (which should be [None, 1])
> 
> Why do you think this should be [None, 1]?

The postgres array '[2:2]={1}' has the value 1 at index 1.
Therefore, the corresponding python list must have the value 1
at index 0. Since the first index in the postgres array is 2
(rather than 1), the value in the corresponding python list at
index 0 should be None. This corresponds to the fact that if you
select the (non-existant) item at index 1 from that array, you
will receive null/None.

> The expression "[2:2]={1}" actually describes an array with ONE element,
> "1". This translates into a list with ONE element in Python.

It may well be an array with one element but that element is at index 2
(where indexes start with 1 rather than 0). That's what the "[2:2]="
means at the front. That is what it has always meant. You can't just
disregard that vitally important piece of information. It is vital
to the structure of the data.

> The "[2:2]=" prefix only says that Postgres should use the start and end
> index 2 for that index, but that's only for accessing the array internally
> in Postgres and does not translate to Python, since Python lists always have
> the start index 0.

That's a surpringingly unhelpful point of view. If a Postgres
array has a value in item n then the python equivalent of that
array must have the same value in item n-1. To think otherwise
is madness. It's like saying that the position in an array is
meaningless. It really isn't.

Look, even postgres agrees with me:

  select cast('[2:2]={1}' as integer[]) = cast('{1}' as integer[])
   ?column? 
  ----------
   f
  (1 row)

See, they're are not the same thing. :-)

> Note that using this prefix you can only change the
> index, not magically add NULL elements.
> 
>   (cast('[2:2]={1}' as integer[]))[2]  --> 1
>   array_length(cast('[2:2]={1}' as integer[]), 1)  --> 1

Yes, but if you have a table column caled "a" whose type is
integer[] and you set the 2nd item to 1, then the value in that
column will be '[2:2]={1}' but you can still select a[1] and get
null. So, postgres effectively "magically" adds the null. It just
doesn't bother storing it. In python we have to store it because
indexes all start at zero.

You say above that the index notation doesn't translate to Python.
I'm ssaying that it absolutelu has to translate to Python. If not,
it's indistinguishable from data corruption (just a lot more consistent).

> As you see, the actual array has still only one item even in Postgres,
> regardless of the fact that the start index is set to 2. So it should be
> also translated to a Python list with only one item.

You can't disregard the fact that the start index is set to 2.
That's the whole point I'm trying to make.

It should be translated into a python list whose item[n-1] equals the
value whose item[n] in postgres for all n from 1 to array_upper() of
the array.

> However, if you really want a Null element as part of the list, you can do
> it as follows:
> 
>   cursor.execute("select cast('{NULL,1}' as integer[])")
>   rows = cursor.fetchall()
>   print('%r' % rows[0][0])
> 
> This should in fact output [None, 1].

No I can't! The data already exists in postgres. I'm just selecting it.

I can't magically change the selection sql to identify these arrays
and insert nulls into them on the way out. Or at least I can't anymore.
With pygresql-4 I would receive the string version and parse it into
a correct python list. With pygresql-5 I no longer have that option.

Even if you completely disagree with me, please accept that this
is incredibly important to me. My database is filled with arrays
and many are like these. If you don't add some way for me to
restore the old behaviour or fix the new behaviour, I will not
be able to upgrade to pygresql-5 ever.

Please, either add what I consider to be correct parsing for the
postgres's index notation, and provide some way to select that
interpretation (although it really should be the default, I
don't care as long as it's available somehow) or just provide an
option to leave the postgres arrays in their string form so that
I can continue to parse them myself.

Please don't force me to stay on pygresql-4 forever.

cheers,
raf

> -- Chris (back from vacation)
> _______________________________________________
> PyGreSQL mailing list
> [email protected]
> https://mail.vex.net/mailman/listinfo.cgi/pygresql
_______________________________________________
PyGreSQL mailing list
[email protected]
https://mail.vex.net/mailman/listinfo.cgi/pygresql

Re: [PyGreSQL] sql array notation parsing bug in pygresql-5+

Reply via email to