On 12.09.2016 at 08:22, raf wrote:
>> The "[2:2]=" prefix only says that Postgres should use 2 as both the lower
>> and upper bound of that array, but that's only relevant for accessing the
>> array internally in Postgres and does not translate to Python, since Python
>> lists always start at index 0.

> That's a surprisingly unhelpful point of view. If a Postgres
> array has a value in item n then the Python equivalent of that
> array must have the same value in item n-1. To think otherwise
> is madness. It's like saying that the position in an array is
> meaningless. It really isn't.

Let's take a step back here.

The problem is that the Postgres array type doesn't directly translate to the Python list type: Postgres arrays carry additional information, namely a start index that can differ from the default of 1, and they also behave differently, e.g. by returning null for an out-of-bounds index where a Python list raises an IndexError.
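
To make the difference concrete, here is a minimal sketch (the connection parameters are illustrative, adjust them for your setup):

  import pgdb

  cur = pgdb.connect(database='test').cursor()

  # Postgres quietly yields null for a subscript outside the array bounds:
  cur.execute("select ('[2:2]={1}'::integer[])[1]")
  print(cur.fetchone()[0])  # None -- index 1 is below the lower bound 2

  # A Python list raises instead of returning a null value:
  try:
      [1][1]
  except IndexError as e:
      print(e)  # list index out of range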

There are different ways to deal with that:

1) Simply ignore the start index and return only the array items.

This is how it's currently done and what I think is the simplest and most straightforward solution. It's also how psycopg2 and pg8000 do it, and I don't see many people complaining about it. It's not perfect, but it's not "madness" either (please avoid such words in Internet discussions, since they are very easy to take personally, and unnecessary flame wars will arise).
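
For illustration, this is what you currently get back (a sketch, using the same illustrative connection as above):

  cur.execute("select '[2:2]={1,2,3}'::integer[]")
  print(cur.fetchone()[0])  # [1, 2, 3] -- the start index 2 is dropped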

2) Compensate for the start index by padding the front of the list with the corresponding number of None values.

This is your suggestion. I agree that it's an interesting idea, but I also see some problems (a small sketch of the idea follows the list below):

a) PyGres would become incompatible with other drivers (see above)

b) If you go that route, you could arguably also insert a None item in front of arrays with the default start index of 1. In other words, it's not immediately clear whether the default offset of 1 should be compensated with None items as well, or only non-standard offsets. It's also not immediately clear whether None values at the start of a Python list should automatically be translated into a start index when storing the list in Postgres, or whether they should be kept as null values.

c) If you retrieve an array with a large start index, the resulting Python list takes up a lot of memory (unlike in Postgres, which just stores the bounds). If you re-insert that list into Postgres, the database will grow as well. This could become a problem particularly with multi-dimensional arrays. In other words, round trips to the database might not only change the data, but even inflate it, which would not happen with 1).

d) While method 1) destroys info about the start index, method 2) destroys info about the length of the array. Depending on the type of data, this info might be just as relevant as the info about the start index.
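
Here is the small sketch of method 2 mentioned above (pad_array is a hypothetical helper, not part of PyGreSQL):

  def pad_array(items, lower):
      """Compensate for a Postgres start index by padding with None values."""
      return [None] * (lower - 1) + items

  pad_array([1], 2)      # [None, 1] for '[2:2]={1}'
  pad_array([1], 10**6)  # a million-item list for a one-element array (see c)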

> Look, even postgres agrees with me:
>
>   select cast('[2:2]={1}' as integer[]) = cast('{1}' as integer[])
>    ?column?
>   ----------
>    f
>   (1 row)
>
> See, they are not the same thing. :-)

Yes, but

select cast('[2:2]={1}' as integer[])
  = cast('{NULL,1}' as integer[])

returns false as well :-) They are also not the same thing, but your method would return them as the same thing.

Another argument against interpreting [2:2]={1} as [None, 1]: you would expect that [1] concatenated with [None, 1] gives [1, None, 1]; however, Postgres concatenates them to [1, 1]. Please try it:

select cast('[1:1]={1}' as integer[])
   || cast('[2:2]={1}' as integer[])

> You say above that the index notation doesn't translate to Python.
> I'm saying that it absolutely has to translate to Python. If not,
> it's indistinguishable from data corruption (just a lot more consistent).

I would not say that. You just need to be aware of the fact that the start index gets lost, and compensate for it manually in Python, e.g. by querying array_lower(x) in addition to x if you know that your array x can have different start indices.
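
For example (a sketch; mytable and its array column x are made-up names):

  import pgdb

  cur = pgdb.connect(database='test').cursor()  # illustrative connection
  cur.execute("select array_lower(x, 1), x from mytable")
  for lower, items in cur.fetchall():
      # array_lower() returns null for empty arrays; the default lower bound is 1
      restored = [None] * ((lower or 1) - 1) + items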

> I can't magically change the selection sql to identify these arrays
> and insert nulls into them on the way out. Or at least I can't anymore.
> With pygresql-4 I would receive the string version and parse it into
> a correct python list. With pygresql-5 I no longer have that option.

You're right. While you can disable the conversion in pg, there is currently no official way to do it in pgdb. This is an oversight.

However, you can use the following monkeypatch:

# make pgdb return array values as their raw string representation:
pgdb.Typecasts.create_array_cast = lambda self, basecast: str

So, for the time being, you can use that or compensate manually in Python by querying array_lower().
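
With the patch in place, arrays come back as strings that you can parse yourself, as with PyGreSQL 4 (a sketch; the connection parameters are illustrative):

  import pgdb

  # apply the patch before opening the connection:
  pgdb.Typecasts.create_array_cast = lambda self, basecast: str

  cur = pgdb.connect(database='test').cursor()
  cur.execute("select '[2:2]={1}'::integer[]")
  print(cur.fetchone()[0])  # '[2:2]={1}' -- the start index is preserved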

> Even if you completely disagree with me, please accept that this
> is incredibly important to me. My database is filled with arrays
> and many are like these. If you don't add some way for me to
> restore the old behaviour or fix the new behaviour, I will not
> be able to upgrade to pygresql-5 ever.

I do not completely disagree, but I hope you understand my objections.
I wouldn't want to make your method the default for the reasons listed above, but maybe we can make it optional (this is a bit difficult because the array syntax parser is written in C for performance reasons). I will certainly add an official way to completely switch off the array conversion in pgdb, or to use a custom array parser.

I will also mention this issue with the start index in the docs.

-- Chris
_______________________________________________
PyGreSQL mailing list
[email protected]
https://mail.vex.net/mailman/listinfo.cgi/pygresql
