On 15.09.2016 at 04:34, raf wrote:
>> The question for me is not how frequently arrays are used, but how
>> frequently arrays with non-default start indices are used. Note that the
>> ARRAY constructor in Postgres doesn't even have a way to set a different
>> start index (as far as I know).
>
> Yes, they're usually created by something like:
>
>   update some_table set array_column[index_other_than_1] = value...

Right, if you use an index < 1 or update an empty array, the start index would be changed. I still don't think this is something that is frequently done, but yes, it's possible.

> I've never really thought of them as having a non-default start
> index. I've always just thought of the '[#:#]={}' notation as a
> Postgres-specific "compression" format which needed to be
> "decompressed" when fetched but that seems not to be the case
> (and it doesn't explain negative start indexes).

Not really compression, since all empty values are returned as NULL. Actually the end index is redundant in this notation.
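To illustrate why the end index is redundant, here is a minimal sketch in Python. It is not PyGreSQL code: split_bounds and count_items are made-up helper names, and the sketch assumes a 1-dimensional array literal without quoting or nesting.

```python
import re

# Hypothetical helpers, not part of PyGreSQL. They only handle simple
# 1-dimensional literals such as '[-1:2]={10,NULL,NULL,40}'.
_BOUNDS = re.compile(r'^\[(-?\d+):(-?\d+)\]=(.*)$', re.DOTALL)


def split_bounds(literal):
    """Return (lower, upper, body) for a 1-dimensional array literal."""
    match = _BOUNDS.match(literal)
    if match is None:
        # No explicit bounds: the array starts at the default index 1.
        return 1, None, literal
    lower, upper, body = match.groups()
    return int(lower), int(upper), body


def count_items(body):
    """Count items in a simple '{a,b,c}' body (no quoting, no nesting)."""
    inner = body.strip()[1:-1]
    return len(inner.split(',')) if inner else 0


lower, upper, body = split_bounds('[-1:2]={10,NULL,NULL,40}')
# The upper bound follows from the lower bound and the item count,
# so the notation would carry the same information without it.
assert upper == lower + count_items(body) - 1
```

So a parser only needs to remember the start index; the end index can always be recomputed from the number of items.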

> I suspect that the speed difference is not because of the array
> parser. There must be some other reason so I've attached the test
> program I used rather than one that just tests the parser in
> isolation. The array parser in the attachment is slow and only
> handles 1-dimensional arrays. I think it's safe to say that the
> C parser would be much faster.

I think so. But it may point to a performance degradation elsewhere, so I'd like to check this. Did you forget to attach the test?

> I'm not sure. I'd be happy for it to return None for indexes
> between 0 and the "real" start but I wouldn't want it to return
> None for indexes past the end even though that would mimic
> Postgres behaviour. I'd rather it behaved like a Python list
> with enough None values inserted at the beginning to make the
> indexes match (although being off-by-one of course). In other
> words, I'd want len(a) in Python to return the same value as
> array_upper(a, 1) in Postgres. But it sounds like that's too
> tacky which is fair enough. Just because it's what I want
> doesn't mean that's what anyone else would want.

Exactly. As you see, there are many different ways to implement this and if we do it one way, there is always somebody who will complain. Therefore it is best to implement only the straightforward, obvious way, but allow for customization.

Again, the problem here is that Postgres arrays are different beasts than lists in Python; they are much more similar to arrays in JavaScript.

> If inserting None values into an ordinary Python list is not an
> option, my next thought was maybe the client can request
> non-optional behaviour somehow that means that, when fetching
> arrays, if the start index is 1, a Python list is returned (as
> is the case now) but if the start index is not 1, then a 2-tuple
> is returned instead containing the Postgres start index as one
> item and the list that would normally be returned as the other
> item.

I was also thinking along these lines, but the return value should always be of the same type. Otherwise code will always have to handle both cases, which makes it more complicated, or risk raising errors. Keep in mind that you can also have multidimensional arrays with more than one start index.

I'd rather return a list subtype with the start index as an additional attribute. However, the question is then how that list should behave when retrieving items. Since this is not obvious, we should make it customizable.

So the idea is that we provide a function for changing the base Python class used for PG arrays, which should be a subclass of list. The array parser would then only set an additional "lower" attribute in instances of that class, and it's up to the class implementation how this is handled when items of the array are returned.
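A rough sketch of what such a class could look like, just to make the idea concrete. The name PgArray and the indexing policy shown here are assumptions for illustration only; nothing like this exists in PyGreSQL yet, and the function for registering the class is deliberately left out. This variant mimics Postgres by returning None for any index outside the stored range.

```python
class PgArray(list):
    """Hypothetical list subclass carrying the Postgres start index.

    The array parser would only set the 'lower' attribute; how item
    access uses it is entirely up to the class (this is one option).
    """

    lower = 1  # Postgres default start index

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slices keep ordinary Python list semantics in this sketch.
            return list.__getitem__(self, index)
        # Translate the Postgres index into a 0-based list offset.
        offset = index - self.lower
        if 0 <= offset < len(self):
            return list.__getitem__(self, offset)
        # Like Postgres, return NULL/None outside the array bounds.
        return None


# What the parser might produce for '[-1:2]={10,NULL,NULL,40}':
a = PgArray([10, None, None, 40])
a.lower = -1

assert a[-1] == 10        # first element, addressed by Postgres index
assert a[2] == 40         # last element
assert a[3] is None       # past the end, as in Postgres
assert len(a) == 4        # still an ordinary list underneath
```

Somebody who prefers plain Python indexing could supply a subclass that leaves __getitem__ alone and only reads the "lower" attribute when needed.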

Would that be a reasonable solution?

-- Christoph
_______________________________________________
PyGreSQL mailing list
PyGreSQL@vex.net
https://mail.vex.net/mailman/listinfo.cgi/pygresql
