Hello,

On Sat, 25 Jul 2020 16:34:16 +0300
Elizabeth Shashkova <elizabeth.shashk...@gmail.com> wrote:
> Hi!
>
> Thanks everyone for the interest and for the suggested options! I
> would like to add my two cents and clarify some points as the
> original requester of this feature.
>
> 1. We need this lazy `__repr__` calculation inside our debugger,
> where we work with different users' objects.

Did you consider that calling __repr__ on an object may start
formatting your (or the user's) hard drive, since arbitrary code is
executed? Alternatively, it may take a long time (maybe it mines
bitcoin on each call), as you discovered. So, calling __repr__
indiscriminately doesn't seem to be a good approach for a *generic*
Python debugger.

Then probably you aren't writing a generic Python debugger, but an ad
hoc, special-purpose one. And the problems you're facing are with
specific object types, so it may be a good idea to contact the
projects which supply those types and tell them that your
special-purpose debugger has problems with them. I still wonder what
the chances are that the answer would be "don't call __repr__ then".

Speaking of the technical side, it's indeed sad at times that Python's
__str__/__repr__ require materializing the entire representation in
memory. A more frugal approach would be:

def __stream_repr__(self, stream):
    stream.write("<MyObj ")
    # Produce the rest of the representation piece-wise, calling
    # stream.write()
    stream.write(">")

This can save a great deal of memory, for implementations which care
about that (is that CPython?). It actually could provide a means to
address the posed problem, as "stream" can be a custom stream object
whose .write() method checks e.g. a representation size or time budget
and, if it's exceeded, throws an exception, which should be caught by
the code which called repr() in the first place.

Another alternative could be turning __repr__ into a generator (i.e.,
introducing __irepr__), but that's quite an expensive solution, given
that a generator instance needs to be heap-allocated on each call
before it can be iterated over.

> Usually it isn't some specific type, for which you know that it'll
> be big and its `__repr__` calculation will be slow (like, for
> example, pandas.DataFrame). Sometimes it can be just a composition
> of builtin types, like in the example below. On the top level it's
> just a `dict` with 10 elements and you don't expect its `repr()` to
> be slow, but it takes 13 secs on my machine to calculate it.
> ```
> import time
>
> def build_data_object():
>     data = dict()
>     for i in range(10):
>         temp_dict = dict()
>         for j in range(10):
>             temp_dict[str(j)] = "a" * 30000000
>         data[str(i)] = temp_dict
>     return data
>
> obj = build_data_object()
> start = time.time()
> repr(obj)
> finish = time.time()
> print("Time: %.2f" % (finish - start))
> ```
>
> 2. I also agree it isn't the best idea to add additional parameters
> to `repr` or `str`. Just a function like `lazy_repr` implemented in
> the stdlib would already be very useful.
>
> 3. But I also believe this issue can't be solved without changes in
> the language or stdlib, because you can't predict the length of
> `repr` for an object of unknown type without calculating the whole
> string. But I hope it should be possible to check the current buffer
> size during `__repr__` generation and interrupt it when it reaches
> the limit. (Sorry, I'm not a CPython developer and I might be too
> naive here, so correct me if I'm wrong.)
>
> Elizaveta Shashkova.
>
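To make the stream idea above a bit more concrete, here is a minimal,
purely illustrative sketch. The names LimitedStream, BudgetExceeded,
limited_repr and MyObj are invented for this example, and
__stream_repr__ is only the hypothetical protocol method sketched
earlier, not anything Python currently provides. The stream enforces a
size budget and aborts repr generation once the budget is exhausted,
which is roughly the kind of interruption point 3 above asks for:

```
import io

class BudgetExceeded(Exception):
    """Raised (hypothetically) when a repr grows past its size budget."""

class LimitedStream:
    """A write-only stream that gives up after max_size characters."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.buf = io.StringIO()

    def write(self, s):
        if self.buf.tell() + len(s) > self.max_size:
            # Keep whatever still fits, then abort repr generation.
            self.buf.write(s[:self.max_size - self.buf.tell()])
            raise BudgetExceeded()
        self.buf.write(s)

def limited_repr(obj, max_size=80):
    """Render obj via __stream_repr__ if available, stopping at max_size."""
    stream = LimitedStream(max_size)
    try:
        if hasattr(obj, "__stream_repr__"):
            obj.__stream_repr__(stream)
        else:
            # Fallback: plain repr() still materializes everything first.
            stream.write(repr(obj))
        return stream.buf.getvalue()
    except BudgetExceeded:
        return stream.buf.getvalue() + "..."

class MyObj:
    def __init__(self, items):
        self.items = items

    def __stream_repr__(self, stream):
        stream.write("<MyObj ")
        for item in self.items:
            stream.write(repr(item))
            stream.write(" ")
        stream.write(">")

# The stream buffer never holds more than 40 characters, no matter how
# big the underlying data is.
print(limited_repr(MyObj(["a" * 1000] * 10), max_size=40))
```

Since the budget check lives entirely in the stream object, the same
LimitedStream could just as well track a time budget instead of a
size, as mentioned above.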
> On Sat, 25 Jul 2020 at 13:27, Serhiy Storchaka <storch...@gmail.com> wrote:
>
> > On 24.07.20 18:10, Gábor Bernát wrote:
> > > I would like to have lazy repr evaluation for objects! Sometimes
> > > users have many really large objects, and when the debugger is
> > > trying to show them in the Variables View (= show their string
> > > representation) it can take a lot of time. We do some tricks, but
> > > they don't always work. It would be really, really cool to have a
> > > parameter in repr which defines the max number of symbols we want
> > > to evaluate during repr for this object.
> > >
> > > Maybe repr is not the best here, because that should be
> > > meaningful to the interpreter; instead the __str__ method might
> > > be better for this. Maybe we could pass an optional limit
> > > argument to these methods, so that the user can decide what to
> > > print depending on how many characters he has left?
> > >
> > > Any takes, or better ideas on how we could help with this problem?
> >
> > We need a structural repr protocol, which would represent a complex
> > object as a structure containing items and attributes, so that
> > pprint() would know how to format a multiline text representation,
> > and graphical tools could represent objects as a tree, with deep
> > children and long sequences collapsed by default and expandable
> > interactively. It was discussed in the past, but we still do not
> > have a good specification of such a protocol.


--
Best regards,
Paul                          mailto:pmis...@gmail.com
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4ZBBAZIBS4I6UGOBECNA3RCUUOKJQ3XJ/
Code of Conduct: http://python.org/psf/codeofconduct/