I was directed to post this request to the general Python development community 
so hopefully this is on topic.

One of the weaknesses of the PyUnicode implementation is that the type is 
concrete and there is no option for an abstract proxy string to a foreign 
source.  This is an issue for an API like JPype in which java.lang.Strings are 
passed back from Java.   Ideally these would be a type derived from the Unicode 
type str, but that requires transferring the memory immediately from Java to 
Python even when the string is large and will never be accessed from within 
Python.  For certain operations like XML parsing this can be prohibitive, so 
instead of returning a str we return a JString.  (There is a separate issue 
that Java method names and Python method names conflict so direct inheritance 
creates some problems.)

The JString type can of course be transferred to Python space at any time as 
both Python Unicode and Java string objects are immutable.  However the CPython 
API which takes strings only accepts the Unicode type objects which have a 
concrete implementation.  It is possible to extend strings, but those 
extensions do not allow for proxying as far as I can tell.  Thus there is no 
option currently to proxy to a string representation in another language.  The 
concept of using the duck-typed ``__str__`` method is insufficient as this 
indicates that an object can become a string, rather than "this object is 
effectively a string" for the purposes of the CPython API.
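
The difference can be seen from pure Python.  In the sketch below, LazyProxy 
is a hypothetical stand-in for a foreign string (it is not JPype's actual 
JString).  It defines ``__str__`` but is not a str, so APIs that require a 
real string reject it until it is converted explicitly.

```python
class LazyProxy:
    """Hypothetical stand-in for a foreign string (not JPype's JString)."""

    def __init__(self, fetch):
        # fetch is an assumed callable that copies the text from the foreign side
        self._fetch = fetch

    def __str__(self):
        return self._fetch()


proxy = LazyProxy(lambda: "hello")

# Explicit conversion works: __str__ says the object can become a string.
converted = str(proxy)

# APIs that require an actual str do not call __str__ on our behalf.
try:
    "hello world".startswith(proxy)
    accepted_by_startswith = True
except TypeError:
    accepted_by_startswith = False

try:
    ", ".join([proxy])
    accepted_by_join = True
except TypeError:
    accepted_by_join = False
```

Both calls raise TypeError, so every call site has to wrap the proxy in 
str(), which forces the copy that the proxy was meant to defer.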

One way to address this is to use the currently outdated concept of READY to 
extend Unicode objects to other languages.  A class like JString would be an 
unready Unicode object which, when READY is called, transfers the memory from 
Java, sets up the flags and sets up a pointer to the code point representation.  
Unfortunately the READY concept is scheduled for removal and thus the chance to 
address the need for proxying a Unicode object to another language's 
representation may be limited.  There may be other methods to accomplish this 
without using the concept of READY.  So long as access to the code points goes 
through the Unicode API and the Unicode object can be extended such that the 
actual code points may be located outside of the Unicode object, a proxy can 
still be achieved if there are hooks in it to decide when a transfer should be 
performed.  Generally the transfer request only needs to happen once, but the 
key issue is that neither the number of code points nor their kind will be 
known until the memory is transferred.
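
To illustrate why the length and kind are only known after the transfer: in 
the PEP 393 layout the width per code point depends on the largest code point 
in the string, and a Java string counts UTF-16 code units rather than code 
points.  The following is a minimal sketch in Python; the helper name 
describe_after_transfer is hypothetical and it models the bookkeeping only, 
not the C structures.

```python
def describe_after_transfer(text):
    """Return (length, kind, is_ascii) as a PEP 393 string would record them.

    Hypothetical helper: models the bookkeeping only, not CPython's C structs.
    """
    max_cp = max(map(ord, text), default=0)
    if max_cp < 0x100:
        kind = 1   # one byte per code point
    elif max_cp < 0x10000:
        kind = 2   # two bytes per code point
    else:
        kind = 4   # four bytes per code point
    return len(text), kind, max_cp < 0x80


# Simulated foreign payload: UTF-16 code units, as a Java String stores them.
payload = "a\U0001F600".encode("utf-16-le")
utf16_units = len(payload) // 2          # what the foreign side reports as length
decoded = payload.decode("utf-16-le")    # the "transfer" into Python
length, kind, is_ascii = describe_after_transfer(decoded)
```

Here the foreign side reports three code units, while the transferred string 
has two code points and needs the four-byte kind, so neither value could have 
been filled in before the data was scanned.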

Java has much the same problem.  Although it defines an interface 
"java.lang.CharSequence", the actual "java.lang.String" class is concrete 
and almost all API methods take a String rather than the base interface even 
when the base interface would have been adequate.  Thus just as Python has 
difficulty treating a foreign string class as it would a native one, Java 
cannot treat a Python string as a native one either.  So Python strings get 
represented as a CharSequence type, which greatly limits their use.

Summary:


  *   A String proxy would need the address of the memory in the "wstr" slot 
though the code points may be char[], wchar[] or int[] depending on the 
representation in the proxy.
  *   API calls to interpret the data would need to check to see if the data is 
transferred first; if not, they would call the proxy dependent transfer method 
which is responsible for creating a block of code points and setting up flags 
(kind, ascii, ready, and compact).
  *   The memory block allocated would need to call the proxy dependent 
destructor to clean up when the string is done.
  *   It is not clear if this would have an impact on performance.  Python 
already has the concept of a string which needs actions before it can be 
accessed, but this is scheduled for removal.
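
The steps above can be modelled in pure Python as a rough sketch.  Every name 
here (ProxyString, fake_transfer, fake_release, close) is hypothetical; a real 
implementation would live in C inside the Unicode object and would hook 
deallocation rather than rely on an explicit close() call.

```python
class ProxyString:
    """Toy model of a lazily transferred string; all names are hypothetical."""

    def __init__(self, transfer, release):
        self._transfer = transfer  # proxy-specific: copies code points over
        self._release = release    # proxy-specific: frees the foreign handle
        self._data = None          # stands in for the block of code points
        self.ready = False         # stands in for the "ready" flag

    def _ensure_ready(self):
        # Every accessor funnels through this check, so the copy happens once.
        if not self.ready:
            self._data = self._transfer()
            self.ready = True
        return self._data

    def __len__(self):
        return len(self._ensure_ready())

    def __getitem__(self, index):
        return self._ensure_ready()[index]

    def __str__(self):
        return self._ensure_ready()

    def close(self):
        # Stand-in for the proxy-dependent destructor; runs at most once.
        if self._release is not None:
            self._release()
            self._release = None


events = []


def fake_transfer():
    events.append("transfer")
    return "from java"


def fake_release():
    events.append("release")


s = ProxyString(fake_transfer, fake_release)
created_ready = s.ready   # nothing has been copied yet
length = len(s)           # first access triggers the single transfer
first = s[0]              # later accesses reuse the copied data
s.close()
```

In this model the extra cost on each access is a single flag check once the 
transfer has happened, which is the part whose performance impact would need 
measuring in the real API.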

Are there any plans currently to address the concept of a proxy string in 
the PyUnicode API?

