[Numpy-discussion] Re: Proposal - Making ndarray object JSON serializable via standardized JData annotations

2021-11-25 Thread Robert Kern
On Thu, Nov 25, 2021 at 11:58 PM Qianqian Fang wrote:

> from my limited experience, as long as the "TypeError" from json keeps
> popping up, requests like this (#12481, #16432, #18994,
> pallets/flask#4012, openmm/openmm#3202, zarr-developers/zarr-python#354)
> are unlikely to cease (and maintainers will have to keep closing them as
> "wontfix") - after all, no matter how different the format expectations a
> user may have, seeing some sort of default behavior is still a lot more
> satisfying than seeing an error.
>

I agree. Unfortunately, we have no control over that inside of numpy.

-- 
Robert Kern
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Proposal - Making ndarray object JSON serializable via standardized JData annotations

2021-11-25 Thread Qianqian Fang

On 11/25/21 23:00, Robert Kern wrote:
> We could also provide a JSONEncoder/JSONDecoder pair, too, but as I
> mention in one of the Github issues you link to, there are a number of
> different expectations that people could have for what the JSON
> representation of an array is. Some will want to use the JData
> standard. Others might just want the arrays to be represented as lists
> of lists of plain-old JSON numbers in order to talk with software in
> other languages that have no particular standard for array data.



hi Robert

I agree with you that different users have different expectations, but
if you want, this can be accommodated by defining slightly different
(builtin) JSONEncoder subclasses and telling users what to expect when
passing each of them via cls=, or by using a single JSONEncoder with
different parameters.
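
A minimal sketch of the single-encoder variant, assuming a hypothetical
NDArrayEncoder class and a hypothetical array_format parameter (neither
exists in numpy or in the json module). The keys used in the "annotated"
form are placeholders for illustration, not the JData tags:

```python
import json

import numpy as np


class NDArrayEncoder(json.JSONEncoder):
    """Hypothetical encoder; not part of numpy or the stdlib."""

    def __init__(self, *args, array_format="list", **kwargs):
        # json.dumps forwards unknown keyword arguments to the encoder class,
        # so array_format can be given directly in the dumps() call.
        super().__init__(*args, **kwargs)
        self.array_format = array_format

    def default(self, obj):
        # default() is only called for objects json cannot encode natively.
        if isinstance(obj, np.ndarray):
            if self.array_format == "list":
                # Plain nested lists of JSON numbers.
                return obj.tolist()
            if self.array_format == "annotated":
                # Placeholder keys (assumption), not a standardized layout.
                return {
                    "dtype": str(obj.dtype),
                    "shape": list(obj.shape),
                    "data": obj.ravel().tolist(),
                }
            raise ValueError(f"unknown array_format: {self.array_format!r}")
        return super().default(obj)


payload = {"name": "demo", "values": np.arange(6, dtype=np.int32).reshape(2, 3)}

as_lists = json.dumps(payload, cls=NDArrayEncoder)
as_annotated = json.dumps(payload, cls=NDArrayEncoder, array_format="annotated")
```

The caller still has to pass cls= explicitly; the parameter only selects
which representation that encoder produces.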


other projects like pandas handle this via a format parameter
("orient") to the DataFrame.to_json() function:


https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html


from my limited experience, as long as the "TypeError" from json keeps
popping up, requests like this (#12481, #16432, #18994,
pallets/flask#4012, openmm/openmm#3202, zarr-developers/zarr-python#354)
are unlikely to cease (and maintainers will have to keep closing them
as "wontfix") - after all, no matter how different the format
expectations a user may have, seeing some sort of default behavior is
still a lot more satisfying than seeing an error.



> It seems to me that the jdata package is the right place for
> implementing the JData standard. I'm happy for our documentation to
> point to it in all the places that we talk about serialization of
> arrays. If the json module did have some way for us to specify a
> default representation for our objects, then that would be a different
> matter. But for the present circumstances, I'm not seeing a
> substantial benefit to moving this code inside of numpy. Outside of
> numpy, you can evolve the JData standard at its own pace.
>
> --
> Robert Kern




I appreciate that you are willing to add this to the documentation; that
is totally fine. I will just leave the links/resources here in case
solving this issue becomes a priority in the future.



Qianqian





[Numpy-discussion] Re: Proposal - Making ndarray object JSON serializable via standardized JData annotations

2021-11-25 Thread Robert Kern
On Thu, Nov 25, 2021 at 10:21 PM Qianqian Fang wrote:

> On 11/25/21 17:05, Stephan Hoyer wrote:
>
> > Hi Qianqian,
> >
> > What is your concrete proposal for NumPy here?
> >
> > Are you suggesting new methods or functions like to_json/from_json in
> > NumPy itself?
>
> that would work - either define a subclass of JSONEncoder to serialize
> ndarray and allow users to pass it to cls in json.dump, or, as you
> mentioned, define to_json/from_json like pandas DataFrame. Either would
> save people from writing customized code/formats.
>
> I am also wondering if there is a more automated way to tell
> json.dump/dumps to use a default serializer for ndarray without using
> cls=...? I saw an SO post that mentioned a method called "__serialize__"
> in a class, but I can't find it in the official docs. Is anyone aware of
> a method for defining a default json serializer on an object?
>
There isn't one. You have to explicitly provide the JSONEncoder. Which is
why there is nothing that we can really do in numpy to avoid the TypeError
that you mention below. The stdlib json module just doesn't give us the
hooks to be able to do that. We can provide top-level functions like
to_json()/from_json() to encode/decode a top-level ndarray to a JSON text,
but that doesn't help with ndarrays in dicts or other objects. We could
also provide a JSONEncoder/JSONDecoder pair, too, but as I mention in one
of the Github issues you link to, there are a number of different
expectations that people could have for what the JSON representation of an
array is. Some will want to use the JData standard. Others might just want
the arrays to be represented as lists of lists of plain-old JSON numbers in
order to talk with software in other languages that have no particular
standard for array data.
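
To illustrate the point with the stdlib alone, here is a small sketch. The
Sample class and encode_sample function are hypothetical stand-ins (Sample
plays the role of an ndarray); the only real API used is json.dumps and its
default= parameter, which is the function-based equivalent of passing a
JSONEncoder subclass:

```python
import json


class Sample:
    """Hypothetical stand-in for any object json cannot encode natively."""

    def __init__(self, values):
        self.values = values

    def __serialize__(self):
        # The json module never looks for this method, so defining it
        # has no effect on json.dumps.
        return self.values


def encode_sample(obj):
    # Called by json only for objects it does not know how to encode,
    # at any nesting depth.
    if isinstance(obj, Sample):
        return obj.values
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


nested = {"run": 1, "data": [Sample([1, 2, 3])]}

# Without an explicit hook, the nested object triggers a TypeError.
try:
    json.dumps(nested)
    raised = False
except TypeError:
    raised = True

# With the hook supplied by the caller, the same nested structure encodes.
text = json.dumps(nested, default=encode_sample)
```

The hook has to be supplied at every json.dumps call site, which is why a
library cannot make the TypeError go away on its own.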

> > As far as I can tell, reading/writing in your custom JSON format already
> > works with your jdata library.
>
> ideally, I was hoping the small jdata encoder/decoder functions could be
> integrated into numpy; that would help avoid the "TypeError: Object of type
> ndarray is not JSON serializable" in json.dump/dumps without needing
> additional modules; more importantly, it would simplify the user experience
> of exchanging complex arrays (complex valued, sparse, special shapes) with
> other programming environments.
>
It seems to me that the jdata package is the right place for implementing
the JData standard. I'm happy for our documentation to point to it in all
the places that we talk about serialization of arrays. If the json module
did have some way for us to specify a default representation for our
objects, then that would be a different matter. But for the present
circumstances, I'm not seeing a substantial benefit to moving this code
inside of numpy. Outside of numpy, you can evolve the JData standard at its
own pace.

-- 
Robert Kern


[Numpy-discussion] Re: Proposal - Making ndarray object JSON serializable via standardized JData annotations

2021-11-25 Thread Qianqian Fang

On 11/25/21 17:05, Stephan Hoyer wrote:

> Hi Qianqian,
>
> What is your concrete proposal for NumPy here?
>
> Are you suggesting new methods or functions like to_json/from_json in
> NumPy itself?

that would work - either define a subclass of JSONEncoder to serialize
ndarray and allow users to pass it to cls in json.dump, or, as you
mentioned, define to_json/from_json like pandas DataFrame. Either would
save people from writing customized code/formats.

I am also wondering if there is a more automated way to tell
json.dump/dumps to use a default serializer for ndarray without using
cls=...? I saw an SO post that mentioned a method called "__serialize__"
in a class, but I can't find it in the official docs. Is anyone aware of
a method for defining a default json serializer on an object?

> As far as I can tell, reading/writing in your custom JSON format
> already works with your jdata library.

ideally, I was hoping the small jdata encoder/decoder functions could be
integrated into numpy; that would help avoid the "TypeError: Object of
type ndarray is not JSON serializable" in json.dump/dumps without needing
additional modules; more importantly, it would simplify the user
experience of exchanging complex arrays (complex valued, sparse, special
shapes) with other programming environments.


Qianqian







[Numpy-discussion] Re: Allow for callable in indexing

2021-11-25 Thread Juan Nunez-Iglesias
I have to say I like this! Together with partial functions / toolz currying [1] 
it could make for some rather elegant code:

result = gaussian_filter(image)[greater_than(t)]

Juan.

..[1]: https://toolz.readthedocs.io/en/latest/curry.html
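
For anyone who wants to experiment with the idea quoted below before any
change to numpy, here is a rough sketch using an ndarray subclass. The
CallableIndexArray class and the greater_than helper are hypothetical names
made up for this example; plain ndarrays do not accept callables as indices.

```python
import numpy as np


class CallableIndexArray(np.ndarray):
    """Hypothetical subclass: a callable index is applied to the whole array."""

    def __getitem__(self, key):
        if callable(key):
            # Call once with the entire array; the result (for example a
            # boolean mask) is then used as an ordinary index.
            key = key(self)
        return super().__getitem__(key)


def greater_than(threshold):
    # Hypothetical helper returning a mask-producing callable.
    return lambda arr: arr > threshold


data = np.array([-2, 5, -1, 3, 0]).view(CallableIndexArray)

result = data[lambda arr: arr >= 0]
big = data[greater_than(2)]
```

The callable runs once on the full array, so the filtering stays
vectorized rather than looping over elements in Python.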

On Wed, 24 Nov 2021, at 9:37 AM, cameron.pinne...@gmail.com wrote:
> If you have an array built up out of method chaining, sometimes you 
> need to filter it at the very end. This can be annoying because it 
> means you have to create a temporary variable just so you can refer to 
> it in the indexing square brackets:
>
> _temp = long_and_complicated_expression()
> result = _temp[_temp >= 0]
>
> You could also use the walrus operator but this is odd looking and it 
> still pollutes the namespace:
>
> result = (_temp := long_and_complicated_expression())[_temp >= 0]
>
> What I would like is to be able to use a lambda inside the indexing 
> square brackets, which would take the whole array as an argument and 
> give a boolean array:
>
> result = long_and_complicated_expression()[lambda arr: arr >= 0]
>
> I should emphasize, the lambda gets the entire array as its argument, 
> and returns an entire mask array of bools. It isn't like the `map` and 
> `filter` builtins where it would call the python function once for each 
> element and thus be slow.
>
> Pandas already has something similar[1]; you can pass a lambda into 
> `.loc[]` that takes a Series and returns a boolean indexer.
>
> [1] 
> https://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements


[Numpy-discussion] Re: Proposal - Making ndarray object JSON serializable via standardized JData annotations

2021-11-25 Thread Stephan Hoyer
Hi Qianqian,

What is your concrete proposal for NumPy here?

Are you suggesting new methods or functions like to_json/from_json in NumPy
itself? As far as I can tell, reading/writing in your custom JSON format
already works with your jdata library.

Best,
Stephan



[Numpy-discussion] Proposal - Making ndarray object JSON serializable via standardized JData annotations

2021-11-25 Thread Qianqian Fang

Dear numpy developers,

I would like to share a proposal on making ndarray JSON serializable by 
default, as detailed in this github issue:


https://github.com/numpy/numpy/issues/20461


briefly, my group and collaborators are working on a new NIH (National
Institutes of Health) funded initiative - NeuroJSON
(http://neurojson.org) - to further disseminate a lightweight data
annotation specification (JData) among the broad
neuroimaging/scientific community. Python and numpy have been widely
used in neuroimaging data analysis pipelines (nipy, nibabel,
mne-python, PySurfer ...), because the N-D array is THE most important
data structure used in scientific data. However, numpy currently does
not support JSON serialization by default. This is one of the
frequently requested features on github (#16432, #12481).


We have developed lightweight python modules (jdata, bjdata) to help
export/import ndarray objects to/from JSON (and a binary JSON format -
BJData/UBJSON - to gain efficiency). The approach is to convert ndarray
objects to a dictionary with subfields using standardized JData
annotation tags. The JData spec can serialize complex data structures
such as N-D arrays (solid, sparse, complex), trees, graphs, tables etc.
It also permits data compression. These annotations have been
implemented in my MATLAB toolbox - JSONLab - since 2011 to help
import/export MATLAB data types, and have been broadly used among
MATLAB/GNU Octave users.
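
As a rough illustration of the dictionary-with-subfields approach (not the
jdata module's code and not the JData tag names), the sketch below
round-trips a complex-valued array through JSON. The function names and the
keys "__ndarray__", "dtype", "shape", "real", "imag" and "data" are
placeholders assumed for this example; the actual annotation tags are
defined in the JData specification linked below.

```python
import json

import numpy as np


def encode_array(obj):
    """Hypothetical default= hook: ndarray -> annotated dict (placeholder keys)."""
    if not isinstance(obj, np.ndarray):
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
    out = {"__ndarray__": True, "dtype": str(obj.dtype), "shape": list(obj.shape)}
    if np.iscomplexobj(obj):
        # JSON has no complex number type, so store the two parts separately.
        out["real"] = obj.real.ravel().tolist()
        out["imag"] = obj.imag.ravel().tolist()
    else:
        out["data"] = obj.ravel().tolist()
    return out


def decode_array(obj):
    """Hypothetical object_hook: rebuild arrays, pass other dicts through."""
    if not obj.get("__ndarray__"):
        return obj
    if "real" in obj:
        flat = np.array(obj["real"]) + 1j * np.array(obj["imag"])
    else:
        flat = np.array(obj["data"])
    # Restore the recorded dtype and shape.
    return flat.astype(obj["dtype"]).reshape(obj["shape"])


original = np.array([[1 + 2j, 3 - 4j], [0.5j, 7]], dtype=np.complex64)

text = json.dumps({"signal": original}, default=encode_array)
restored = json.loads(text, object_hook=decode_array)["signal"]
```

A plain list-of-lists form could not carry the dtype or the imaginary
parts, which is the gap the annotation tags are meant to close.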


Examples of these portable JSON annotation tags representing N-D arrays 
can be found at


http://openjdata.org/wiki/index.cgi?JData/Examples/Basic#2_D_arrays_in_the_annotated_format
http://openjdata.org/wiki/index.cgi?JData/Examples/Advanced

and the detailed formats on N-D array annotations can be found in the spec:

https://github.com/NeuroJSON/jdata/blob/master/JData_specification.md#annotated-storage-of-n-d-arrays


the functions in our current python module that encode/decode ndarray
to/from JSON-serializable forms are implemented in these compact
functions (handling lossless type/data conversion and data compression):


https://github.com/NeuroJSON/pyjdata/blob/63301d41c7b97fc678fa0ab0829f76c762a16354/jdata/jdata.py#L72-L97
https://github.com/NeuroJSON/pyjdata/blob/63301d41c7b97fc678fa0ab0829f76c762a16354/jdata/jdata.py#L126-L160

We strongly believe that enabling JSON serialization by default will 
benefit the numpy user community, making it a lot easier to share 
complex data between platforms (MATLAB/Python/C/FORTRAN/JavaScript...) 
via a standardized/NIH-backed data annotation scheme.


We would be happy to hear your thoughts and suggestions on how to
contribute, and would also be glad to set up dedicated discussions.


Cheers

Qianqian