On Wed, Oct 19, 2016 at 7:48 PM, Chris Barker <chris.bar...@noaa.gov> wrote:
> a few thoughts:
>
> On Wed, Oct 19, 2016 at 12:08 PM, Todd <toddr...@gmail.com> wrote:
>
>> I have been thinking about how to go about having a multidimensional
>> array constructor in python. I know that Python doesn't have a built-in
>> multidimensional array class and won't for the foreseeable future.
>>
>
> no but it does have buffers and memoryviews and the extended buffer
> protocol supports "strided" data -- i.e. multi-dimensional arrays. So it
> would be nice to have SOME simple ndarray object in the standard library
> that would wrap such buffers -- it would be nice for working with image
> data, interacting with numpy arrays, etc.
>
> The "trick" is that once you have the container, you want some
> functionality -- so you add indexing and slicing -- natch. Then maybe some
> simple math? then.... eventually, you are trying to put all of numpy into
> the stdlib, and we already know we don't want to do that.
>
> Though I still think a simple container that only supports indexing and
> slicing would be lovely.
>
> That all being said:
>
>> a = [| 0, 1, 2 || 3, 4, 5 |]
>>
>
> I really don't see the advantage of that over:
>
> a = [[0, 1, 2],[3, 4, 5]]
>
> really I don't -- and I'm a heavy numpy user, so I write a lot of those!
>
> If there is a problem with the current options (and I'm not convinced
> there is) it's that it isn't a literal for multidimensional array, but
> rather a literal for a bunch of nested lists -- the lists themselves are
> created, and so are all the "boxed" values in the array -- only to be
> pulled out and unboxed to be put in the array.
>

But as you said, that is not a multidimensional array. We aren't comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = [[0, 1, 2],[3, 4, 5]]"; we are comparing "a = [| 0, 1, 2 || 3, 4, 5 |]" and "a = np.array([[0, 1, 2],[3, 4, 5]])". That is a bigger difference.
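Chris's point that the standard library already understands strided, multi-dimensional buffers can be illustrated with memoryview alone. The sketch below is only an illustration (the names flat and grid are arbitrary): it views a flat array.array of six C ints as a 2 x 3 grid. It shows what works today (tuple indexing, tolist(), writes that go through to the shared buffer) and what is missing for an ndarray-like container, such as row sub-views and any arithmetic.

```python
import array

# Six C ints stored contiguously and unboxed, unlike the separate int
# objects (and inner list objects) built by [[0, 1, 2], [3, 4, 5]].
flat = array.array("i", range(6))

# memoryview.cast() requires one side of the cast to be a byte format,
# so view the buffer as raw bytes ("B") first, then as a 2 x 3 grid of ints.
grid = memoryview(flat).cast("B").cast("i", shape=[2, 3])

print(grid.ndim, grid.shape)  # 2 (2, 3)
print(grid[1, 2])             # 5 (tuple indexing needs Python 3.5+)
print(grid.tolist())          # [[0, 1, 2], [3, 4, 5]]

# The view shares memory with the array, so writes go through.
grid[0, 0] = 42
print(flat[0])                # 42

# Element indexing is about all you get: asking for a whole row fails.
try:
    grid[0]
except NotImplementedError as exc:
    print("not supported:", exc)
```

The values live unboxed in the array's buffer, which is the distinction Chris draws with nested lists; the cost is that nothing beyond this bare view exists in the standard library.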
> However, this is only for literals -- if your data are large, then they
> are not going to be in literals, but rather read from a file or something,
> so this is really not much of a limitation.
>

Even if your original data is large, I often need smaller arrays when processing, for example for broadcasting or as arguments to processing functions.

>
> However, if you really don't like it, then you can pass a string to
> a constructor function instead:
>
> a = arr_from_string(" | 0, 1, 2 || 3, 4, 5 | ")
>
> yeah, you need to type the extra quotes, but that's not much.
>

Then you need an even longer function call. Again, that defeats the purpose of having a literal, which is to make the syntax more concise.

>
> NOTE: I'm pretty sure numpy has something like this already, for folks
> that like the MATLAB style -- though I can't find it at the moment.
>

It is:

np.r_[[0, 1, 2], [3, 4, 5]]

But this uses indexing behind the scenes, meaning your data is first created as an index and then needs to be converted to a list later. This adds considerable overhead: I just tested it, and it was roughly 20 times slower than np.array().

>
>> b = [| 0, 1, 2 |
>>      | 3, 4, 5 |]
>>
>
> b = [[ 0, 1, 2 ],
>      [ 3, 4, 5 ]]
>

No, this is the equivalent of:

b = np.array([[ 0, 1, 2 ],
              [ 3, 4, 5 ]])

The whole point of this is to avoid the "np.array" call.

>> You can also create a 2D row array by combining the two:
>>
>> a = [|| 0, 1, 2 ||]
>>
>
> a = [[ 0, 1, 2 ]] or is it: [[[ 0, 1, 2 ]]]
>
> (I can't tell, so maybe your syntax is not so clear???)
>

I am not sure where the ambiguity lies. Count the number of "|" symbols: [|| 0, 1, 2 ||] has two on each side, so it is two-dimensional, the same as np.array([[ 0, 1, 2 ]]).

>
>> For higher dimensions, you can just put more lines together:
>>
>> a = [||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]
>>
>> b = [||| 0, 1, 2
>>       || 3, 4, 5
>>      ||| 6, 7, 8
>>       || 9, 10, 11
>>      |||]
>>
>
> I have no idea what that means!
>

||| is the delimiter for the third dimension, and || is the delimiter for the second dimension.
It is like how a newline is used as the delimiter for the second dimension in CSV files. So it is equivalent to:

b = np.array([[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]])

>
>> At least in my opinion, this sort of approach really shines when making
>> higher-dimensional arrays. These would all be equivalent (the | at the
>> beginning and end are just to make it easier to align indentation, they
>> aren't required):
>>
>> a = [|||| 48, 11, 141, 13, -60, -37, 58, -52, -29, 134
>>        || -6, 96, -66, 137, -59, -147, -118, -104, -123, -7
>>       ||| -103, 50, -89, -12, 28, -12, 119, -131, -73, 21
>>        || -58, 105, 25, -138, -106, -118, -29, -49, -63, -56
>>      |||| -43, -34, 101, -115, 41, 121, 3, -117, 101, -145
>>        || 100, -128, 76, 128, -113, -90, 52, -91, -72, -15
>>       ||| 22, -65, -118, 134, -58, 55, -73, -118, -53, -60
>>        || -85, -136, 83, -66, -35, -117, -71, 115, -56, 133
>>      ||||]
>>
>
> It does seem that you are saving some typing when you have high-dim
> arrays, but I really don't see the readability here.
>

If you are used to counting braces, perhaps. But imagine someone who is just starting out. How do you describe how to determine which dimension is being split? "It is one more than the total number of sequential left braces and left parentheses" vs. "it is the number of vertical lines". On top of that, having to deal with both left and right braces rather than a single delimiter adds a lot of visual noise. There is a reason we use commas rather than, say, ">,<" as a delimiter in lists: it is easier to deal with a single kind of symbol rather than three (or potentially five in the current case).

>
> but anyway, the way to move this kind of thing forward is to use it as a
> new format in an existing lib (like numpy), by passing it as a big string.
> IF folks like it and start using it, then there is room for a conversation.
>

The big problem with that is that having to wrap it as a string and pass it to a function in the numpy namespace loses much of the advantage of having a literal to begin with.

>
> But I doubt (and I wouldn't support) that anyone would put a literal into
> python for an object that doesn't exist in python...
>

Yes, I understand that. But some projects are already doing that on their own. I think having a way for them to do it without losing the list constructor (which is the approach currently being taken) would be a benefit.
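For reference, the string-based route Chris suggests can be prototyped in pure Python. Below is a minimal, hypothetical arr_from_string (the name is Chris's placeholder, not an existing numpy function). It follows the rules Todd gives in this thread: commas separate values, a run of k "|" separates blocks along the k-th dimension, and the outer pipes are optional but their count can raise the number of dimensions, as in [|| 0, 1, 2 ||]. It assumes integer entries, does no checking for ragged input, and returns nested lists that could then be handed to np.array.

```python
import re


def _split_exact(text, k):
    # Split on runs of exactly k pipes; shorter or longer runs are left alone.
    return re.split(r"(?<!\|)\|{%d}(?!\|)" % k, text)


def _parse(text, ndim):
    # Commas separate the individual values.
    if ndim == 1:
        return [int(item) for item in text.split(",")]
    # A run of ndim pipes separates the (ndim - 1)-dimensional blocks.
    return [_parse(block, ndim - 1) for block in _split_exact(text, ndim)]


def arr_from_string(text):
    """Hypothetical parser for the proposed [| ... || ... |] notation."""
    body = text.strip()
    # The surrounding square brackets of the proposed literal are optional here.
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1].strip()
    # Outer pipes are optional padding, but their count can set the number of
    # dimensions, e.g. [|| 0, 1, 2 ||] is a 2-D row array.
    lead = len(body) - len(body.lstrip("|"))
    body = body.lstrip("|")
    trail = len(body) - len(body.rstrip("|"))
    body = body.rstrip("|")
    runs = [len(run) for run in re.findall(r"\|+", body)]
    ndim = max([1, lead, trail] + runs)
    return _parse(body, ndim)


print(arr_from_string("[| 0, 1, 2 || 3, 4, 5 |]"))  # [[0, 1, 2], [3, 4, 5]]
print(arr_from_string("[|| 0, 1, 2 ||]"))           # [[0, 1, 2]]
print(arr_from_string("[||| 0, 1, 2 || 3, 4, 5 ||| 6, 7, 8 || 9, 10, 11 |||]"))
# [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
```

Because the splitting only looks at exact run lengths, the multi-line, right-aligned layouts used in the thread parse the same way as their one-line forms.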
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/