Re: [HACKERS] Does people favor to have matrix data type?

Jim Nasby Wed, 01 Jun 2016 07:32:49 -0700

On 5/30/16 9:05 PM, Kouhei Kaigai wrote:

Due to performance reason, location of each element must be deterministic
without walking on the data structure. This approach guarantees we can
reach individual element with 2 steps.


Agreed.

On various other points...

Yes, please keep the discussion here, even when it relates only to PL/R. Whatever is being done for R needs to be done for plpython as well. I've looked at ways to improve analytics in plpython related to this, and it looks like I need to take a look at the fast-path function stuff. One of the things I've pondered for storing ndarrays in Postgres is how to reduce or eliminate the need to copy data from one memory region to another. It would be nice if there was a way to take memory that was allocated by one manager (ie: python's) and transfer ownership of that memory directly to Postgres without having to copy everything. Obviously you'd want to go the other way as well. IIRC cython's memory manager is the same as palloc in regard to very large allocations basically being ignored completely, so this should be possible in that case.
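
To make the copy-versus-share distinction concrete, here is a minimal sketch using only Python's standard buffer protocol. It shows the "no copy" half of the idea only: a view over memory that something else owns. It does not transfer ownership between allocators, and it is not a PL/Python or Postgres API; the variable names are hypothetical.

```python
from array import array

# Hypothetical stand-in for a block of doubles owned by one memory manager.
source = array("d", [1.0, 2.0, 3.0, 4.0])

# A memoryview exposes the same memory via the buffer protocol: no bytes are copied.
view = memoryview(source)

# Constructing a new array, by contrast, duplicates the data.
copy = array("d", source)

# A write through the owner is visible through the view but not through the copy.
source[0] = 99.0
shares_memory = view[0] == 99.0
copy_is_independent = copy[0] == 1.0
```

The view stays valid only while the owner keeps the buffer alive, which is exactly the ownership problem described above.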

One thing I don't understand is why this type needs to be limited to 1 or 2 dimensions? Isn't the important thing how many individual elements you can fit into GPU? So if you can fit a 1024x1024, you could also fit a 100x100x100, a 32x32x32x32, etc. At low enough values maybe that stops making sense, but I don't see why there needs to be an artificial limit. I think what's important for something like kNN is that the storage is optimized for this, which I think means treating the highest dimension as if it was a list. I don't know if it then matters whether the lower dimensions are C style vs FORTRAN style. Other algorithms might want different storage.
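
As a rough illustration of the point about element counts and layout, the sketch below computes the flat offset of an element for any number of dimensions, in either C (row-major) or FORTRAN (column-major) order. The function name and signature are hypothetical; the only claim is that the offset is a closed-form calculation with no walking of the data structure.

```python
from math import prod

def flat_offset(index, shape, order="C"):
    """Return the flat element offset of `index` within an array of `shape`.

    order="C" is row-major (last index varies fastest);
    order="F" is column-major (first index varies fastest).
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError("index out of bounds")
    dims = list(zip(index, shape))
    if order == "F":
        dims.reverse()
    offset = 0
    for i, n in dims:
        # Horner-style accumulation: one multiply-add per dimension.
        offset = offset * n + i
    return offset

# Element counts for the shapes mentioned above.
counts = {shape: prod(shape) for shape in [(1024, 1024), (100, 100, 100), (32, 32, 32, 32)]}
```

Here 1024x1024 and 32x32x32x32 both hold 1,048,576 elements and 100x100x100 holds 1,000,000, so the dimensionality alone does not change how much memory is needed.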

Something else to consider is the 1G toast limit. I'm pretty sure that's why MADlib stores matrices as a table of vectors. I know for certain it's a problem they run into, because they've discussed it on their mailing list.
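
For a sense of scale, here is a back-of-the-envelope sketch of what the limit means for a dense float8 matrix, plus the "table of vectors" idea expressed as (row_id, vector) pairs. It assumes a limit of exactly 2^30 bytes and ignores header overhead; the helper names are hypothetical and this is not MADlib's actual schema.

```python
from math import isqrt

LIMIT_BYTES = 1 << 30   # assumed: roughly 1 GB per stored value, header ignored
FLOAT8_BYTES = 8        # size of one double-precision element

def max_elements(limit=LIMIT_BYTES, elem_size=FLOAT8_BYTES):
    """Largest number of elements that fit in one value under the assumed limit."""
    return limit // elem_size

def fits_in_one_value(rows, cols):
    """True if a dense rows x cols float8 matrix fits in a single value."""
    return rows * cols <= max_elements()

def as_row_vectors(matrix):
    """Split a matrix (list of rows) into (row_id, vector) pairs, one per table row."""
    return [(row_id, list(row)) for row_id, row in enumerate(matrix, start=1)]

largest_square = isqrt(max_elements())  # side of the largest square matrix that fits
```

Under these assumptions a single value holds about 134 million doubles, so a dense square matrix tops out at a side of 11585, while each row stored as its own vector stays far below the limit.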

BTW, take a look at MADlib svec[1]... ISTM that's just a special case of what you're describing with entire grids being zero (or vice-versa). There might be some commonality there.
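
A minimal sketch of the general idea of compressing long runs of identical values (such as zeros), assuming a simple run-length encoding. The (count, value) pair layout is illustrative only and is not claimed to be svec's storage format.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (run_length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(values)]

def rle_decode(runs):
    """Expand (run_length, value) pairs back into the original list."""
    out = []
    for count, value in runs:
        out.extend([value] * count)
    return out

dense = [0, 0, 0, 5, 5, 0]
runs = rle_encode(dense)
```

A vector that is mostly zero collapses to a handful of pairs, and an entirely zero block collapses to a single pair.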


[1] https://madlib.incubator.apache.org/docs/v1.8/group__grp__svec.html
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
