> -----Original Message-----
> From: Simon Riggs [mailto:si...@2ndquadrant.com]
> Sent: Wednesday, May 25, 2016 4:39 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Does people favor to have matrix data type?
>
> On 25 May 2016 at 03:52, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
>
> > In a few days, I'm working for a data type that represents matrix in
> > mathematical area. Does people favor to have this data type in the core,
> > not only my extension?
>
> If we understood the use case, it might help understand whether to include it
> or not.
>
> Multi-dimensionality of arrays isn't always useful, so this could be good.
>
As you may expect, one reason I have been working on a matrix data type is
to lay groundwork for GPU acceleration, but the use is not limited to that.

What I have tried to do is run some analytic algorithms inside the database,
rather than exporting the entire dataset to the client side.

My first target is k-means clustering, which is often used in data mining.
When we categorize N items, each with M attributes, into k clusters, the
master data can be represented as an NxM matrix, which is equivalent to
N vectors in M dimensions.
The cluster centroids also lie in the M-dimensional space, so they can be
represented as a kxM matrix, which is equivalent to k vectors in M dimensions.
The k-means algorithm needs to calculate the distance from each item to every
cluster centroid, so it produces an Nxk matrix, usually called the distance
matrix. Next, it updates the cluster centroids using the distance matrix,
then repeats the entire process until convergence.

The heart of the workload is the calculation of the distance matrix. When I
tried to write the k-means algorithm using SQL + R, its performance was poor.
  https://github.com/kaigai/toybox/blob/master/Rstat/pgsql-kmeans.r

If we had native functions to use instead of the complicated SQL expression,
it would make sense for people who try in-database analytics.

Also, fortunately, PostgreSQL's 2-D array format is binary compatible with
what the BLAS library requires. That would allow a GPU to process large
matrices with HPC-grade performance.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>
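
P.S. To make the k-means steps described above concrete, here is a minimal
sketch in plain Python. It is not the PG-Strom code and not the R script
linked above. The function names are hypothetical, squared Euclidean distance
is my assumption for the metric, and convergence is assumed to mean "the
centroids stopped changing".

```python
# Sketch only: plain-Python k-means. All names here are hypothetical.

def distance_matrix(items, centroids):
    """Return the N x k matrix of squared Euclidean distances.

    items:     N vectors of M attributes (the NxM matrix)
    centroids: k vectors of M attributes (the kxM matrix)
    """
    return [
        [sum((x - c) ** 2 for x, c in zip(item, centroid))
         for centroid in centroids]
        for item in items
    ]


def update_centroids(items, dist, centroids):
    """Assign each item to its nearest centroid, then average each cluster."""
    k, m = len(centroids), len(centroids[0])
    sums = [[0.0] * m for _ in range(k)]
    counts = [0] * k
    for item, row in zip(items, dist):
        j = row.index(min(row))  # column of the nearest centroid
        counts[j] += 1
        for d in range(m):
            sums[j][d] += item[d]
    # Assumption: an empty cluster keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]


def kmeans(items, centroids, max_iter=100):
    """Repeat distance-matrix and update steps until centroids stop moving."""
    for _ in range(max_iter):
        dist = distance_matrix(items, centroids)
        new_centroids = update_centroids(items, dist, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

Nearly all of the work is in distance_matrix(), which touches every
(item, centroid) pair on every iteration. That is the part a native
matrix function, or a GPU, would be expected to speed up.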