On Wed, May 25, 2016 at 09:10:02AM +0000, Kouhei Kaigai wrote:
> > -----Original Message-----
> > From: Simon Riggs [mailto:si...@2ndquadrant.com]
> > Sent: Wednesday, May 25, 2016 4:39 PM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Does people favor to have matrix data type?
> >
> > On 25 May 2016 at 03:52, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
> >
> >     In a few days, I'm working for a data type that represents matrix in
> >     mathematical area. Does people favor to have this data type in the core,
> >     not only my extension?
> >
> > If we understood the use case, it might help understand whether to include
> > it or not.
> >
> > Multi-dimensionality of arrays isn't always useful, so this could be good.
> >
> As you may expect, the reason why I've worked for matrix data type is one of
> the groundwork for GPU acceleration, but not limited to.
>
> What I tried to do is in-database calculation of some analytic algorithm; not
> exporting entire dataset to client side.
> My first target is k-means clustering; often used to data mining.
> When we categorize N-items which have M-attributes into k-clusters, the master
> data can be shown in NxM matrix; that is equivalent to N vectors in
> M-dimension.
> The cluster centroid is also located inside of the M-dimension space, so it
> can be shown in kxM matrix; that is equivalent to k vectors in M-dimension.
> The k-means algorithm requires to calculate the distance to any cluster
> centroid for each items, thus, it produces Nxk matrix; that is usually called
> as distance matrix. Next, it updates the cluster centroid using the distance
> matrix, then repeat the entire process until convergence.
>
> The heart of workload is calculation of distance matrix. When I tried to write
> k-means algorithm using SQL + R, its performance was not sufficient (poor).
> https://github.com/kaigai/toybox/blob/master/Rstat/pgsql-kmeans.r
>
> If we would have native functions we can use instead of the complicated SQL
> expression, it will make sense for people who tries in-database analytics.
>
> Also, fortunately, PostgreSQL's 2-D array format is binary compatible to BLAS
> library's requirement. It will allow GPU to process large matrix in HPC grade
> performance.
>
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kai...@ak.jp.nec.com>

Hi,

Have you looked at Perl Data Language under PL/Perl? It has pretty nice
support for matrix calculations:

http://pdl.perl.org

Regards,
Ken
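
P.S. For anyone following the thread, the k-means loop described in the quoted message (build the Nxk distance matrix, update the kxM centroids from it, repeat until convergence) can be sketched in plain Python. This is only an illustration under my own assumptions: Euclidean distance, nearest-centroid assignment, and a mean update. The function names are hypothetical and are not taken from the linked R script, PG-Strom, or PDL.

```python
import math

def distance_matrix(items, centroids):
    """Return the N x k matrix of Euclidean distances (items x centroids)."""
    return [[math.dist(x, c) for c in centroids] for x in items]

def update_centroids(items, centroids, dist):
    """Move each centroid to the mean of the items that are nearest to it."""
    k, m = len(centroids), len(centroids[0])
    sums = [[0.0] * m for _ in range(k)]
    counts = [0] * k
    for x, row in zip(items, dist):
        j = row.index(min(row))          # index of the nearest centroid
        counts[j] += 1
        for a in range(m):
            sums[j][a] += x[a]
    # Assumption: a centroid with no assigned items stays where it was.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]

def kmeans(items, centroids, max_iter=100, tol=1e-9):
    """Repeat the distance-matrix and update steps until the centroids settle."""
    centroids = [[float(v) for v in c] for c in centroids]
    for _ in range(max_iter):
        dist = distance_matrix(items, centroids)
        new = update_centroids(items, centroids, dist)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids, distance_matrix(items, centroids)

if __name__ == "__main__":
    # N = 4 items with M = 2 attributes, grouped into k = 2 clusters.
    items = [[0, 0], [0, 2], [10, 0], [10, 2]]
    centroids, dist = kmeans(items, [[0, 0], [10, 0]])
    print(centroids)
    print(len(dist), "x", len(dist[0]))
```

The nested loop in distance_matrix does N*k*M work on every iteration, which is the part the quoted message calls the heart of the workload and the part a native matrix type or a BLAS/GPU backend would be expected to take over.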