On Wed, May 25, 2016 at 09:10:02AM +0000, Kouhei Kaigai wrote:
> > -----Original Message-----
> > From: Simon Riggs [mailto:si...@2ndquadrant.com]
> > Sent: Wednesday, May 25, 2016 4:39 PM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Does people favor to have matrix data type?
> > 
> > On 25 May 2016 at 03:52, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
> > 
> > 
> >     For the past few days, I have been working on a data type that
> >     represents a matrix in the mathematical sense. Would people favor
> >     having this data type in core, not only in my extension?
> > 
> > 
> > If we understood the use case, it might help understand whether to include 
> > it or not.
> > 
> > Multi-dimensionality of arrays isn't always useful, so this could be good.
> >
> As you might expect, I have been working on a matrix data type partly as
> groundwork for GPU acceleration, but its use is not limited to that.
> What I am trying to do is run analytic algorithms inside the database,
> rather than exporting the entire dataset to the client side.
> My first target is k-means clustering, which is often used in data mining.
> When we categorize N items, each with M attributes, into k clusters, the
> master data can be represented as an NxM matrix; that is equivalent to N
> vectors in M dimensions.
> The cluster centroids are also located in the M-dimensional space, so they
> can be represented as a kxM matrix; that is equivalent to k vectors in
> M dimensions.
> The k-means algorithm needs to calculate the distance from each item to
> every cluster centroid, which produces an Nxk matrix, usually called the
> distance matrix. Next, it updates the cluster centroids using the distance
> matrix, then repeats the entire process until convergence.
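
To make the loop described above concrete, here is a minimal sketch in plain
Python. It is not the code from the quoted message or from PG-Strom; the
function names (distance_matrix, update_centroids, kmeans), the use of
Euclidean distance, and the max_iter cap are assumptions made only for
illustration.

```python
import math

def distance_matrix(items, centroids):
    """Return the N x k matrix of Euclidean distances from items to centroids."""
    return [[math.dist(x, c) for c in centroids] for x in items]

def update_centroids(items, dist, centroids):
    """Move each centroid to the mean of the items nearest to it."""
    k = len(centroids)
    m = len(items[0])
    sums = [[0.0] * m for _ in range(k)]
    counts = [0] * k
    for x, row in zip(items, dist):
        j = row.index(min(row))          # index of the nearest centroid
        counts[j] += 1
        for d in range(m):
            sums[j][d] += x[d]
    # A centroid with no assigned items keeps its previous position.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]

def kmeans(items, centroids, max_iter=100):
    """Repeat distance calculation and centroid update until nothing moves."""
    for _ in range(max_iter):
        dist = distance_matrix(items, centroids)
        new = update_centroids(items, dist, centroids)
        if new == centroids:             # converged
            break
        centroids = new
    return centroids, distance_matrix(items, centroids)

# N = 4 items, M = 2 attributes, k = 2 clusters
items = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids, dist = kmeans(items, [[0.0, 0.0], [10.0, 0.0]])
```

Each pass rebuilds the whole N x k distance matrix, so that step dominates the
cost as N grows.
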
> The heart of the workload is the calculation of the distance matrix. When I
> tried to write the k-means algorithm using SQL + R, its performance was poor.
>   https://github.com/kaigai/toybox/blob/master/Rstat/pgsql-kmeans.r
> If we had native functions to use instead of the complicated SQL
> expression, it would make sense for people who try in-database analytics.
> Also, fortunately, PostgreSQL's 2-D array format is binary compatible with the
> BLAS library's requirements. That will allow a GPU to process large matrices
> with HPC-grade performance.
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kai...@ak.jp.nec.com>


Have you looked at Perl Data Language under pl/perl? It has pretty nice support
for matrix calculations:



Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)