> On 05/28/2016 03:33 PM, Kouhei Kaigai wrote:
> >> -----Original Message-----
> >> From: Joe Conway [mailto:m...@joeconway.com]
> >> Sent: Sunday, May 29, 2016 1:40 AM
> >> To: Kaigai Kouhei(海外 浩平); Jim Nasby; Ants Aasma; Simon Riggs
> >> Cc: pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] Does people favor to have matrix data type?
> >>
> >> On 05/28/2016 07:12 AM, Kouhei Kaigai wrote:
> >>> Sparse matrix! It is a disadvantaged area for the current array format.
> >>>
> >>> I have two ideas. HPC folks often split a large matrix into multiple
> >>> grid. A grid is typically up to 1024x1024 matrix, for example.
> >>> If a grid is consists of all zero elements, it is obvious we don't need
> >>> to have individual elements on the grid.
> >>> One other idea is compression. If most of matrix is zero, it is an ideal
> >>> data for compression, and it is easy to reconstruct only when calculation.
> >>>
> >>>> Related to this, Tom has mentioned in the past that perhaps we should
> >>>> support abstract use of the [] construct. Currently point finds a way to
> >>>> make use of [], but I think that's actually coded into the grammar.
> >>>>
> >>> Yep, if we consider 2D-array is matrix, no special enhancement is needed
> >>> to use []. However, I'm inclined to have own data structure for matrix
> >>> to present the sparse matrix.
> >>
> >> +1 I'm sure this would be useful for PL/R as well.
> >>
> >> Joe
> >>
> > It is pretty good idea to combine PL/R and PL/CUDA (what I'm now working)
> > for advanced analytics. We will be able to off-load heavy computing portion
> > to GPU, then also utilize various R functions inside database.
>
> Agreed. Perhaps at some point we should discuss closer integration of
> some sort, or at least a sample use case.
>

What I'm trying to implement first is k-means clustering on the GPU. Its core workload is the iteration of massive distance calculations.
When I ran R's kmeans() function on a million items with 10 clusters in 40 dimensions, it took about a thousand seconds. If the GPU version produces the result matrix faster, I expect R can then plot the relationship between items and clusters in a human-friendly way.
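To make the workload concrete, here is a minimal sketch in plain Python. It is not the PL/CUDA code, and it is not necessarily the algorithm R's kmeans() uses; it is a simple Lloyd-style iteration, and all function names are hypothetical. The point is that every pass computes a distance from each item to each centroid, so a million items with 10 clusters in 40 dimensions means 10^6 x 10 x 40 = 4 x 10^8 coordinate differences per pass. Each item's nearest-centroid search is independent of the others, which is why this step looks like a good fit for a GPU.

```python
# Illustrative sketch only: plain Python, not the PL/CUDA implementation.
# All names below are hypothetical and chosen for this example.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(items, centroids):
    """Return, for each item, the index of its nearest centroid.

    This is the N x K distance computation that dominates each pass.
    """
    return [
        min(range(len(centroids)),
            key=lambda k: squared_distance(item, centroids[k]))
        for item in items
    ]


def update_centroids(items, labels, centroids):
    """Recompute each centroid as the mean of its assigned items.

    A centroid with no assigned items is left unchanged.
    """
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in centroids]
    counts = [0] * len(centroids)
    for item, label in zip(items, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += item[d]
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(items, centroids, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign_clusters(items, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update_centroids(items, labels, centroids)
    return labels, centroids
```

For example, kmeans([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], [[0.0, 0.0], [10.0, 10.0]]) puts the first two items in cluster 0 and the last two in cluster 1.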
For closer integration, it may be valuable if PL/R and PL/CUDA could exchange data structures without serialization/de-serialization when PL/R code calls SQL functions. IIUC, pg.spi.exec("SELECT my_function(...)") is the only way to call SQL functions inside PL/R scripts.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>