Re: [HACKERS] Does people favor to have matrix data type?

Kouhei Kaigai Sun, 29 May 2016 16:57:47 -0700

> On 05/28/2016 03:33 PM, Kouhei Kaigai wrote:
> >> -----Original Message-----
> >> From: Joe Conway [mailto:[email protected]]
> >> Sent: Sunday, May 29, 2016 1:40 AM
> >> To: Kaigai Kouhei(海外 浩平); Jim Nasby; Ants Aasma; Simon Riggs
> >> Cc: [email protected]
> >> Subject: Re: [HACKERS] Does people favor to have matrix data type?
> >>
> >> On 05/28/2016 07:12 AM, Kouhei Kaigai wrote:
> >>> Sparse matrix! It is a disadvantaged area for the current array format.
> >>>
> >>> I have two ideas. HPC folks often split a large matrix into multiple
> >>> grid. A grid is typically up to 1024x1024 matrix, for example.
> >>> If a grid is consists of all zero elements, it is obvious we don't need
> >>> to have individual elements on the grid.
> >>> One other idea is compression. If most of matrix is zero, it is an ideal
> >>> data for compression, and it is easy to reconstruct only when calculation.
> >>>
> >>>> Related to this, Tom has mentioned in the past that perhaps we should
> >>>> support abstract use of the [] construct. Currently point finds a way to
> >>>> make use of [], but I think that's actually coded into the grammar.
> >>>>
> >>> Yep, if we consider 2D-array is matrix, no special enhancement is needed
> >>> to use []. However, I'm inclined to have own data structure for matrix
> >>> to present the sparse matrix.
> >>
> >> +1 I'm sure this would be useful for PL/R as well.
> >>
> >> Joe
> >>
> > It is pretty good idea to combine PL/R and PL/CUDA (what I'm now working)
> > for advanced analytics. We will be able to off-load heavy computing portion
> > to GPU, then also utilize various R functions inside database.
> 
> Agreed. Perhaps at some point we should discuss closer integration of
> some sort, or at least a sample use case.
>
What I'm trying to implement first is k-means clustering by GPU. It core 
workload
is iteration of massive distance calculations. When I run kmeans() function of R
for million items with 10 clusters on 40 dimensions, it took about thousand 
seconds.
If GPU version provides the result matrix more rapidly, then I expect R can plot
relationship between items and clusters in human friendly way.


For the closer integration, it may be valuable if PL/R and PL/CUDA can exchange
the data structure with no serialization/de-serialization when PL/R code tries
to call SQL functions. IIUC, pg.spi.exec("SELECT my_function(...)") is the only
way to call SQL functions inside PL/R scripts.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <[email protected]>

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Does people favor to have matrix data type?

Reply via email to