Re: [HACKERS] CUDA Sorting

2011-09-27 Thread Vitor Reus
Hey hackers,

I'm still having problems reading the values of the columns in tuplesort.c,
in order to understand how to port this to CUDA.

Should I use the heap_getattr macro to read them?

2011/9/24 Hannu Krosing

 On Mon, 2011-09-19 at 10:36 -0400, Greg Smith wrote:
  On 09/19/2011 10:12 AM, Greg Stark wrote:
   With the GPU I'm curious to see how well
   it handles multiple processes contending for resources, it might be a
   flashy feature that gets lots of attention but might not really be
   very useful in practice. But it would be very interesting to see.

  The main problem here is that the sort of hardware commonly used for
  production database servers doesn't have a GPU serious enough to
  support CUDA/OpenCL.  The very clear trend now is that all
  systems other than gaming ones ship with motherboard graphics chipsets
  more than powerful enough for any task but that.  I just checked the 5
  most popular server configurations I see my customers deploy
  PostgreSQL onto (a mix of Dell and HP units), and you don't get a
  serious GPU from any of them.

  Intel's next generation Ivy Bridge chipset, expected for the spring of
  2012, is going to add support for OpenCL to the built-in motherboard
  GPU.  We may eventually see that trickle into the server hardware side
  of things too.

  I've never seen a PostgreSQL server capable of running CUDA, and I don't
  expect that to change.

 CUDA sorting could be beneficial on general server hardware if it can
 run well on multiple CPUs in parallel. Since GPUs are in essence parallel
 processors on fast shared memory, it may be that even on ordinary RAM
 and lots of CPUs some CUDA algorithms are a significant win.

 And then there are non-graphics GPUs available on EC2:

  Cluster GPU Quadruple Extra Large Instance

  22 GB of memory
  33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem”)
  2 x NVIDIA Tesla “Fermi” M2050 GPUs
  1690 GB of instance storage
  64-bit platform
  I/O Performance: Very High (10 Gigabit Ethernet)
  API name: cg1.4xlarge

 It costs $2.10 per hour, probably a lot less if you use the Spot

  Greg Smith   2ndQuadrant   Baltimore, MD
  PostgreSQL Training, Services, and 24x7 Support

 Sent via pgsql-hackers mailing list (
 To make changes to your subscription:


2011-09-19 Thread Vitor Reus
Hello everyone,

I'm implementing CUDA-based sorting in PostgreSQL, and I believe it
can improve ORDER BY performance by a factor of 4 to 10. I
already have a generic CUDA sort that runs around 10 times faster
than std qsort. I also managed to load CUDA into pgsql.

Since I'm new to pgsql development, I replaced the code of pgsql's
qsort_arg to get used to the way postgres does the sort. The problem
is that I can't use the qsort_arg_comparator comparator function on
the GPU, so I need to implement my own. I couldn't find out how to
access the sort key values of the tuples in the Tuplesortstate or
SortTuple structures. This part looks complicated because it seems the
state holds the pointer for the scanner(?), but I didn't manage to
access the values directly. Can anyone tell me how this works?
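
For what it's worth, one way this could be approached for a single
integer key is sketched below. It relies on tuplesort.c caching the
first sort key of each SortTuple in its datum1 and isnull1 fields, so
the keys can be copied into a flat array before being sent to the GPU.
Everything named Demo* or demo_* is a hypothetical stand-in made up for
illustration, not PostgreSQL code, and the sketch assumes a
pass-by-value int4 key; other key types and multi-column sorts would
need more work.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's Datum (a pointer-sized integer). */
typedef uintptr_t DemoDatum;

/* Hypothetical mirror of the relevant fields of tuplesort.c's SortTuple. */
typedef struct DemoSortTuple
{
    void      *tuple;    /* the full tuple, opaque in this sketch */
    DemoDatum  datum1;   /* cached value of the first sort key */
    bool       isnull1;  /* true if the first sort key is NULL */
} DemoSortTuple;

/* One fixed-width record per tuple, easy to copy to device memory. */
typedef struct DemoKeyIndex
{
    int32_t  key;     /* first sort key; meaningful only if !isnull */
    uint32_t index;   /* position of the tuple in the input array */
    bool     isnull;  /* carried along so NULL ordering can be applied */
} DemoKeyIndex;

/*
 * Copy the cached first key of each tuple into a flat array.
 * Assumption: the key is a pass-by-value int4, so the low 32 bits of
 * the datum hold the value itself. Returns the number of records written.
 */
size_t
demo_pack_int32_keys(const DemoSortTuple *tuples, size_t n, DemoKeyIndex *out)
{
    for (size_t i = 0; i < n; i++)
    {
        out[i].isnull = tuples[i].isnull1;
        out[i].key = tuples[i].isnull1
            ? 0
            : (int32_t) (uint32_t) tuples[i].datum1;
        out[i].index = (uint32_t) i;
    }
    return n;
}
```

After sorting the packed records by key, the index field says which
original tuple belongs in each output position, so the tuples
themselves never have to leave host memory.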


2011-09-19 Thread Vitor Reus
2011/9/19 Thom Brown
 Is your aim to have this committed into core PostgreSQL, or just for
 your own version?  If it's the former, I don't anticipate any
 enthusiasm from the hacker community.

This is a research thesis, and I'm not confident about committing it
to core just by myself. I will, however, release the source, and I
believe it will open the way for future work to be committed to core.

2011/9/19 Greg Stark
 Of course that could change if adding a GPU would help Postgres... I
 would expect it to help mostly for data warehouse batch query type
 systems, especially ones with very large i/o subsystems that can
 saturate the memory bus with sequential i/o. Run your large batch
 queries twice as fast by adding a $400 part to your $40,000 server
 might be a pretty compelling sales pitch :)

My focus is also energy proportionality. If you add a GPU, you will
increase the power consumption by about 2 times, but could perhaps
increase the efficiency by much more.
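
As a rough sanity check of that trade-off: energy is power times time,
so the energy per query changes by the power factor divided by the
speedup. The sketch below only encodes that ratio. The function name is
hypothetical, and it assumes the power draw doubles for the whole run
and that the whole query gets the speedup, which is a simplification.

```c
/*
 * Relative energy per query after adding an accelerator.
 *   power_factor: new power draw divided by old power draw
 *                 (assumed constant for the whole run)
 *   speedup:      old run time divided by new run time
 * Energy is power times time, so the ratio is power_factor / speedup.
 * A result below 1.0 means less energy per query than before.
 */
double
demo_energy_ratio(double power_factor, double speedup)
{
    return power_factor / speedup;
}
```

With the 2x power figure above, the break-even point is a 2x speedup;
a 4x speedup would halve the energy per query, and a 10x speedup would
bring it down to a fifth.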

 That said, to help in the case I described you would have to implement
 the tapesort algorithm on the GPU as well. I expect someone has
 implemented heaps for CUDA/OpenCL already though.

For now, I'm planning to implement just the in-memory sort, for
simplicity and to see if it would give a real performance gain.

2011/9/19 Greg Stark
 In which case you could call a specialized qsort which
 implements that comparator inlined instead of calling the standard

Actually, I'm now trying to write a custom comparator for integers, but
I haven't made much progress. If this works, I'll port it to the GPU and
move on to the next comparators, such as float and then strings,
incrementally.
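
A minimal sketch of what such an integer comparator might look like on
the CPU side, with a sort that calls it inline instead of through a
function pointer. The demo_* names are hypothetical, and insertion sort
is used only to keep the example short; it is O(n^2) and is not the
algorithm qsort_arg uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Three-way comparison without subtraction, so it cannot overflow. */
static inline int
demo_cmp_int32(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}

/*
 * Insertion sort specialized for int32 keys, ascending order.
 * The comparator is a static inline function rather than a function
 * pointer, so the compiler is free to inline it into the loop.
 */
void
demo_sort_int32(int32_t *keys, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        int32_t cur = keys[i];
        size_t  j = i;

        /* Shift larger elements right until cur's slot is found. */
        while (j > 0 && demo_cmp_int32(keys[j - 1], cur) > 0)
        {
            keys[j] = keys[j - 1];
            j--;
        }
        keys[j] = cur;
    }
}
```

Writing the comparison as (a > b) - (a < b) rather than a - b avoids
signed overflow for keys near the ends of the int32 range, and the same
branch-free form should carry over to a device function more or less
unchanged.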

2011/9/19 Thom Brown
 Found it!

This is really great work, and I'm basing mine on it. But it's
implemented using OpenGL (yes, not OpenCL), and therefore has a lot of
limitations. I also tried to contact naju but didn't get any answer.

Vítor Uwe Reus
