Re: [HACKERS] CUDA Sorting
Hey hackers,

I'm still having trouble reading the column values in tuplesort.c, which I need to do in order to port this to CUDA. Should I use the heap_getattr macro to read them?

2011/9/24 Hannu Krosing ha...@krosing.net

> On Mon, 2011-09-19 at 10:36 -0400, Greg Smith wrote:
> > On 09/19/2011 10:12 AM, Greg Stark wrote:
> > > With the GPU I'm curious to see how well it handles multiple processes contending for resources. It might be a flashy feature that gets lots of attention but might not really be very useful in practice. But it would be very interesting to see.
> >
> > The main problem here is that the sort of hardware commonly used for production database servers doesn't have a GPU serious enough to support CUDA/OpenCL. The very clear trend now is that all systems other than gaming ones ship with motherboard graphics chipsets more than powerful enough for any task but that. I just checked the 5 most popular server configurations I see my customers deploy PostgreSQL onto (a mix of Dell and HP units), and you don't get a serious GPU from any of them.
> >
> > Intel's next-generation Ivy Bridge chipset, expected in the spring of 2012, is going to add support for OpenCL to the built-in motherboard GPU. We may eventually see that trickle into the server hardware side of things too.
> >
> > I've never seen a PostgreSQL server capable of running CUDA, and I don't expect that to change.
>
> CUDA sorting could be beneficial on general server hardware if it can run well on multiple CPUs in parallel. Since GPUs are in essence parallel processors on fast shared memory, it may be that even on ordinary RAM and lots of CPUs some CUDA algorithms are a significant win.
>
> And then there are non-graphics GPUs available on EC2:
>
> Cluster GPU Quadruple Extra Large Instance
> 22 GB of memory
> 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
> 2 x NVIDIA Tesla “Fermi” M2050 GPUs
> 1690 GB of instance storage
> 64-bit platform
> I/O Performance: Very High (10 Gigabit Ethernet)
> API name: cg1.4xlarge
>
> It costs $2.10 per hour, probably a lot less if you use Spot Instances.
>
> > --
> > Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
> > PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CUDA Sorting
Hello everyone,

I'm implementing CUDA-based sorting in PostgreSQL, and I believe it can speed up ORDER BY statements by a factor of 4 to 10. I already have a generic CUDA sort that runs around 10 times faster than the standard qsort. I also managed to load CUDA into PostgreSQL.

Since I'm new to PostgreSQL development, I replaced the code of PostgreSQL's qsort_arg to get used to the way Postgres does the sort. The problem is that I can't use the qsort_arg_comparator comparator function on the GPU, so I need to implement my own. I couldn't find out how to access the sort key values of the tuples in the Tuplesortstate or SortTuple structures. This part looks complicated because the state seems to hold a pointer to the scanner(?), but I didn't manage to access the values directly. Can anyone tell me how this works?

Cheers,
Vítor
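A minimal sketch of one way to get at the key values, under stated assumptions. In the tuplesort.c of this era, each SortTuple caches the value of the first sort column in its `datum1`/`isnull1` fields, and comparetup_heap() only calls heap_getattr() for the second and later keys, so for a single-column int4 ORDER BY the keys can be read straight from `datum1` without touching the tuples. The code uses simplified stand-ins for `Datum` and `SortTuple` (not the real PostgreSQL headers) and a hypothetical `KeyIndex` record; `pack_int4_keys()` and `apply_permutation()` are illustrative names, not PostgreSQL functions. The idea is to copy (key, position) pairs into a flat array that could be shipped to the GPU, sort them there, and then reorder the SortTuple array to match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins (hypothetical mock) for PostgreSQL's Datum and the
 * SortTuple struct in tuplesort.c; only the fields used here matter. */
typedef uintptr_t Datum;

typedef struct
{
    void       *tuple;      /* the tuple proper (opaque here) */
    Datum       datum1;     /* cached value of the first sort key */
    bool        isnull1;    /* is the first key NULL? */
    int         tupindex;   /* used by the external-sort code; ignored here */
} SortTuple;

/* Hypothetical flat record for a host-to-GPU copy: the int4 key plus the
 * position of its SortTuple in the original array. */
typedef struct
{
    int32_t     key;
    int32_t     idx;
} KeyIndex;

/* Copy the first-key values out of the SortTuple array.  NULL keys are not
 * handled; the caller is assumed to have dealt with them separately. */
void
pack_int4_keys(const SortTuple *tuples, size_t n, KeyIndex *out)
{
    for (size_t i = 0; i < n; i++)
    {
        out[i].key = (int32_t) tuples[i].datum1;    /* mimics DatumGetInt32 */
        out[i].idx = (int32_t) i;
    }
}

/* After the (key, idx) pairs come back sorted, reorder the SortTuples to
 * match, using a scratch buffer.  Returns false if allocation fails. */
bool
apply_permutation(SortTuple *tuples, const KeyIndex *sorted, size_t n)
{
    SortTuple  *tmp = malloc(n * sizeof(SortTuple));

    if (tmp == NULL && n > 0)
        return false;
    for (size_t i = 0; i < n; i++)
        tmp[i] = tuples[sorted[i].idx];
    for (size_t i = 0; i < n; i++)
        tuples[i] = tmp[i];
    free(tmp);
    return true;
}
```

NULL keys and multi-column sorts are not covered; for those, the remaining keys would still have to be compared with the existing comparator (or fetched with heap_getattr() from the tuple).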
Re: [HACKERS] CUDA Sorting
2011/9/19 Thom Brown t...@linux.com

> Is your aim to have this committed into core PostgreSQL, or just for your own version? If it's the former, I don't anticipate any enthusiasm from the hacker community.

This is a research thesis, and I'm not confident enough to commit it to core by myself. I will, however, release the source, and I believe it will open the way for future work to be committed to core PostgreSQL.

2011/9/19 Greg Stark st...@mit.edu

> Of course that could change if adding a GPU would help Postgres... I would expect it to help mostly for data warehouse batch query type systems, especially ones with very large I/O subsystems that can saturate the memory bus with sequential I/O. "Run your large batch queries twice as fast by adding a $400 part to your $40,000 server" might be a pretty compelling sales pitch :)

My focus is also energy proportionality. If you add a GPU, you will roughly double the power consumption, but you could perhaps increase the efficiency by much more.

> That said, to help in the case I described you would have to implement the tapesort algorithm on the GPU as well. I expect someone has implemented heaps for CUDA/OpenCL already though.

For now, I'm planning to implement just the in-memory sort, for simplicity and to see whether it gives a real performance gain.

2011/9/19 Greg Stark st...@mit.edu:

> In which case you could call a specialized qsort which implements that comparator inlined instead of calling the standard function.

Actually, I'm now trying to write a custom comparator for integers, but I haven't made much progress yet. If this works, I'll port it to the GPU and then move on to the other comparators incrementally: float first, then strings.

2011/9/19 Thom Brown t...@linux.com:

> Found it! http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/ngm/15-823/project/Final.pdf

This is really great work, and I'm basing mine on it. But it's implemented using OpenGL (yes, not OpenCL), and therefore has a lot of limitations.
I also tried to contact naju but didn't get any answer.

Vítor Uwe Reus
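Greg Stark's suggestion of a specialized qsort with the comparator inlined can be tried in plain C before porting anything to the GPU. The sketch below assumes a single int4 sort key that has already been copied into a hypothetical `KeyIndex` array of (key, original position) pairs; `cmp_int4()`, `quicksort_int4()` and `sort_int4_keys()` are illustrative names, not PostgreSQL's qsort_arg or its comparator API. Because the comparator is a `static inline` function in the same file, the compiler can inline it into the partition loop instead of making an indirect call through a function pointer for every comparison, which is the point of specializing the sort.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one int4 sort key plus the position of the tuple
 * it was taken from, so the tuples can be reordered after the sort. */
typedef struct
{
    int32_t     key;
    int32_t     idx;
} KeyIndex;

/* Comparator visible in this translation unit, so the compiler can inline
 * it instead of calling it through a function pointer. */
static inline int
cmp_int4(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}

static inline void
swap_ki(KeyIndex *a, KeyIndex *b)
{
    KeyIndex    tmp = *a;

    *a = *b;
    *b = tmp;
}

/* Sort a[lo..hi] (inclusive): quicksort with a median-of-three pivot and a
 * Hoare partition, finishing small ranges with insertion sort.  It recurses
 * on the smaller half and loops on the larger, keeping stack depth O(log n). */
static void
quicksort_int4(KeyIndex *a, ptrdiff_t lo, ptrdiff_t hi)
{
    while (hi - lo > 16)
    {
        ptrdiff_t   mid = lo + (hi - lo) / 2;

        /* order a[lo] <= a[mid] <= a[hi]; the median becomes the pivot */
        if (cmp_int4(a[mid].key, a[lo].key) < 0)
            swap_ki(&a[mid], &a[lo]);
        if (cmp_int4(a[hi].key, a[lo].key) < 0)
            swap_ki(&a[hi], &a[lo]);
        if (cmp_int4(a[hi].key, a[mid].key) < 0)
            swap_ki(&a[hi], &a[mid]);

        int32_t     pivot = a[mid].key;
        ptrdiff_t   i = lo - 1;
        ptrdiff_t   j = hi + 1;

        /* Hoare partition: keys in a[lo..j] <= pivot <= keys in a[j+1..hi] */
        for (;;)
        {
            do i++; while (cmp_int4(a[i].key, pivot) < 0);
            do j--; while (cmp_int4(a[j].key, pivot) > 0);
            if (i >= j)
                break;
            swap_ki(&a[i], &a[j]);
        }

        if (j - lo < hi - j)
        {
            quicksort_int4(a, lo, j);
            lo = j + 1;
        }
        else
        {
            quicksort_int4(a, j + 1, hi);
            hi = j;
        }
    }

    /* insertion sort for the remaining small range */
    for (ptrdiff_t k = lo + 1; k <= hi; k++)
    {
        KeyIndex    v = a[k];
        ptrdiff_t   m = k - 1;

        while (m >= lo && cmp_int4(a[m].key, v.key) > 0)
        {
            a[m + 1] = a[m];
            m--;
        }
        a[m + 1] = v;
    }
}

/* Entry point: sort n records by key in ascending order. */
void
sort_int4_keys(KeyIndex *a, size_t n)
{
    if (n > 1)
        quicksort_int4(a, 0, (ptrdiff_t) n - 1);
}
```

The sort is not stable (records with equal keys can come out in any order), which is also true of qsort(). The `idx` field records where each key came from, so the corresponding tuples can be put in the same order afterwards.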