Hello,

We are writing a GPU-accelerated version of libjpeg-turbo, and our goal is to use it in Chrome or other browsers that use WebKit. We have written a beta version that can use the GPU to decode JPEG files in Chromium. We still have a lot of work to do. I have learned from the Chromium community that there is also an effort underway in WebKit to generalize the concept of parallel/asynchronous image decoding, so I want to know whether we could contribute code.

Thanks a lot.

Peixuan Zhang
20121123

The following is the email that I sent to the Chromium community.

===================================

Hello,

I'm a programmer, and my team and I are writing a GPU-accelerated version of libjpeg-turbo, mainly using OpenCL. Our goal is to use it in Chrome. We have written a beta version that can use the GPU to decode JPEG files in Chromium.

However, because we need to load some additional .dll files and APIs (e.g. we must load OpenCL.dll), this version must run with the --no-sandbox parameter. We don't know how to run it without --no-sandbox, so I would like to know how to load additional .dll files and read some registry information inside the sandbox. Is there a way to do this?

In addition, we need to do some initialization before using OpenCL, and because Chrome is a multi-process application, that initialization has to be done in each process, which increases the time cost. We put forward several ideas, and my colleague Peng Xiao has discussed them with you in this community. After some discussion, we concluded that these ideas may not be suitable. We have therefore proposed another solution: if we use a separate process for JPEG decoding, we won't need to do the initialization multiple times. I think it would be similar to the "--type=gpu-process" process. We could decode images in a single process.

We understand that Chrome runs the JPEG decoder in the sandbox, perhaps for security reasons, so we don't know whether running all JPEG decoding in one process would bring security risks or other problems. Because this part of the work is still at the conceptual stage, we do not know whether it is worthwhile to go ahead.

Yours sincerely.

=============================================================

1. Do you have timing information about how jpeg decoding is a bottleneck at the moment? How much % of time is spent in jpeg decoding on rendering?
With the libjpeg-turbo-OpenCL version that we have already completed, performance is somewhat better than the original version. Of course, we have only tested libjpeg-turbo standalone, so there may be some differences in Chrome. We tested on an AMD A10M 4600M 2.3GHz; on this platform, the OpenCL version is 20~70% faster than before (the gain depends on image size and sampling ratio). In some cases, it is even 8% faster than an Intel i7-3520M 2.9GHz. Of course, in many cases the JPEG codec is not the most time-consuming part of a browser, but as HTML5 becomes more popular, image decoding will account for a larger and larger share of the time. For example, there are many JPEG textures in WebGL.

2. Do you plan to use OpenCL for other things than jpeg decoding?

Yes, we do plan to use OpenCL to accelerate more features in Chrome. What we are working on includes at least JPEG and FFMpeg, and in the future we may do more work on image and video.

3. Do you have an idea about the latency introduced by doing that, plus the kernel overhead, compared to a completely user-mode solution? There are several context switches introduced that would add a constant time to decoding an image, which severely affect smaller images. Is it worth sending 500 bytes of data to the GPU to be decoded? I don't think so.

Yes, we have some ideas that can reduce the transfer time between the CPU and GPU, and we are also trying to reduce the kernel overhead. Some of these ideas have matured, but we are waiting for them to be open-sourced.

4. The sandbox bypass is a non-starter. Adding yet-another-process is a non-starter too. Having a new jpeg decoder significantly increases the attack surface so just from a security perspective, I'm not sure it's worth it.

This is very important. If it would bring a high security risk, the value of the work is low.

5. Do you have an idea how to do the runtime trade off when it's worth doing a software-only decoding versus offloading to the GPU?
What if the user has their GPU already saturated but their CPU idle? At the extreme end, let's assume a dual 8-core Xeon with a low-end Intel integrated graphics card and two 30" monitors plugged in.

OK, we are concerned about different things. I think that on an AMD Trinity APU there may not be such problems; for Intel, I think I need to do some additional research.

In addition to what Marc-Antoine said, note that there is also an effort underway in WebKit to generalize the concept of parallel/asynchronous image decoding. You probably want to sync up with that effort to see what overlaps.

Thanks a lot, I will send an email to the WebKit community for more information.

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev