The slowest part of scaling large images in production is their
retrieval from Swift, which could be much faster for bucketed images.
On Thu, May 1, 2014 at 7:57 AM, Gilles Dubuc <gil...@wikimedia.org> wrote:

> An extremely crude benchmark on our multimedia labs instance, still
> using the same test image:
>
> original -> 3002 (original size)   0m0.268s
> original -> 2048                   0m1.344s
> original -> 1024                   0m0.856s
> original -> 512                    0m0.740s
> original -> 256                    0m0.660s
> 2048 -> 1024                       0m0.444s
> 2048 -> 512                        0m0.332s
> 2048 -> 256                        0m0.284s
> 1024 -> 512                        0m0.112s
> 512 -> 256                         0m0.040s
>
> Which confirms that chaining, instead of generating all thumbnails
> based on the biggest bucket, saves a significant amount of processing
> time. It's definitely in the same order of magnitude as the savings
> achieved by going from the original as the source to the biggest
> bucket as the source.
>
> It's also worth noting that generating a thumbnail of the same size
> as the original is relatively cheap. Using it as the source for the
> 2048 image doesn't save that much time, though: 0m1.252s (for
> 3002 -> 2048).
>
> And here's a side-by-side comparison of these images generated with
> chaining and images that come from our regular image scalers:
> https://dl.dropboxusercontent.com/u/109867/imagickchaining/index.html
> Try to guess which is which before inspecting the page for the
> answer :)
>
> On Thu, May 1, 2014 at 4:02 PM, Gilles Dubuc <gil...@wikimedia.org> wrote:
>
>> Another point about picking the "one true bucket list": currently
>> Media Viewer's buckets have been picked based on the most common
>> screen resolutions, because Media Viewer tries to always use the
>> entire width of the screen to display the image. Trying to achieve
>> a 1-to-1 pixel correspondence therefore makes sense: it should give
>> the sharpest result possible to the average user.
>>
>> However, sticking to that approach will likely introduce a cost. As
>> I've just mentioned, we will probably need to generate more than one
>> of the high buckets based on the original, in order to avoid
>> resizing artifacts.
>>
>> On the other hand, we could decide that the unified bucket list
>> shouldn't be based on screen resolutions (after all, the full-width
>> display scenario of Media Viewer might be the exception, and the
>> buckets will be for all of MediaWiki) and instead would progress by
>> powers of 2. Then creating a given bucket could always be done
>> without resizing artifacts, based on the bucket above the current
>> one. This should provide the biggest savings possible in image
>> scaling time to generate thumbnail buckets.
>>
>> To illustrate with an example, the bucket list could be: 256, 512,
>> 1024, 2048, 4096. The 4096 bucket would be generated first, based on
>> the original, then 2048 would be generated based on 4096, then 1024
>> based on 2048, etc.
>>
>> The big downside is that there's less progression in the 1000-3000
>> range (4 buckets in the Media Viewer resolution-oriented strategy,
>> 2 buckets here), where the majority of devices currently are.
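
For concreteness, the chaining described above could look roughly like
the following Python sketch, shelling out to ImageMagick's convert.
The bucket list is the powers-of-two one proposed above; the file
names and the exact invocation are illustrative, and none of this has
been tested against our scalers:

import subprocess

# Powers-of-two bucket list from the proposal above, largest first.
BUCKETS = [4096, 2048, 1024, 512, 256]

def generate_chain(original):
    """Generate each bucket from the previously generated, larger
    rendition instead of rendering every size from the original."""
    source = original
    for width in BUCKETS:
        target = "bucket_%d.jpg" % width  # illustrative naming scheme
        # The '>' geometry suffix tells ImageMagick to only ever
        # shrink, so an original narrower than the bucket is
        # re-encoded at its own size rather than upscaled.
        subprocess.run(
            ["convert", source, "-thumbnail", "%d>" % width, target],
            check=True)
        source = target  # chain: derive the next bucket from this one

generate_chain("Swallow_flying_drinking.jpg")
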
>> If I take a test image as an example
>> (https://commons.wikimedia.org/wiki/File:Swallow_flying_drinking.jpg),
>> the file size progression is quite different between the screen
>> resolution buckets and the geometric (powers of 2) buckets:
>>
>> - screen resolution buckets
>> 320    11.7kb
>> 640    17kb
>> 800    37.9kb
>> 1024   58kb
>> 1280   89.5kb
>> 1920   218.9kb
>> 2560   324.6kb
>> 2880   421.5kb
>>
>> - geometric buckets
>> 256    9.4kb
>> 512    20kb
>> 1024   58kb
>> 2048   253.1kb
>> 4096   (test image is smaller than 4096)
>>
>> It seems like it's not ideal that a screen resolution slightly above
>> 1024 would suddenly need to download an image more than four times
>> as heavy, for not that many extra pixels on the actual screen. A
>> similar thing can be said for the screen resolution progression,
>> where the file size more than doubles between 1280 and 1920. We
>> could probably use at least an extra step between those two if we
>> use screen resolution buckets, like 1366 and/or 1440.
>>
>> I think that the issue of buckets between 1000 and 3000 is tricky:
>> it's going to be difficult to avoid generating them based on the
>> original while not getting visual artifacts.
>>
>> Maybe we can get away with generating 1280 (and possibly 1366, 1440)
>> based on 2048, the distance between the two guaranteeing that the
>> quality issues will be negligible. We definitely can't generate a
>> 1920 based on a 2048 thumbnail, though, otherwise artifacts on thin
>> lines will look awful.
>>
>> A mixed progression like this might be the best of both worlds, if
>> we confirm that between 1024 and 2048 the resizing is artifact-free
>> enough:
>>
>> 256, 512, 1024, 1280, 1366, 1440, 2048, 4096, where 2048 is
>> generated based on 4096; 1024, 1280, 1366 and 1440 are generated
>> based on 2048; 512 based on 1024; and 256 based on 512.
>>
>> If for example the image width is between 1440 and 2048, then 1024,
>> 1280, 1366 and 1440 would be generated based on the original. That's
>> fine performance-wise, since the original is small.
>>
>> Something that might also be useful to generate is a thumbnail of
>> the same size as the original, if the original is smaller than 4096
>> (or whatever the highest bucket is). Currently we seem to block
>> generating such a thumbnail, but the difference in file size is
>> huge. For the test image mentioned above, which is 3002 pixels wide,
>> the original is 3.69MB, while a thumbnail of the same size would be
>> 465kb. For the benefit of retina displays that are 2560/2880,
>> displaying a thumbnail of the same size as a 3002 original would
>> definitely be better than the highest available bucket (2048).
>>
>> All of this is benchmark-worthy anyway; I might be splitting hairs
>> looking for powers of two if rendering a bucket chain (each bucket
>> generated based on the next one) isn't that much faster than
>> generating all buckets based on the biggest bucket.
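
Spelling that mixed progression out as a lookup rule might make it
easier to reason about. A rough Python sketch: the bucket-to-source
mapping is exactly the one proposed above, while the fallback to the
original for missing source renditions is just my reading of it:

# Which rendition each bucket would be derived from, per the mixed
# progression above (2048 from 4096; 1024/1280/1366/1440 from 2048;
# 512 from 1024; 256 from 512). 4096 always comes from the original.
SOURCE_OF = {2048: 4096, 1440: 2048, 1366: 2048, 1280: 2048,
             1024: 2048, 512: 1024, 256: 512}

def pick_source(bucket, original_width):
    """Return the bucket width to scale from, or None to use the
    original. Falls back to the original whenever the preferred source
    doesn't exist for this image; e.g. for a 1700px-wide original
    there is no 2048 rendition, so 1024-1440 come straight from the
    original, as described in the mail."""
    source = SOURCE_OF.get(bucket)
    if source is None or source >= original_width:
        return None
    return source
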
>> On Thu, May 1, 2014 at 12:54 PM, Gilles Dubuc <gil...@wikimedia.org> wrote:
>>
>>>> _don't consider the upload complete_ until those are done! a web
>>>> uploader or API-using bot should probably wait until it's done
>>>> before uploading the next file, for instance...
>>>
>>> You got me a little confused at that point: are you talking about
>>> the client generating the intermediary sizes, or the server?
>>>
>>> I think client-side thumbnail generation is risky when things start
>>> getting corrupt. A client-side bug could result in a user uploading
>>> thumbnails that are for a different image. And if you want to run a
>>> visual signature check on the server side to avoid that issue, you
>>> might spend as much processing time checking that the thumbnail
>>> matches the correct image as the server would spend generating the
>>> actual thumbnail itself. It would be worth researching whether
>>> there's a very fast "is this thumbnail a smaller version of that
>>> image" algorithm out there. We don't need 100% confidence either,
>>> if we're looking to avoid shuffling bugs in a given upload batch.
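
On that fast "is this thumbnail a smaller version of that image"
question: a difference hash could be one candidate, since we only need
to catch shuffled uploads, not subtle tampering. A rough sketch using
Pillow, where the hash size and distance threshold are guesses that
would need tuning on real upload batches:

from PIL import Image

def dhash(path, size=8):
    """Difference hash: shrink to (size+1) x size greyscale, then
    record whether each pixel is brighter than its right neighbour."""
    img = Image.open(path).convert("L").resize((size + 1, size),
                                               Image.LANCZOS)
    pixels = list(img.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = pixels[row * (size + 1) + col]
            right = pixels[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def plausibly_same_image(original, thumbnail, max_distance=10):
    """True if the two hashes are within max_distance bits (Hamming
    distance), enough to catch a thumbnail paired with the wrong
    image without demanding 100% confidence."""
    return bin(dhash(original) ^ dhash(thumbnail)).count("1") <= max_distance
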
>>> Regarding the issue of a single intermediary size versus multiple,
>>> there's still a near-future plan to have pregenerated buckets for
>>> Media Viewer (which can be reused for a whole host of other
>>> things). Those could be used like the mipmaps you describe. Since
>>> these sizes will be generated at upload time, why not use them?
>>>
>>> However, resizing starts to introduce noticeable visual artifacts
>>> when the bucket's (i.e. the source image's) dimensions are too
>>> close to those of the thumbnail you want to render.
>>>
>>> Consider the existing Media Viewer width buckets: 320, 640, 800,
>>> 1024, 1280, 1920, 2560, 2880.
>>>
>>> I think that generating the 300px thumbnail based on the 320 bucket
>>> is likely to introduce very visible artifacts with thin lines,
>>> etc. compared to using the biggest bucket (2880px). Maybe there's a
>>> smart compromise, like picking a higher bucket (e.g. a 300px
>>> thumbnail would use the 640 bucket as its source, etc.). I think
>>> that we need a battery of visual tests to determine what's the best
>>> strategy here.
>>>
>>> All of this is dependent on Ops giving the green light for
>>> pregenerating the buckets, though. The Swift capacity for it is
>>> slowly being brought online, but I think Ops' prerequisite for
>>> saying yes to it is that we focus on the post-Swift strategy for
>>> thumbnails. We also need to figure out the performance impact of
>>> generating all these thumbnails on upload. On a very meta note, we
>>> might generate the smaller buckets based on the biggest bucket and
>>> only the 2-3 biggest buckets based on the original (still to avoid
>>> visual artifacts).
>>>
>>> Another related angle I'd like to explore is to submit a simplified
>>> version of this RFC:
>>> https://www.mediawiki.org/wiki/Requests_for_comment/Standardized_thumbnails_sizes
>>> where we'd propose a single bucket list option instead of multiple
>>> (presumably the Media Viewer ones; if not, we'd update Media Viewer
>>> to use the new canonical list of buckets), and where we would still
>>> allow arbitrary thumbnail sizes below a certain limit. For example,
>>> people would still be allowed to request thumbnails that are
>>> smaller than 800px at any size they want, because these are likely
>>> to be thumbnails in the real sense of the term, and for anything
>>> above 800px they would be limited to the available buckets
>>> (e.g. 1024, 1280, 1920, 2560, 2880). This would still allow
>>> foundation-hosted wikis to have flexible layout strategies with
>>> their thumbnail sizes, while reducing the craziness of this attack
>>> vector on the image scalers and the gigantic waste of disk and
>>> memory space on the thumbnail hosting. I think it would be an
>>> easier sell for the community; the current RFC is too extreme in
>>> banning all arbitrary sizes and offering too many bucketing
>>> options. I feel like the standardization of true thumbnail sizes
>>> (small images, <800px) is much more subject to endless debate with
>>> no consensus.
>>>
>>> On Thu, May 1, 2014 at 12:21 PM, Erwin Dokter <er...@darcoury.nl> wrote:
>>>
>>>> On 04/30/2014 12:51 PM, Brion Vibber wrote:
>>>>
>>>>> * at upload time, perform a series of scales to produce the
>>>>>   mipmap levels
>>>>> * _don't consider the upload complete_ until those are done! a
>>>>>   web uploader or API-using bot should probably wait until it's
>>>>>   done before uploading the next file, for instance...
>>>>> * once upload is complete, keep on making user-facing thumbnails
>>>>>   as before... but make them from the smaller mipmap levels
>>>>>   instead of the full-scale original
>>>>
>>>> Would it not suffice to just produce *one* scaled-down version
>>>> (i.e. 2048px) which the real-time scaler can use to produce the
>>>> thumbs?
>>>>
>>>> Regards,
>>>> --
>>>> Erwin Dokter

--
Best regards,
Max Semenik ([[User:MaxSem]])

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l