On your second run, using the proper mip levels, you are still reading more off disk than the sum of your textures, so playing with your texture cache size should still help you out. You may not need to make it >= your total texture size, but I'd suggest adjusting it to see if you can lower your I/O number to closer to your total texture size.
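As a rough check on the figures quoted in this thread, the sketch below compares the amount read with the total texture size, and converts tile counts to MB. It assumes the "MB read" column counts uncompressed 64 x 64, 4 channel, 32-bit float tiles; that assumption is inferred from the arithmetic matching the reports, not taken from the OIIO documentation. The function names are hypothetical.

```python
# Figures copied from the ImageCache reports quoted in this thread.
TOTAL_TEXTURE_GB = 1.7     # "Total size of all images referenced"
TILE_EDGE = 64             # tile size reported by oiiotool: 64 x 64
CHANNELS = 4               # R, G, B, A
BYTES_PER_CHANNEL = 4      # 32-bit float


def tiles_to_mb(tile_count):
    """MB of tile data for a number of uncompressed 64x64 RGBA float tiles."""
    tile_bytes = TILE_EDGE * TILE_EDGE * CHANNELS * BYTES_PER_CHANNEL
    return tile_count * tile_bytes / (1024 * 1024)


def read_amplification(read_gb, total_gb=TOTAL_TEXTURE_GB):
    """How many times over the whole texture set was read."""
    return read_gb / total_gb


# First run (all derivatives zero) and second run (derivatives supplied).
print(f"first run : {read_amplification(10.3):.1f}x the texture set")
print(f"second run: {read_amplification(2.8):.1f}x the texture set")

# Cross-check against the "Tot:" rows of the two per-file tables.
print(f"168047 tiles = {tiles_to_mb(168047):.1f} MB")
print(f" 46230 tiles = {tiles_to_mb(46230):.1f} MB")
```

A ratio above 1 means some tiles were evicted and fetched again, which is what a larger cache is meant to avoid.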
It looks like you are using 25 threads? We found that the default number of bins in the TileCache is 32, and if your threads spend a majority of their time reading texture data, 32 is not enough and the threads end up blocking each other. We typically use 128 bins and have found this improved performance in our application; that has been successful for up to 60 concurrent threads.

From: Oiio-dev [mailto:[email protected]] On Behalf Of Stastny, Bret
Sent: Thursday, September 11, 2014 2:20 PM
To: OpenImageIO developers
Subject: Re: [Oiio-dev] Best Practises for using OII .....

This is your problem:

    Total size of all images referenced : 1.7 GB
    Read from disk : 10.3 GB

You basically read the total sum of your texture size roughly 6X (10.3 / 1.7) during your run. If you can set your texture cache to >= 1.7GB you will see a big speed up. Making it any larger than the default 250MB is going to help you. Depending on your texture working set, tweaking this can make a big difference, and you should keep an eye on it as you continue to develop or add more concurrent textures that need to be accessed. Proper mip mapping will also help reduce your memory needs for textures.

We typically allocate up to half the system memory to our texture cache size, because we have the memory to spare. We routinely run with tiled and mipped exr files without any problem; we use our own compression on them.

-bret

From: Oiio-dev [mailto:[email protected]] On Behalf Of Simon Smith
Sent: Thursday, September 11, 2014 12:36 PM
To: OpenImageIO developers
Subject: Re: [Oiio-dev] Best Practises for using OII .....

Thanks for the information - once I got the stats out on shutdown it quickly became apparent that it was not utilising *any* of the mipmaps because I was using 0 for all the derivatives. I'd like to clarify what these should be though.

OK, so I am essentially using OIIO as a texture handler for compositing images onto a lat/long environment map.
These tests were using tx files as a starter but we'd want to support EXRs as well (see later). There are no extra attributes currently set on the texture manager.

When I calculate the derivatives I'm using a value of the original source image size divided by the final rendered image size (this is probably slightly oversampled as the source image will not cover the final image completely, but it's a good approximation). For the s derivatives I use this delta for dsdx, and 0 for dsdy, and for the t derivatives I'm using a similarly calculated value for dtdy, and 0 for dtdx. I'm using 0 because I don't think there is a change in that axis across the sample, but I'm not sure that is correct - could someone please clarify how these values should be correctly calculated?

So, here were the original stats before I used the above derivative values, showing the very bad performance:

OpenImageIO Texture statistics
  Queries/batches :
    texture     :  957238 queries in 957238 batches
    texture 3d  :  0 queries in 0 batches
    shadow      :  0 queries in 0 batches
    environment :  0 queries in 0 batches
  Interpolations :
    closest  : 867238
    bilinear : 0
    bicubic  : 40000
  Average anisotropic probes : 1
  Max anisotropy in the wild : 1

OpenImageIO ImageCache statistics (shared) ver 1.4.8
  Images : 7 unique
    ImageInputs : 6 created, 6 current, 6 peak
    Total size of all images referenced : 1.7 GB
    Read from disk : 10.3 GB
    File I/O time : 4m 6.0s (9.8s average per thread)
    File open time only : 0.0s
  Tiles: 168047 created, 4072 current, 4354 peak
    total tile requests : 968483
    micro-cache misses : 407969 (42.1245%)
    main cache misses : 168047 (17.3516%)
    Peak cache memory : 254.5 MB
  Image file statistics:
      opens   tiles  MB read   I/O time  res              File
  BROKEN
    2     1    3249    203.1       2.9s  3635x3635x4.f32  Flourescent Reflector.tx   MIP-UNUSED  MIP-COUNT [3249,0,0,0,0,0,0,0,0,0,0,0]
    3     1   41968   2623.0   1m 11.9s  3444x3444x4.f32  Halogen Desk Lamp.tx       MIP-UNUSED  MIP-COUNT [41968,0,0,0,0,0,0,0,0,0,0,0]
    4     1   24078   1504.9      21.6s  2520x2520x4.f32  Halogen Downlight.tx       MIP-UNUSED  MIP-COUNT [24078,0,0,0,0,0,0,0,0,0,0,0]
    5     1   64617   4038.6   1m 41.1s  4698x4698x4.f32  Large Rect Softbox.tx      MIP-UNUSED  MIP-COUNT [64617,0,0,0,0,0,0,0,0,0,0,0,0]
    6     1    4096    256.0       9.6s  4732x4732x4.f32  Small Rect Soft Box.tx     MIP-UNUSED  MIP-COUNT [4096,0,0,0,0,0,0,0,0,0,0,0,0]
    7     1   30039   1877.4      38.9s  3360x3360x4.f32  Warm CFL Reflector AWB.tx  MIP-UNUSED  MIP-COUNT [30039,0,0,0,0,0,0,0,0,0,0,0]
 Tot:     6  168047  10502.9    4m 6.0s

Using oiiotool, here is a dump of the tx file for the Large Rect Softbox.tx file:

Reading Large Rect Softbox.tx
Large Rect Softbox.tx : 4698 x 4698, 4 channel, float tiff
    MIP-map levels: 4698x4698 2349x2349 1174x1174 587x587 293x293 146x146 73x73 36x36 18x18 9x9 4x4 2x2 1x1
    channel list: R, G, B, A
    tile size: 64 x 64
    oiio:BitsPerSample: 32
    ImageDescription: "SHA-1=33E8B1A761856041BC6526980C68944EF8BF3A60"
    Orientation: 1 (normal)
    Software: "OpenImageIO 1.4.8 : maketx "Large Rect Softbox.exr""
    Copyright: "Lightmap Ltd"
    DateTime: "2014:09:10 9:29:55"
    textureformat: "Plain Texture"
    wrapmodes: "black,black"
    fovcot: 1
    tiff:PhotometricInterpretation: 2
    tiff:PlanarConfiguration: 1
    planarconfig: "contig"
    tiff:Compression: 8
    compression: "zip"
    IPTC:OriginatingProgram: "OpenImageIO 1.4.8 : maketx "Large Rect Softbox.exr""
    IPTC:CopyrightNotice: "Lightmap Ltd"
    IPTC:Caption: "SHA-1=33E8B1A761856041BC6526980C68944EF8BF3A60"

Once I changed the derivatives, the difference was dramatic, and seems to be traced down to the mipmap usage (which makes complete sense really). I should note that, after taking the stat grab below, I found another instance of using the incorrect derivatives. Fixing it resulted in the largest mipmaps never being used (lookups were pushed down to the 2nd and 3rd mipmaps), the peak memory usage dropping to less than 200 MB, and main cache misses down to 0.001% - all improving performance of course.
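The MIP-map level list in the oiiotool dump can be reproduced by halving each dimension and rounding down until 1x1 is reached. The sketch below does that and counts the 64 x 64 tiles in a level. The halving rule is inferred from the dump above rather than taken from the maketx source, and both function names are hypothetical.

```python
import math

TILE_EDGE = 64  # tile size reported by oiiotool: 64 x 64


def mip_chain(width, height):
    """Resolutions from full size down to 1x1, halving and rounding down."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


def tiles_in_level(width, height, tile_edge=TILE_EDGE):
    """Number of tiles needed to cover one MIP level (edge tiles are partial)."""
    return math.ceil(width / tile_edge) * math.ceil(height / tile_edge)


# Large Rect Softbox.tx is 4698 x 4698.
for level, (w, h) in enumerate(mip_chain(4698, 4698)[:4]):
    print(f"level {level}: {w}x{h}, {tiles_in_level(w, h)} tiles")
```

Level 0 of this file holds 5476 distinct tiles, yet the first report counts 64617 tile reads at that level, so on average each tile was fetched roughly 12 times. That reading assumes the MIP-COUNT entries are tile reads per level.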
OpenImageIO Texture statistics
  Queries/batches :
    texture     :  5559321 queries in 5559321 batches
    texture 3d  :  0 queries in 0 batches
    shadow      :  0 queries in 0 batches
    environment :  0 queries in 0 batches
  Interpolations :
    closest  : 32395926
    bilinear : 0
    bicubic  : 110000
  Average anisotropic probes : 2.98
  Max anisotropy in the wild : 2

OpenImageIO ImageCache statistics (shared) ver 1.4.8
  Images : 7 unique
    ImageInputs : 6 created, 6 current, 6 peak
    Total size of all images referenced : 1.7 GB
    Read from disk : 2.8 GB
    File I/O time : 34.9s (1.4s average per thread)
    File open time only : 0.0s
  Tiles: 46231 created, 4096 current, 4108 peak
    total tile requests : 32677961
    micro-cache misses : 2525392 (7.72812%)
    main cache misses : 46231 (0.141475%)
    Peak cache memory : 256.0 MB
  Image file statistics:
      opens   tiles  MB read   I/O time  res              File
  BROKEN
    2     1    4665    291.6       2.5s  3635x3635x4.f32  Flourescent Reflector.tx   MIP-COUNT [3249,0,1099,317,0,0,0,0,0,0,0,0]
    3     1    7227    451.7       5.4s  3444x3444x4.f32  Halogen Desk Lamp.tx       MIP-COUNT [5834,0,1109,284,0,0,0,0,0,0,0,0]
    4     1    7858    491.1       7.6s  2520x2520x4.f32  Halogen Downlight.tx       MIP-COUNT [4800,2445,613,0,0,0,0,0,0,0,0,0]
    5     1   19256   1203.5      12.5s  4698x4698x4.f32  Large Rect Softbox.tx      MIP-COUNT [15987,0,2576,693,0,0,0,0,0,0,0,0,0]
    6     1     461     28.8       1.1s  4732x4732x4.f32  Small Rect Soft Box.tx     MIP-COUNT [0,0,361,100,0,0,0,0,0,0,0,0,0]
    7     1    6763    422.7       5.8s  3360x3360x4.f32  Warm CFL Reflector AWB.tx  MIP-COUNT [5408,0,1084,271,0,0,0,0,0,0,0,0]
 Tot:     6   46230   2889.4      34.9s

In terms of how we access the textures, we process single output pixels at a time and pull in texture values from as many images as are overlaid on that point.
So whilst there is some continuity in accessing the pixels in any one texture (by the nature of the fact that I am just compositing the images onto a destination through a re-projection), this process runs over multiple threads, so it can potentially be reading out of different areas of the same texture on different threads. I guess this is where tiling comes into it though!

When we allow for loading EXR files (or any other non-tx files) I'm thinking we should make sure that we have the mipmap and tile attributes set so that they are generated for us by the texture system. Would that be correct? If so, when is the tiling and mipmapping done - is it on loading the image, or does it do it just as and when needed in some special way? I'm just trying to get a handle on when the hit for accessing these non-tx'd files will come in.

Thanks again for your help guys - I really do appreciate it :)

Best Regards,
Simon

________________________
Simon C. Smith
Co-founder & CTO
Lightmap Ltd
Creators of HDR Light Studio software
Web site: www.hdrlightstudio.com <http://www.hdrlightstudio.com/>
________________________
Registered in England and Wales 06879016
International House, Brunel Drive, Newark, NG24 2EG. UK

On 11 Sep 2014, at 01:38, Larry Gritz <[email protected]> wrote:

To answer your question directly: no, you should not need to access the files in any particular order. It should be able to handle thousands of texture files totalling many hundreds of GB without trouble.

When you post the other info I suggested, I think we'll have more specific advice to offer.

Two more questions:

1. What is your access pattern like? Is it *completely* incoherent, like every single query is at a completely random location and file compared to the previous one? Or is there any coherence at all?

2. Are you supplying correct derivatives to your texture calls (dsdx, dtdx, dsdy, dtdy)?
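To illustrate why the derivatives matter, here is a minimal sketch of how they relate to a MIP level. It assumes an unrotated, axis-aligned mapping in which the texture spans a known number of output pixels, and it uses the common log2-of-footprint approximation. OIIO's actual level selection also accounts for anisotropy and other options, so this is not its implementation; the function names and the `covered_px` parameters are hypothetical.

```python
import math


def axis_aligned_derivs(covered_px_x, covered_px_y):
    """Return (dsdx, dtdx, dsdy, dtdy) when s spans 0..1 over covered_px_x
    output pixels and t spans 0..1 over covered_px_y output pixels."""
    return (1.0 / covered_px_x, 0.0, 0.0, 1.0 / covered_px_y)


def approx_mip_level(dsdx, dtdx, dsdy, dtdy, tex_w, tex_h):
    """Approximate MIP level from the texel footprint of one output pixel."""
    # Footprint length, in texels, of a one-pixel step in x and in y.
    step_x = math.hypot(dsdx * tex_w, dtdx * tex_h)
    step_y = math.hypot(dsdy * tex_w, dtdy * tex_h)
    footprint = max(step_x, step_y)
    if footprint <= 1.0:
        return 0.0  # one texel or less per pixel: full resolution
    return math.log2(footprint)


# A 4698 x 4698 texture squeezed into 1174.5 x 1174.5 output pixels
# covers about 4 texels per pixel in each direction.
derivs = axis_aligned_derivs(1174.5, 1174.5)
print(approx_mip_level(*derivs, 4698, 4698))

# All-zero derivatives give a zero footprint, so level 0 is always chosen.
print(approx_mip_level(0.0, 0.0, 0.0, 0.0, 4698, 4698))
```

With no rotation between texture and output, leaving the cross terms (dtdx and dsdy) at zero is consistent with this model. A re-projection onto a lat/long map generally makes them non-zero, in which case all four values would be estimated from the (s, t) coordinates of neighbouring output pixels.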
On Sep 10, 2014, at 2:59 PM, Simon Smith <[email protected]> wrote:

Hi guys,

OK, I'm looking for some Best Practices and Dos/Don'ts when using the OIIO system to manage textures and give simple access to the texture pixel values as and when needed.

The problem is that we're currently not getting the performance I might have expected once we start to use only a few input images. We are essentially using multiple 4k EXRs and sampling them pretty much across their entirety during the process, but not linearly (as in, we might sample from multiple textures for each final pixel calculation). What I'm finding is that once we start using only 3 or 4 of these with the high level texture system, the performance drops off dramatically.

I did test converting them to tx files (thinking that it would use just the lower mipmaps and thus give better performance) but that did not seem to make any difference, so obviously we are doing something very wrong!

Before I went diving into the codebase to see what is happening I thought I'd just run a quick email by you guys who know how best to use the systems.

Should we be trying to only utilise one texture at a time perhaps, or should we be setting some of the texture system attributes to better utilise the system? When requesting pixel data from textures, are there things we should be thinking about, tailored to our situation, so that the system runs smoothly? Should we actually not be using the higher level texture system but rather getting more down-and-dirty with the lower image level functionality?

Apologies for the raft of questions, but I'm 90% sure we are having the issues we are because we are not using the system properly, so hopefully you can share some pearly words of wisdom with me :)

Best Regards,
Simon.
_______________________________________________
Oiio-dev mailing list
[email protected]
http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org

--
Larry Gritz
[email protected]
