David, I'm CC'ing the list in case it helps others or others have insight.
I'm accessing the Xdmf2 and HDF5 libraries through C++. The XML descriptions generated by our translator are sometimes several megabytes. They are large because there are often many hundreds of time steps and a dozen or so fields, and the data is replicated in several grids in the various coordinate spaces that our collaborators want to see.

I did some profiling of the Xdmf2 library, and the problem seems to be the buffer reallocation strategy used while the XML text is being generated: a new buffer is created with just enough room to hold the newly serialized XML, and the text in the previous buffer is copied into it. In other words, every append recopies all of the text written so far, so there is a lot of excess copying going on.

I modified my local copy of the Xdmf2 library so that the buffer doubles in size any time a reallocation is needed, and that sped up the serialization tremendously. Building the Xdmf tree still took a long time, but I haven't looked into why that is slow.

I may be wrong, but Xdmf2 doesn't seem to be under active development, and there doesn't seem to be much community support for it, which is a shame because it is a nice format.

Cory

On Wed, Aug 15, 2012 at 2:50 PM, David Zemon <[email protected]> wrote:
> Cory,
>
> What takes so long in the XML description? For me, all of my time goes into
> translating the data and mostly file I/O. Attached is an example of one Xdmf
> file I've been working with.
>
> Also, what language are you using?
>
> Cheers,
> David
>
>
> On 08/15/2012 01:36 PM, Cory Quammen wrote:
>>
>> David,
>>
>> I'm currently working on a translator that sounds very similar to
>> yours. It uses HDF5 for the heavy data and that part works fine.
>>
>> For what it's worth, my experience is that the Xdmf library is
>> painfully slow when serializing a large tree. For some of the data
>> sets that I work with, writing the XML description of the data takes
>> far longer than writing the HDF5 files.
>>
>> Cory
>>
>> On Wed, Aug 15, 2012 at 2:29 PM, David Zemon <[email protected]> wrote:
>>>
>>> Cory,
>>>
>>> Unfortunately no, it isn't. Reading CSV is just a small stepping-stone in
>>> the overall goal of this project. I'm trying to make a reader that will
>>> convert any text-delimited file of any size (we have professors on campus
>>> with Terabytes of data - necessitating that it run separately from
>>> ParaView). I also plan to give the user options like creating a difference
>>> field between a column in one file and another column in a different file.
>>>
>>> David
>>>
>>>
>>> On 08/15/2012 01:19 PM, Cory Quammen wrote:
>>>>
>>>> David,
>>>>
>>>> Just curious, is ParaView's CSV reader not sufficient for reading your
>>>> files?
>>>>
>>>> Cory
>>>>
>>>> On Wed, Aug 15, 2012 at 1:58 PM, David Zemon <[email protected]> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm creating a reader to convert large dataset from CSV to a ParaView
>>>>> readable format. XDMF was chosen because it seems like a simple-to-use
>>>>> and understand format. This worked well while I was testing small
>>>>> datasets but when I scaled up to larger data, I ran across a problem
>>>>> where the XML node was too large (could not have 350,000 rows).
>>>>>
>>>>> I want to make sure now that I am on the right track. I've decided to
>>>>> start researching the HDF5 format and will place all of my data into an
>>>>> HDF5 file and then include that in the XDMF file. Does this seem
>>>>> reasonable? Is there a better way to do it?
>>>>>
>>>>> Thank you,
>>>>> David Zemon
>>>>> _______________________________________________
>>>>> Powered by www.kitware.com
>>>>>
>>>>> Visit other Kitware open-source projects at
>>>>> http://www.kitware.com/opensource/opensource.html
>>>>>
>>>>> Please keep messages on-topic and check the ParaView Wiki at:
>>>>> http://paraview.org/Wiki/ParaView
>>>>>
>>>>> Follow this link to subscribe/unsubscribe:
>>>>> http://www.paraview.org/mailman/listinfo/paraview
>>>>
>>>>
>>>>
>>
>>
>

--
Cory Quammen
Research Associate
Department of Computer Science
The University of North Carolina at Chapel Hill
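
For readers of the list who want to see why the reallocation strategy mentioned in this message matters, here is a minimal sketch that contrasts the two approaches. It is not Xdmf2 code and not the patch described in the message: the TextBuffer class, the Growth enum, and the recopiedAfterAppends helper are all hypothetical names made up for illustration. The sketch only counts how many already-written bytes get recopied when a buffer grows to an exact fit versus when it doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical growth policies (illustration only, not taken from Xdmf2).
enum class Growth { ExactFit, Doubling };

// Hypothetical append-only text buffer that records how many existing bytes
// are recopied each time it has to move to a larger allocation.
class TextBuffer {
public:
    explicit TextBuffer(Growth growth) : growth_(growth) {}

    void append(const std::string& text) {
        if (text.empty()) return;
        const std::size_t needed = size_ + text.size();
        if (needed > capacity_) reallocate(needed);
        assert(needed <= capacity_);
        std::memcpy(data_.get() + size_, text.data(), text.size());
        size_ = needed;
    }

    std::size_t size() const { return size_; }
    std::size_t bytesRecopied() const { return bytesRecopied_; }

    std::string str() const {
        return size_ == 0 ? std::string() : std::string(data_.get(), size_);
    }

private:
    void reallocate(std::size_t needed) {
        // ExactFit: allocate just enough room for the new total.
        std::size_t newCapacity = needed;
        if (growth_ == Growth::Doubling) {
            // Doubling: keep doubling the capacity until the text fits.
            newCapacity = capacity_ > 0 ? capacity_ : 1;
            while (newCapacity < needed) newCapacity *= 2;
        }
        std::unique_ptr<char[]> grown(new char[newCapacity]);
        if (size_ > 0) std::memcpy(grown.get(), data_.get(), size_);
        bytesRecopied_ += size_;  // old contents had to be moved
        data_ = std::move(grown);
        capacity_ = newCapacity;
    }

    Growth growth_;
    std::unique_ptr<char[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
    std::size_t bytesRecopied_ = 0;
};

// Appends `count` copies of `chunk` and returns the total bytes recopied.
std::size_t recopiedAfterAppends(Growth growth, const std::string& chunk,
                                 std::size_t count) {
    TextBuffer buffer(growth);
    for (std::size_t i = 0; i < count; ++i) buffer.append(chunk);
    return buffer.bytesRecopied();
}
```

Calling recopiedAfterAppends(Growth::ExactFit, "0123456789", 1000) reports 4,995,000 recopied bytes for 10,000 bytes of output, while the same call with Growth::Doubling stays under 20,000. With exact-fit growth the recopied total grows with the square of the number of appends; with doubling it stays proportional to the final size, which is consistent with the large speedup reported in the message.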
