My 2 cents: I reckon the BLOB feature would be useful to have as part of the spec long-term. In healthcare integration, I have come across DICOM-to-FHIR integration as a use case. I don't know whether the DICOM format can be captured in DFDL (it seems doable at first glance), but supposing it can, I can easily imagine an integrator wanting to lift the metadata from the DICOM file to create resources on a FHIR server and dump the pixel data somewhere else.

Claude

On Fri, Jan 10, 2025 at 4:50 PM Mike Beckerle <mbecke...@apache.org> wrote:

> Very helpful thanks Mark. I can cite this thread of emails as support for
> the BLOB feature to be added to DFDL v2.0.
>
> On Fri, Jan 10, 2025 at 10:46 AM Mark Kozak <mark.ko...@adeptus-cs.com>
> wrote:
>
>> Yes, I agree, but…
>>
>> When I say ‘other image processing software’, I am not talking about
>> photoshop or other ‘standard’ commercial applications that require a well
>> formed image file like a JFIF. I have files that use various compression
>> algorithms such as JPEG2000 for example. I can write that compressed pixel
>> blob and use a JPEG 2000 library to decompress it to work with the actual
>> pixel values outside the DFDL dataflow. Once decompressed, the processing
>> could be python scripts or any other pixel processing software. It’s also
>> helpful in examining blob payloads that are supposed to be image data but
>> are not behaving as one might expect. I can use a variable to turn on image
>> debug mode to get the image blob to a file for examination.
>>
>> So, yes, there are times I do both a and b.
>>
>> I hope that helps.
>>
>> -Mark
>>
>> *From:* Mike Beckerle <mbecke...@apache.org>
>> *Sent:* Friday, January 10, 2025 10:23 AM
>> *To:* users@daffodil.apache.org
>> *Subject:* Re: BLOB feature - is it being used?
>>
>> Thanks Mark,
>>
>> I have a question, or rather I really don't understand parsing image data
>> but also using BLOBs to process image content.
>>
>> This is from the Wiki page describing the BLOB feature:
>>
>> *A variety of data formats such as for image and video files, consist of
>> fields of what is effectively metadata, surrounding large blocks of data
>> containing compressed image or video data.*
>>
>> *An important use case for DFDL is to expose this metadata for easy use,
>> and to provide access to the large data via a streaming mechanism akin to
>> opening a file, rather than including large chunks of a hexBinary string in
>> the infoset, as is common today.*
>>
>> The above suggests BLOBs can be used to keep a giant array of pixel bytes
>> out of memory. So far so good.
>>
>> But if you are trying to both
>>
>> (a) expose and inspect/sanitize the image metadata, and also
>>
>> (b) process the image (e.g., to remove steganography),
>>
>> then I don't see how this works. Standard image processing libraries are
>> going to want the entire image "file", not just the pixel data bytes from
>> somewhere down inside that file. That implies that the BLOB isn't just the
>> blob of pixel data, but rather the BLOB must be the entire image "file"
>> extracted from within the surrounding data envelope. But that implies that
>> you are not using a DFDL schema to parse the image field by field so as to
>> inspect/sanitize the metadata fields.
>>
>> In other words, DFDL+BLOB extension will let you do (a) or (b) but not
>> both.
>>
>> Do I have this right, or am I misunderstanding the use case?
>>
>> Thanks for any info
>>
>> -mike beckerle
>>
>> On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com>
>> wrote:
>>
>> I have occasionally used this feature to get pixel data (other than NITF)
>> out to a file so that it can be processed using other image processing
>> software.
>>
>> -----Original Message-----
>> From: Mike Beckerle <mbecke...@apache.org>
>> Sent: Thursday, January 9, 2025 2:14 PM
>> To: users@daffodil.apache.org
>> Subject: BLOB feature - is it being used?
>>
>> Is anyone using the experimental BLOB feature in Daffodil.
>> (https://s.apache.org/daffodil-blob-feature)
>>
>> If so, please reply, or you can email me directly.
>>
>> This BLOB feature was added as we thought it would be used for the pixels
>> of images.
>>
>> I've not seen any questions about it or discussion since it got
>> implemented.
>>
>> I do know that it is used in the NITF DFDL schema on github, but the test
>> data for that schema does *not* use that element at all, so nothing that
>> is part of that schema exercises the feature.
>>
>> I ask because this extension to DFDL v1.0, if used, would be a strong
>> candidate for inclusion in the next version of the DFDL specification
>> (from OGF and ISO).
>> But.... if nobody is using the BLOB feature, that means other techniques
>> are sufficient, and then there will be push back within the DFDL Workgroup
>> against adding this feature to DFDL as part of the standard.
>>
>> Personally, I have used this idea for "blobs" of data:
>>
>> <element name="pixels" dfdl:lengthKind="explicit"
>>          dfdl:length='{ ... the big blob length ...}'>
>>   <complexType>
>>     <sequence>
>>       <element name="blob" dfdl:lengthKind="implicit">
>>         <!-- this blob element allows a dfdl:outputValueCalc='{
>>              dfdl:contentLength(..../pixels/blob) }' to work to capture
>>              the length when unparsing -->
>>         <complexType>
>>           <sequence>
>>             <!-- Avoid giant lines. This is XML. Users *may* want to
>>                  open it in a text editor.
>>                  Note max size of blob is 100000100.
>>             -->
>>             <element name="a" type="xs:hexBinary" minOccurs="0"
>>                      maxOccurs="1000000" dfdl:lengthKind="explicit"
>>                      dfdl:length="100" dfdl:occursCountKind="implicit"/>
>>             <element name="last" type="xs:hexBinary" minOccurs="0"
>>                      maxOccurs="1" dfdl:lengthKind="delimited"/>
>>           </sequence>
>>         </complexType>
>>       </element>
>>     </sequence>
>>   </complexType>
>> </element>
>>
>> That combined with the use of EXI to avoid the XML text bloat seems like
>> it would address most needs.
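
For anyone consuming the chunked hexBinary layout from the quoted schema outside of Daffodil, here is a rough Python sketch of stitching the chunks back into bytes. It assumes the infoset is rendered as plain XML without namespaces and uses the element names pixels, blob, a, and last from the schema above; the sample data and the function name blob_bytes are made up for illustration and are not part of any Daffodil API.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset fragment shaped like the quoted schema:
# fixed-size "a" chunks followed by an optional shorter "last" chunk.
SAMPLE = (
    "<pixels><blob>"
    "<a>00010203</a><a>04050607</a><last>0809</last>"
    "</blob></pixels>"
)


def blob_bytes(pixels_elem):
    """Concatenate the hexBinary chunks under pixels/blob in document order."""
    blob = pixels_elem.find("blob")
    if blob is None:
        return b""
    # Each child ("a" or "last") holds hex text; an empty element yields b"".
    return b"".join(bytes.fromhex((child.text or "").strip()) for child in blob)


pixels = ET.fromstring(SAMPLE)
data = blob_bytes(pixels)
```

From there, data could be written to a file and handed to whatever decompression library or script processes the pixels, which is the same hand-off the BLOB feature provides without routing the payload through the infoset.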