Yes, I agree, but… When I say ‘other image processing software’, I am not talking about Photoshop or other ‘standard’ commercial applications that require a well-formed image file such as a JFIF. I have files that use various compression algorithms, JPEG 2000 for example. I can write that compressed pixel blob out to a file and use a JPEG 2000 library to decompress it, so that I can work with the actual pixel values outside the DFDL dataflow. Once decompressed, the processing could be Python scripts or any other pixel-processing software. It’s also helpful for examining blob payloads that are supposed to be image data but are not behaving as one might expect. I can use a variable to turn on an image debug mode that writes the image blob to a file for examination.
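As a rough illustration of the debug use just described: once the blob has been written to a file, a few lines of Python can check whether it even starts like JPEG 2000 data before it is handed to a decoder. This is only a sketch. The function names are made up for illustration, and it looks only at the leading signature bytes (the JP2 signature box, or the SOC marker followed by SIZ for a raw codestream), not at whether the rest of the payload decodes.

```python
from pathlib import Path

# First 12 bytes of a JP2 file: the signature box (length 12, type 'jP  ').
JP2_SIGNATURE_BOX = bytes.fromhex("0000000c6a5020200d0a870a")
# A raw JPEG 2000 codestream starts with the SOC marker followed by SIZ.
J2K_SOC_SIZ = bytes.fromhex("ff4fff51")


def sniff_jpeg2000(head: bytes):
    """Return 'jp2', 'j2k', or None based on the leading bytes (hypothetical helper)."""
    if head.startswith(JP2_SIGNATURE_BOX):
        return "jp2"
    if head.startswith(J2K_SOC_SIZ):
        return "j2k"
    return None


def sniff_blob_file(path):
    """Read just enough of a blob file to classify it (hypothetical helper)."""
    with Path(path).open("rb") as f:
        return sniff_jpeg2000(f.read(len(JP2_SIGNATURE_BOX)))
```

A result of None for a blob that is supposed to be JPEG 2000 suggests the blob boundaries in the schema are off, or that the payload is something else entirely.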
So, yes, there are times I do both a and b. I hope that helps.

-Mark

From: Mike Beckerle <mbecke...@apache.org>
Sent: Friday, January 10, 2025 10:23 AM
To: users@daffodil.apache.org
Subject: Re: BLOB feature - is it being used?

Thanks Mark,

I have a question, or rather I really don't understand parsing image data but also using BLOBs to process image content.

This is from the Wiki page describing the BLOB feature:

A variety of data formats such as for image and video files, consist of fields of what is effectively metadata, surrounding large blocks of data containing compressed image or video data. An important use case for DFDL is to expose this metadata for easy use, and to provide access to the large data via a streaming mechanism akin to opening a file, rather than including large chunks of a hexBinary string in the infoset, as is common today.

The above suggests BLOBs can be used to keep a giant array of pixel bytes out of memory. So far so good.

But if you are trying to both (a) expose and inspect/sanitize the image metadata, and also (b) process the image (e.g., to remove steganography), then I don't see how this works.

Standard image processing libraries are going to want the entire image "file", not just the pixel data bytes from somewhere down inside that file. That implies that the BLOB isn't just the blob of pixel data, but rather the BLOB must be the entire image "file" extracted from within the surrounding data envelope. But that implies that you are not using a DFDL schema to parse the image field by field so as to inspect/sanitize the metadata fields.

In other words, DFDL+BLOB extension will let you do (a) or (b) but not both.

Do I have this right, or am I misunderstanding the use case?
Thanks for any info

-mike beckerle

On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:

I have occasionally used this feature to get pixel data (other than NITF) out to a file so that it can be processed using other image processing software.

-----Original Message-----
From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, January 9, 2025 2:14 PM
To: users@daffodil.apache.org
Subject: BLOB feature - is it being used?

Is anyone using the experimental BLOB feature in Daffodil? (https://s.apache.org/daffodil-blob-feature)

If so, please reply, or you can email me directly.

This BLOB feature was added as we thought it would be used for the pixels of images. I've not seen any questions about it or discussion since it got implemented.

I do know that it is used in the NITF DFDL schema on GitHub, but the test data for that schema does *not* use that element at all, so nothing that is part of that schema exercises the feature.

I ask because this extension to DFDL v1.0, if used, would be a strong candidate for inclusion in the next version of the DFDL specification (from OGF and ISO).

But... if nobody is using the BLOB feature, that means other techniques are sufficient, and then there will be pushback within the DFDL Workgroup against adding this feature to DFDL as part of the standard.

Personally, I have used this idea for "blobs" of data:

<element name="pixels" dfdl:lengthKind="explicit" dfdl:length='{ ... the big blob length ...}'>
  <complexType>
    <sequence>
      <element name="blob" dfdl:lengthKind="implicit">
        <!-- this blob element allows a
             dfdl:outputValueCalc='{ dfdl:contentLength(..../pixels/blob) }'
             to work to capture the length when unparsing -->
        <complexType>
          <sequence>
            <!-- Avoid giant lines. This is XML. Users *may* want to open it
                 in a text editor. Note max size of blob is 100000100. -->
            <element name="a" type="xs:hexBinary" minOccurs="0" maxOccurs="1000000"
                     dfdl:lengthKind="explicit" dfdl:length="100"
                     dfdl:occursCountKind="implicit"/>
            <element name="last" type="xs:hexBinary" minOccurs="0" maxOccurs="1"
                     dfdl:lengthKind="delimited"/>
          </sequence>
        </complexType>
      </element>
    </sequence>
  </complexType>
</element>

That combined with the use of EXI to avoid the XML text bloat seems like it would address most needs.
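On the consuming side, the chunked hexBinary layout has to be joined back into one run of bytes before anything can process the pixels. A minimal sketch in Python of that step, assuming the infoset has been rendered as plain XML text with no namespace prefixes and that the children of blob are only the a and last elements from the schema fragment; reassemble_blob is a hypothetical helper name, not part of Daffodil:

```python
import xml.etree.ElementTree as ET


def reassemble_blob(blob_xml: str) -> bytes:
    """Concatenate the hexBinary children of a <blob> element in document order.

    Assumes (for illustration only) an un-namespaced XML infoset fragment
    whose root is <blob> and whose children are <a> chunks plus an
    optional trailing <last> chunk.
    """
    blob = ET.fromstring(blob_xml)
    # Iterating an Element yields its direct children in document order.
    return b"".join(bytes.fromhex((child.text or "").strip()) for child in blob)


# Small made-up infoset fragment: two fixed-size chunks and a shorter tail.
example = "<blob><a>00FF</a><a>1020</a><last>AB</last></blob>"
pixels = reassemble_blob(example)
```

For a real infoset the a chunks would be 100 bytes each, so the whole element tree still ends up in memory; this shows the shape of the reassembly step, not a streaming solution.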