Very helpful, thanks, Mark. I can cite this email thread as support for adding the BLOB feature to DFDL v2.0.
On Fri, Jan 10, 2025 at 10:46 AM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
> Yes, I agree, but…
>
> When I say ‘other image processing software’, I am not talking about
> Photoshop or other ‘standard’ commercial applications that require a
> well-formed image file like a JFIF. I have files that use various
> compression algorithms, JPEG 2000 for example. I can write that compressed
> pixel blob out and use a JPEG 2000 library to decompress it, so that I can
> work with the actual pixel values outside the DFDL dataflow. Once
> decompressed, the processing could be Python scripts or any other pixel
> processing software. It’s also helpful for examining blob payloads that are
> supposed to be image data but are not behaving as one might expect. I can
> use a variable to turn on an image debug mode that writes the image blob to
> a file for examination.
>
> So, yes, there are times I do both (a) and (b).
>
> I hope that helps.
>
> -Mark
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Friday, January 10, 2025 10:23 AM
> *To:* users@daffodil.apache.org
> *Subject:* Re: BLOB feature - is it being used?
>
> Thanks Mark,
>
> I have a question, or rather, I don't understand how you can parse the
> image data and also use BLOBs to process the image content.
>
> This is from the Wiki page describing the BLOB feature:
>
> *A variety of data formats such as for image and video files, consist of
> fields of what is effectively metadata, surrounding large blocks of data
> containing compressed image or video data.*
>
> *An important use case for DFDL is to expose this metadata for easy use,
> and to provide access to the large data via a streaming mechanism akin to
> opening a file, rather than including large chunks of a hexBinary string in
> the infoset, as is common today.*
>
> The above suggests BLOBs can be used to keep a giant array of pixel bytes
> out of memory. So far so good.
>
> But if you are trying to both
>
> (a) expose and inspect/sanitize the image metadata, and also
> (b) process the image (e.g., to remove steganography),
>
> then I don't see how this works. Standard image processing libraries are
> going to want the entire image "file", not just the pixel data bytes from
> somewhere down inside that file. That implies that the BLOB isn't just the
> blob of pixel data, but rather the BLOB must be the entire image "file"
> extracted from within the surrounding data envelope. But that implies that
> you are not using a DFDL schema to parse the image field by field so as to
> inspect/sanitize the metadata fields.
>
> In other words, the DFDL+BLOB extension will let you do (a) or (b), but not
> both.
>
> Do I have this right, or am I misunderstanding the use case?
>
> Thanks for any info.
>
> -mike beckerle
>
> On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
>
> I have occasionally used this feature to get pixel data (other than NITF)
> out to a file so that it can be processed using other image processing
> software.
>
> -----Original Message-----
> From: Mike Beckerle <mbecke...@apache.org>
> Sent: Thursday, January 9, 2025 2:14 PM
> To: users@daffodil.apache.org
> Subject: BLOB feature - is it being used?
>
> Is anyone using the experimental BLOB feature in Daffodil?
> (https://s.apache.org/daffodil-blob-feature)
>
> If so, please reply, or you can email me directly.
>
> This BLOB feature was added because we thought it would be used for the
> pixels of images.
>
> I've not seen any questions about it or discussion since it got
> implemented.
>
> I do know that it is used in the NITF DFDL schema on GitHub, but the test
> data for that schema does *not* use that element at all, so nothing that is
> part of that schema exercises the feature.
>
> I ask because this extension to DFDL v1.0, if used, would be a strong
> candidate for inclusion in the next version of the DFDL specification (from
> OGF and ISO).
> But.... if nobody is using the BLOB feature, that means other techniques
> are sufficient, and then there will be pushback within the DFDL Workgroup
> against adding this feature to DFDL as part of the standard.
>
> Personally, I have used this idea for "blobs" of data:
>
> <element name="pixels" dfdl:lengthKind="explicit"
>          dfdl:length='{ ... the big blob length ...}'>
>   <complexType>
>     <sequence>
>       <element name="blob" dfdl:lengthKind="implicit">
>         <!-- this blob element allows a dfdl:outputValueCalc='{
>              dfdl:contentLength(..../pixels/blob, "bytes") }' to work
>              to capture the length when unparsing -->
>         <complexType>
>           <sequence>
>             <!-- Avoid giant lines. This is XML. Users *may* want to
>                  open it in a text editor.
>                  Note max size of blob is 100000100.
>             -->
>             <element name="a" type="xs:hexBinary" minOccurs="0"
>                      maxOccurs="1000000" dfdl:lengthKind="explicit"
>                      dfdl:length="100" dfdl:occursCountKind="implicit"/>
>             <element name="last" type="xs:hexBinary"
>                      minOccurs="0" maxOccurs="1"
>                      dfdl:lengthKind="delimited"/>
>           </sequence>
>         </complexType>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> That, combined with the use of EXI to avoid the XML text bloat, seems like
> it would address most needs.
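For comparison with the chunked hexBinary workaround quoted above, here is a minimal sketch of how the same field might be declared with Daffodil's experimental BLOB feature instead. It assumes the extension works as I understand it from the Daffodil documentation: the element has type xs:anyURI and carries dfdlx:objectKind="bytes", with the dfdlx prefix bound to the Daffodil extensions namespace. On parse, the infoset then holds a URI naming a file that contains the raw bytes, rather than hexBinary text. The element name "pixels" and the length field "../pixelDataLength" are hypothetical placeholders, not taken from the NITF schema.

```xml
<!-- Minimal sketch (assumption), not from the NITF schema.
     Assumes the enclosing xs:schema declares:
       xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
     "pixels" and "../pixelDataLength" are hypothetical names. -->
<element name="pixels" type="xs:anyURI"
         dfdlx:objectKind="bytes"
         dfdl:lengthKind="explicit"
         dfdl:lengthUnits="bytes"
         dfdl:length="{ ../pixelDataLength }"/>
```

Unlike the hexBinary version, the pixel bytes never appear in the infoset, so its size does not grow with the image; the trade-off is that downstream tools (such as the JPEG 2000 decoder Mark mentions) must open the separate file named by the URI.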