Re: BLOB feature - is it being used?

Mike Beckerle Fri, 10 Jan 2025 07:23:43 -0800

Thanks Mark,

I have a question, or rather, I don't understand how parsing image data with
DFDL fits together with using BLOBs to process the image content.


This is from the Wiki page describing the BLOB feature:

*A variety of data formats, such as those for image and video files, consist of
fields of what is effectively metadata, surrounding large blocks of data
containing compressed image or video data.*


*An important use case for DFDL is to expose this metadata for easy use,
and to provide access to the large data via a streaming mechanism akin to
opening a file, rather than including large chunks of a hexBinary string in
the infoset, as is common today.*

The above suggests that BLOBs can be used to keep a giant array of pixel bytes
out of memory. So far, so good.

But if you are trying to both

(a) expose and inspect/sanitize the image metadata, and also
(b) process the image (e.g., to remove steganography),

then I don't see how this works. Standard image processing libraries are
going to want the entire image "file", not just the pixel data bytes from
somewhere down inside that file. That implies that the BLOB can't be just the
pixel data; it must be the entire image "file" extracted from within the
surrounding data envelope. But in that case you are not using a DFDL schema
to parse the image field by field so as to inspect/sanitize the metadata
fields.

In other words, the DFDL BLOB extension lets you do (a) or (b), but not
both.

Do I have this right, or am I misunderstanding the use case?

Thanks for any info

-mike beckerle

On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:

> I have occasionally used this feature to get pixel data (other than NITF)
> out to a file so that it can be processed using other image processing
> software.
>
> -----Original Message-----
> From: Mike Beckerle <mbecke...@apache.org>
> Sent: Thursday, January 9, 2025 2:14 PM
> To: users@daffodil.apache.org
> Subject: BLOB feature - is it being used?
>
> Is anyone using the experimental BLOB feature in Daffodil?
> (https://s.apache.org/daffodil-blob-feature)
>
> If so, please reply, or you can email me directly.
>
> This BLOB feature was added because we expected it to be used for the pixels
> of images.
>
> I haven't seen any questions or discussion about it since it was
> implemented.
>
> I do know that it is used in the NITF DFDL schema on GitHub, but the test
> data for that schema does *not* use that element at all, so nothing in that
> schema's tests exercises the feature.
>
> I ask because this extension to DFDL v1.0, if used, would be a strong
> candidate for inclusion in the next version of the DFDL specification (from
> OGF and ISO).
> But if nobody is using the BLOB feature, that means other techniques are
> sufficient, and there will be pushback within the DFDL Workgroup against
> adding this feature to the DFDL standard.
>
> Personally, I have used this idea for "blobs" of data:
>
> <element name="pixels" dfdl:lengthKind="explicit"
>          dfdl:length='{ ... the big blob length ...}'>
>   <complexType>
>     <sequence>
>       <element name="blob" dfdl:lengthKind="implicit">
>         <!-- This blob element allows a
>              dfdl:outputValueCalc="{ dfdl:contentLength(..../pixels/blob, 'bytes') }"
>              to capture the length when unparsing. -->
>         <complexType>
>           <sequence>
>             <!-- Avoid giant lines. This is XML. Users *may* want to
>                  open it in a text editor.
>                  Note max size of blob is 100000100. -->
>             <element name="a" type="xs:hexBinary" minOccurs="0" maxOccurs="1000000"
>                      dfdl:lengthKind="explicit" dfdl:length="100"
>                      dfdl:occursCountKind="implicit"/>
>             <element name="last" type="xs:hexBinary" minOccurs="0" maxOccurs="1"
>                      dfdl:lengthKind="delimited"/>
>           </sequence>
>         </complexType>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> That, combined with EXI to avoid the XML text bloat, seems like it would
> address most needs.
>
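The chunking trick in the quoted schema can be modeled roughly in Python. This is a hypothetical sketch, not Daffodil code: the names `chunk_blob` and `to_xml`, the uppercase hex rendering, and the choice to omit `last` when no bytes remain are assumptions made for illustration. It shows how a byte region is split into 100-byte `a` elements (at most 1,000,000 of them) plus a short `last` remainder, and why hexBinary text takes two characters per raw byte, which is the bloat EXI is meant to offset.

```python
from typing import List, Optional, Tuple

# Hypothetical model of the chunked "blob" schema quoted above (not Daffodil code).
CHUNK_LEN = 100          # dfdl:length of each "a" element, in bytes
MAX_CHUNKS = 1_000_000   # maxOccurs of "a"


def chunk_blob(data: bytes) -> Tuple[List[str], Optional[str]]:
    """Split data the way the schema would, returning uppercase hex strings.

    Full 100-byte "a" chunks are taken until fewer than 100 bytes remain or
    maxOccurs is reached; whatever is left goes to "last". Assumption: "last"
    is omitted (None) when no bytes remain.
    """
    n_full = min(len(data) // CHUNK_LEN, MAX_CHUNKS)
    a_chunks = [
        data[i * CHUNK_LEN:(i + 1) * CHUNK_LEN].hex().upper()
        for i in range(n_full)
    ]
    rest = data[n_full * CHUNK_LEN:]
    last = rest.hex().upper() if rest else None
    return a_chunks, last


def to_xml(a_chunks: List[str], last: Optional[str]) -> str:
    """Render the chunks as a simplified infoset fragment, one element per line."""
    parts = [f"<a>{h}</a>" for h in a_chunks]
    if last is not None:
        parts.append(f"<last>{last}</last>")
    return "<blob>" + "\n".join(parts) + "</blob>"


if __name__ == "__main__":
    data = bytes(range(250))
    a_chunks, last = chunk_blob(data)
    print(len(a_chunks), len(last))  # 2 chunks; "last" holds 50 bytes = 100 hex chars
    xml = to_xml(a_chunks, last)
    print(len(data), len(xml))       # raw byte count vs. XML text length
```

For example, a 250-byte region yields two `a` elements of 200 hex characters each and a `last` element holding the final 50 bytes as 100 hex characters.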
