Re: BLOB feature - is it being used?

Mike Beckerle Fri, 10 Jan 2025 07:50:17 -0800

Very helpful, thanks Mark. I can cite this email thread as support for
adding the BLOB feature to DFDL v2.0.


On Fri, Jan 10, 2025 at 10:46 AM Mark Kozak <mark.ko...@adeptus-cs.com>
wrote:

> Yes, I agree, but…
>
> When I say ‘other image processing software’, I am not talking about
> Photoshop or other ‘standard’ commercial applications that require a
> well-formed image file like a JFIF. I have files that use various compression
> algorithms such as JPEG2000 for example. I can write that compressed pixel
> blob and use a JPEG 2000 library to decompress it to work with the actual
> pixel values outside the DFDL dataflow. Once decompressed, the processing
> could be Python scripts or any other pixel processing software. It’s also
> helpful in examining blob payloads that are supposed to be image data but
> are not behaving as one might expect. I can use a variable to turn on image
> debug mode to get the image blob to a file for examination.
>
>
>
> So, yes, there are times I do both a and b.
>
> I hope that helps.
>
>
>
> -Mark
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Friday, January 10, 2025 10:23 AM
> *To:* users@daffodil.apache.org
> *Subject:* Re: BLOB feature - is it being used?
>
>
>
> Thanks Mark,
>
> I have a question, or rather I really don't understand how one parses image
> data while also using BLOBs to process image content.
>
> This is from the Wiki page describing the BLOB feature:
>
> *A variety of data formats such as for image and video files, consist of
> fields of what is effectively metadata, surrounding large blocks of data
> containing compressed image or video data.*
>
>
>
> *An important use case for DFDL is to expose this metadata for easy use,
> and to provide access to the large data via a streaming mechanism akin to
> opening a file, rather than including large chunks of a hexBinary string in
> the infoset, as is common today.*
>
> The above suggests BLOBs can be used to keep a giant array of pixel bytes
> out of memory. So far so good.
>
>
>
> But if you are trying to both
>
>
>
> (a) expose and inspect/sanitize the image metadata, and also
>
> (b) process the image (e.g., to remove steganography),
>
>
>
> then I don't see how this works. Standard image processing libraries are
> going to want the entire image "file", not just the pixel data bytes from
> somewhere down inside that file.  That implies that the BLOB isn't just the
> blob of pixel data, but rather the BLOB must be the entire image "file"
> extracted from within the surrounding data envelope. But that implies that
> you are not using a DFDL schema to parse the image field by field so as to
> inspect/sanitize the metadata fields.
>
>
>
> In other words, DFDL+BLOB extension will let you do (a) or (b) but not
> both.
>
>
>
> Do I have this right, or am I misunderstanding the use case?
>
>
>
> Thanks for any info
>
>
>
> -mike beckerle
>
> On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com>
> wrote:
>
> I have occasionally used this feature to get pixel data (other than NITF)
> out to a file so that it can be processed using other image processing
> software.
>
> -----Original Message-----
> From: Mike Beckerle <mbecke...@apache.org>
> Sent: Thursday, January 9, 2025 2:14 PM
> To: users@daffodil.apache.org
> Subject: BLOB feature - is it being used?
>
> Is anyone using the experimental BLOB feature in Daffodil?
> (https://s.apache.org/daffodil-blob-feature)
>
> If so, please reply, or you can email me directly.
>
> This BLOB feature was added as we thought it would be used for the
> pixels of images.
>
> I've not seen any questions about it or discussion since it got
> implemented.
>
> I do know that it is used in the NITF DFDL schema on GitHub, but the test
> data for that schema does *not* use that element at all, so nothing that is
> part of that schema exercises the feature.
>
> I ask because this extension to DFDL v1.0, if used, would be a strong
> candidate for inclusion in the next version of the DFDL specification (from
> OGF and ISO).
> But if nobody is using the BLOB feature, that means other techniques are
> sufficient, and then there will be pushback within the DFDL Workgroup
> against adding this feature to DFDL as part of the standard.
>
> Personally, I have used this idea for "blobs" of data:
>
> <element name="pixels" dfdl:lengthKind="explicit"
>          dfdl:length='{ ... the big blob length ...}'>
>   <complexType>
>     <sequence>
>       <element name="blob" dfdl:lengthKind="implicit">
>         <!-- this blob element allows a
>              dfdl:outputValueCalc='{ dfdl:contentLength(..../pixels/blob) }'
>              to work to capture the length when unparsing -->
>         <complexType>
>           <sequence>
>             <!-- Avoid giant lines. This is XML. Users *may* want to
>                  open it in a text editor.
>                  Note max size of blob is 100000100.
>             -->
>             <element name="a" type="xs:hexBinary" minOccurs="0"
>                      maxOccurs="1000000" dfdl:lengthKind="explicit"
>                      dfdl:length="100" dfdl:occursCountKind="implicit"/>
>             <element name="last" type="xs:hexBinary" minOccurs="0"
>                      maxOccurs="1" dfdl:lengthKind="delimited"/>
>           </sequence>
>         </complexType>
>       </element>
>     </sequence>
>   </complexType>
> </element>
>
> That combined with the use of EXI to avoid the XML text bloat seems like it
> would address most needs.
>
>
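
For readers who want to see how the chunked layout in the schema quoted
above divides up a blob, here is a minimal Python sketch. It is not Daffodil
code. The names CHUNK, MAX_CHUNKS, and split_blob are hypothetical, and it
assumes that the "a" elements are filled greedily in 100-byte pieces, with
"last" taking whatever remains and being absent when nothing remains.

```python
from typing import List, Optional, Tuple

CHUNK = 100             # dfdl:length of each "a" element, in bytes
MAX_CHUNKS = 1_000_000  # maxOccurs of the "a" element


def split_blob(blob: bytes) -> Tuple[List[str], Optional[str]]:
    """Split a blob into hex strings the way the a/last elements would hold it.

    Assumption: "a" is filled greedily with full 100-byte chunks (up to
    MAX_CHUNKS of them) and "last" receives the remainder, if any.
    """
    full = min(len(blob) // CHUNK, MAX_CHUNKS)
    a = [blob[i * CHUNK:(i + 1) * CHUNK].hex().upper() for i in range(full)]
    rest = blob[full * CHUNK:]
    last = rest.hex().upper() if rest else None
    return a, last


if __name__ == "__main__":
    a, last = split_blob(bytes(range(250)))
    print(len(a), "full chunks;", "last holds",
          0 if last is None else len(last) // 2, "bytes")
```

Each "a" value is at most 200 hex characters, which is what keeps the XML
lines short. Under one reading of the schema comment, the stated maximum
corresponds to 1,000,000 chunks of 100 bytes plus one more chunk's worth in
"last", that is MAX_CHUNKS * CHUNK + CHUNK = 100000100 bytes.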
