Yes, I agree, but…

When I say ‘other image processing software’, I am not talking about Photoshop
or other ‘standard’ commercial applications that require a well-formed image
file such as a JFIF. I have files that use various compression algorithms,
JPEG 2000 for example. I can write that compressed pixel blob to a file and use
a JPEG 2000 library to decompress it, then work with the actual pixel values
outside the DFDL dataflow. Once the data is decompressed, the processing could
be Python scripts or any other pixel-processing software. The feature is also
helpful for examining blob payloads that are supposed to be image data but are
not behaving as expected: I can use a variable to turn on an image debug mode
that writes the image blob to a file for examination.
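
As a rough illustration of that kind of examination, here is a minimal Python
sketch that looks at the first bytes of a blob file and guesses what it holds.
The function names and labels are made up for this example, and the path passed
to sniff_blob is whatever file the blob was written to. It only checks
well-known signature bytes; it does not decompress anything.

```python
# Signature bytes for a few image payload types.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")  # JP2 file signature box
J2K_CODESTREAM = bytes.fromhex("ff4fff51")  # raw codestream: SOC marker, then SIZ marker
JPEG_SOI = bytes.fromhex("ffd8")            # JPEG/JFIF start-of-image marker


def classify_head(head: bytes) -> str:
    """Return a rough label for the leading bytes of a blob (hypothetical helper)."""
    if head.startswith(JP2_SIGNATURE):
        return "jp2-file"
    if head.startswith(J2K_CODESTREAM):
        return "j2k-codestream"
    if head.startswith(JPEG_SOI):
        return "jpeg"
    return "unknown"


def sniff_blob(path: str) -> str:
    """Read the first 12 bytes of the blob file at `path` and classify them."""
    with open(path, "rb") as f:
        return classify_head(f.read(12))
```

A result of "unknown" for a payload that is supposed to be JPEG 2000 is a hint
that the blob boundaries in the schema are off, or that the data is not what
the metadata claims.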

 

So, yes, there are times I do both (a) and (b).

I hope that helps.

 

-Mark

 

From: Mike Beckerle <mbecke...@apache.org> 
Sent: Friday, January 10, 2025 10:23 AM
To: users@daffodil.apache.org
Subject: Re: BLOB feature - is it being used?

 

Thanks Mark,

I have a question, or rather, I really don't understand how you can parse image
data with DFDL and also use BLOBs to process the image content.

This is from the Wiki page describing the BLOB feature:

A variety of data formats such as for image and video files, consist of fields 
of what is effectively metadata, surrounding large blocks of data containing 
compressed image or video data.

 

An important use case for DFDL is to expose this metadata for easy use, and to 
provide access to the large data via a streaming mechanism akin to opening a 
file, rather than including large chunks of a hexBinary string in the infoset, 
as is common today.

The above suggests BLOBs can be used to keep a giant array of pixel bytes out 
of memory. So far so good. 

 

But if you are trying to both 

 

(a) expose and inspect/sanitize the image metadata, and also 

(b) process the image (e.g., to remove steganography), 

 

then I don't see how this works. Standard image processing libraries are going 
to want the entire image "file", not just the pixel data bytes from somewhere 
down inside that file.  That implies that the BLOB isn't just the blob of pixel 
data, but rather the BLOB must be the entire image "file" extracted from within 
the surrounding data envelope. But that implies that you are not using a DFDL 
schema to parse the image field by field so as to inspect/sanitize the metadata 
fields. 

 

In other words, DFDL+BLOB extension will let you do (a) or (b) but not both. 

 

Do I have this right, or am I misunderstanding the use case?

 

Thanks for any info

 

-mike beckerle

On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:

I have occasionally used this feature to get pixel data (other than NITF)
out to a file so that it can be processed using other image processing 
software.

-----Original Message-----
From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, January 9, 2025 2:14 PM
To: users@daffodil.apache.org
Subject: BLOB feature - is it being used?

Is anyone using the experimental BLOB feature in Daffodil
(https://s.apache.org/daffodil-blob-feature)?

If so, please reply, or you can email me directly.

This BLOB feature was added as we thought it would be used for the pixels of
images.

I've not seen any questions about it or discussion since it got implemented.

I do know that it is used in the NITF DFDL schema on github, but the test
data for that schema does *not* use that element at all, so nothing that is
part of that schema exercises the feature.

I ask because this extension to DFDL v1.0, if used, would be a strong
candidate for inclusion in the next version of the DFDL specification (from
OGF and ISO).
But... if nobody is using the BLOB feature, that means other techniques are
sufficient, and then there will be pushback within the DFDL Workgroup
against adding this feature to DFDL as part of the standard.

Personally, I have used this idea for "blobs" of data:

<element name="pixels" dfdl:lengthKind="explicit"
         dfdl:length='{ ... the big blob length ...}'>
  <complexType>
    <sequence>
      <element name="blob" dfdl:lengthKind="implicit">
        <!-- This blob element allows a
             dfdl:outputValueCalc='{ dfdl:contentLength(..../pixels/blob) }'
             to work to capture the length when unparsing. -->
        <complexType>
          <sequence>
            <!-- Avoid giant lines. This is XML. Users *may* want to open it
                 in a text editor.
                 Note max size of blob is 100000100. -->
            <element name="a" type="xs:hexBinary" minOccurs="0"
                     maxOccurs="1000000" dfdl:lengthKind="explicit"
                     dfdl:length="100" dfdl:occursCountKind="implicit"/>
            <element name="last" type="xs:hexBinary" minOccurs="0"
                     maxOccurs="1" dfdl:lengthKind="delimited"/>
          </sequence>
        </complexType>
      </element>
    </sequence>
  </complexType>
</element>
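
To make the chunking in that schema concrete, here is a small Python sketch of
the arithmetic. It assumes the parser fills the 100-byte "a" elements greedily
up to maxOccurs and that whatever is left over lands in "last"; the function
name and constants are only for illustration and are not part of Daffodil.

```python
CHUNK_BYTES = 100       # dfdl:length of each "a" element
MAX_CHUNKS = 1_000_000  # maxOccurs of "a"


def split_blob(total_bytes: int) -> tuple[int, int]:
    """Return (number of full "a" chunks, bytes left over for "last")."""
    full_chunks = min(total_bytes // CHUNK_BYTES, MAX_CHUNKS)
    remainder = total_bytes - full_chunks * CHUNK_BYTES
    return full_chunks, remainder


print(split_blob(250))          # (2, 50)
print(split_blob(100_000_000))  # (1000000, 0)
```

So the "a" elements alone cover 1,000,000 x 100 = 100,000,000 bytes, and each
one appears as a short 200-character hex string in the XML infoset rather than
as one giant line.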

That combined with the use of EXI to avoid the XML text bloat seems like it
would address most needs.
