My 2 cents. I reckon this BLOB feature would be useful to have as part of
the spec long-term. In healthcare integration, DICOM-to-FHIR integration is
one use case I have come across. I don't know whether the DICOM format can
be captured in DFDL (it seems doable at first glance), but if it can, I can
easily imagine a situation where the integrator wants to lift the metadata
from the DICOM file to create resources on a FHIR server and dump the pixel
data somewhere else.

Claude

On Fri, Jan 10, 2025 at 4:50 PM Mike Beckerle <mbecke...@apache.org> wrote:

> Very helpful, thanks, Mark. I can cite this thread of emails as support for
> adding the BLOB feature to DFDL v2.0.
>
> On Fri, Jan 10, 2025 at 10:46 AM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
>
>> Yes, I agree, but…
>>
>> When I say ‘other image processing software’, I am not talking about
>> Photoshop or other ‘standard’ commercial applications that require a
>> well-formed image file such as JFIF. I have files that use various
>> compression algorithms, JPEG 2000 for example. I can write out that
>> compressed pixel blob and use a JPEG 2000 library to decompress it, so I
>> can work with the actual pixel values outside the DFDL dataflow. Once it
>> is decompressed, the processing could be Python scripts or any other
>> pixel-processing software. It’s also helpful for examining blob payloads
>> that are supposed to be image data but are not behaving as expected. I can
>> use a variable to turn on an image debug mode that writes the image blob to
>> a file for examination.
>>
>> So, yes, there are times when I do both (a) and (b).
>>
>> I hope that helps.
>>
>> -Mark
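This is a minimal sketch of a first sanity check you might run on such a debug dump. It assumes the blob has already been written to a local file, and the function and constant names are hypothetical, not part of Daffodil. The sketch reads only the leading bytes and compares them with the JPEG 2000 signatures defined in ISO/IEC 15444-1. A raw codestream begins with the SOC marker (FF 4F) followed by the SIZ marker (FF 51). A JP2 file begins with a 12-byte signature box. Actual decompression would still need a real JPEG 2000 library, and this sketch does not attempt it.

```python
from pathlib import Path

# Signatures from JPEG 2000 Part 1 (ISO/IEC 15444-1):
# a raw codestream starts with SOC (FF 4F) immediately followed by SIZ (FF 51);
# a JP2 file starts with the 12-byte JPEG 2000 signature box.
J2K_CODESTREAM_START = b"\xff\x4f\xff\x51"
JP2_SIGNATURE_BOX = b"\x00\x00\x00\x0cjP  \r\n\x87\n"


def classify_pixel_blob(path: Path) -> str:
    """Guess whether a dumped blob is JPEG 2000 data from its first bytes.

    Only 12 bytes are read, so this stays cheap even for very large blobs.
    """
    with open(path, "rb") as f:
        head = f.read(len(JP2_SIGNATURE_BOX))
    if head.startswith(JP2_SIGNATURE_BOX):
        return "jp2"
    if head.startswith(J2K_CODESTREAM_START):
        return "j2k-codestream"
    return "unknown"
```

Suppose a blob that should be a JPEG 2000 codestream comes back as "unknown". One possible cause is that the length or offset used to extract it is wrong.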
>>
>> *From:* Mike Beckerle <mbecke...@apache.org>
>> *Sent:* Friday, January 10, 2025 10:23 AM
>> *To:* users@daffodil.apache.org
>> *Subject:* Re: BLOB feature - is it being used?
>>
>> Thanks Mark,
>>
>> I have a question, or rather, I don't understand how you can both parse
>> image data and use BLOBs to process the image content.
>>
>> This is from the Wiki page describing the BLOB feature:
>>
>> *A variety of data formats such as for image and video files, consist of
>> fields of what is effectively metadata, surrounding large blocks of data
>> containing compressed image or video data.*
>>
>> *An important use case for DFDL is to expose this metadata for easy use,
>> and to provide access to the large data via a streaming mechanism akin to
>> opening a file, rather than including large chunks of a hexBinary string in
>> the infoset, as is common today.*
>>
>> The above suggests BLOBs can be used to keep a giant array of pixel bytes
>> out of memory. So far so good.
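Here is a minimal Python sketch of what that file-like access could look like for a consumer. It assumes the infoset element holds a `file:` URI pointing at the blob rather than hexBinary text. The element is hypothetically named `pixels` and has no namespace. The helper names are illustrative, not a Daffodil API. The blob is hashed in fixed-size chunks, so memory use stays bounded whatever the blob's size.

```python
import hashlib
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import url2pathname


def blob_path_from_infoset(infoset_xml: str, element_name: str) -> str:
    """Return the local path named by the file: URI held in element_name."""
    root = ET.fromstring(infoset_xml)
    elem = root if root.tag == element_name else root.find(f".//{element_name}")
    if elem is None or not (elem.text or "").strip():
        raise ValueError(f"no <{element_name}> element holding a URI")
    uri = urlparse(elem.text.strip())
    if uri.scheme != "file":
        raise ValueError(f"expected a file: URI, got scheme {uri.scheme!r}")
    return url2pathname(uri.path)


def sha256_of_blob(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the blob in fixed-size chunks; the whole blob is never in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For a namespaced infoset, pass the element name in ElementTree's `{namespace-uri}localname` form.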
>>
>> But if you are trying to both
>>
>> (a) expose and inspect/sanitize the image metadata, and also
>>
>> (b) process the image (e.g., to remove steganography),
>>
>> then I don't see how this works. Standard image processing libraries are
>> going to want the entire image "file", not just the pixel data bytes from
>> somewhere inside that file. That implies the BLOB isn't just the pixel
>> data; rather, it must be the entire image "file" extracted from within the
>> surrounding data envelope. That, in turn, implies that you are not using a
>> DFDL schema to parse the image field by field in order to inspect/sanitize
>> the metadata fields.
>>
>> In other words, DFDL+BLOB extension will let you do (a) or (b) but not
>> both.
>>
>> Do I have this right, or am I misunderstanding the use case?
>>
>> Thanks for any info
>>
>> -mike beckerle
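To make the (a)/(b) tension concrete: raw pixel bytes alone are not an image file, because a reader also needs the dimensions and sample depth, and those live in the metadata. Consider the simplest case, uncompressed 8-bit grayscale. A minimal sketch can combine the two by writing a binary PGM (P5) header and then streaming the blob bytes after it. The function name is hypothetical, and width and height are assumed to come from metadata fields the DFDL schema has already parsed. Compressed payloads, such as the JPEG 2000 blobs Mark describes, would need a proper decoder instead.

```python
import os
import shutil


def write_pgm_from_blob(blob_path: str, out_path: str, width: int, height: int) -> None:
    """Write a binary PGM (P5) file from raw 8-bit grayscale pixel bytes.

    width/height are assumed to come from metadata parsed by the DFDL schema;
    the pixel bytes are copied from the blob file in chunks.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    expected = width * height  # one byte per pixel at 8 bits per sample
    actual = os.path.getsize(blob_path)
    if actual != expected:
        raise ValueError(f"expected {expected} pixel bytes, got {actual}")
    with open(blob_path, "rb") as src, open(out_path, "wb") as dst:
        # PGM header: magic number, dimensions, maximum gray value
        dst.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        shutil.copyfileobj(src, dst)  # streams the pixels instead of loading them all
```

Common tools such as ImageMagick and Pillow can read the resulting PGM file.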
>>
>> On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
>>
>> I have occasionally used this feature to get pixel data (other than NITF)
>> out to a file so that it can be processed using other image processing
>> software.
>>
>> -----Original Message-----
>> From: Mike Beckerle <mbecke...@apache.org>
>> Sent: Thursday, January 9, 2025 2:14 PM
>> To: users@daffodil.apache.org
>> Subject: BLOB feature - is it being used?
>>
>> Is anyone using the experimental BLOB feature in Daffodil?
>> (https://s.apache.org/daffodil-blob-feature)
>>
>> If so, please reply, or you can email me directly.
>>
>> This BLOB feature was added because we thought it would be used for the
>> pixels of images.
>>
>> I haven't seen any questions about it, or any discussion of it, since it
>> was implemented.
>>
>> I do know that it is used in the NITF DFDL schema on GitHub, but the test
>> data for that schema does *not* use that element at all, so nothing that is
>> part of that schema exercises the feature.
>>
>> I ask because this extension to DFDL v1.0, if used, would be a strong
>> candidate for inclusion in the next version of the DFDL specification (from
>> OGF and ISO). But if nobody is using the BLOB feature, that means other
>> techniques are sufficient, and then there will be pushback within the DFDL
>> Workgroup against adding this feature to DFDL as part of the standard.
>>
>> Personally, I have used this idea for "blobs" of data:
>>
>> <element name="pixels" dfdl:lengthKind="explicit"
>>          dfdl:length='{ ... the big blob length ...}'>
>>   <complexType>
>>     <sequence>
>>       <element name="blob" dfdl:lengthKind="implicit">
>>         <!-- this blob element allows a
>>              dfdl:outputValueCalc="{ dfdl:contentLength(..../pixels/blob, 'bytes') }"
>>              to work to capture the length when unparsing -->
>>         <complexType>
>>           <sequence>
>>             <!-- Avoid giant lines. This is XML. Users *may* want to open it
>>                  in a text editor.
>>                  Note max size of blob is 100000100. -->
>>             <element name="a" type="xs:hexBinary" minOccurs="0"
>>                      maxOccurs="1000000" dfdl:lengthKind="explicit"
>>                      dfdl:length="100" dfdl:occursCountKind="implicit"/>
>>             <element name="last" type="xs:hexBinary" minOccurs="0"
>>                      maxOccurs="1" dfdl:lengthKind="delimited"/>
>>           </sequence>
>>         </complexType>
>>       </element>
>>     </sequence>
>>   </complexType>
>> </element>
>>
>> That, combined with the use of EXI to avoid the XML text bloat, seems like
>> it would address most needs.
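This is a minimal sketch of how a consumer might reassemble the blob from an infoset produced by the schema above. It also shows what the chunking looks like in text form. It assumes an un-namespaced `<pixels>` fragment whose element names match the schema, and the function names are hypothetical. Because xs:hexBinary uses two hex digits per byte, each 100-byte `a` chunk becomes a 200-character line. The textual infoset is therefore about twice the size of the binary data before any tag overhead, and that is the text bloat EXI is meant to offset.

```python
import xml.etree.ElementTree as ET

CHUNK_BYTES = 100  # matches dfdl:length="100" on element "a" in the schema


def blob_from_infoset(pixels_xml: str) -> bytes:
    """Concatenate the hexBinary <a> chunks and the optional <last> chunk.

    Assumes an un-namespaced fragment shaped like
    <pixels><blob><a>HEX</a>...<last>HEX</last></blob></pixels>.
    """
    blob = ET.fromstring(pixels_xml).find("blob")
    if blob is None:
        raise ValueError("no <blob> child under <pixels>")
    parts = []
    for child in blob:
        if child.tag not in ("a", "last"):
            continue
        data = bytes.fromhex((child.text or "").strip())
        if child.tag == "a" and len(data) != CHUNK_BYTES:
            raise ValueError(f"<a> chunk has {len(data)} bytes, not {CHUNK_BYTES}")
        parts.append(data)
    return b"".join(parts)


def chunk_as_hex(data: bytes) -> list[str]:
    """Split bytes into the hex strings successive <a> elements would hold,
    with a final shorter chunk for <last> when the length is not a multiple
    of 100. Uppercase is a choice here; xs:hexBinary accepts either case."""
    return [data[i:i + CHUNK_BYTES].hex().upper()
            for i in range(0, len(data), CHUNK_BYTES)]
```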
>>
>>
