Re: BLOB feature - is it being used?

Claude Mamo Sat, 15 Feb 2025 04:37:38 -0800

>
> I don't know if it's possible to capture the DICOM format in DFDL



Hah, a research team generated DICOM DFDL schemas from LLMs:

https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-claude-3-5-haiku-20241022.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-claude-3-5-sonnet-20241022.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-deepseek-ai-deepseek-v3.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-gemini-1.5-flash.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-gpt-4-turbo.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-gpt-4o.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-meta-llama-llama-3.3-70b-instruct-turbo.xsd

The schema from gemini particularly stands out ;)

On Fri, Jan 10, 2025 at 7:04 PM Claude Mamo <claude.m...@gmail.com> wrote:

> My 2 cents. I reckon this BLOB feature would be useful to have as part of
> the spec long-term. In healthcare integration, DICOM to FHIR integration is
> a use case which I came across. I don't know if it's possible to capture
> the DICOM format in DFDL (it seems doable at first glance), but suppose it
> is, then I can easily imagine a situation where the integrator wants to
> lift the metadata from the DICOM file to create resources on a FHIR server
> and dump the pixel data somewhere else.
>
> Claude
>
> On Fri, Jan 10, 2025 at 4:50 PM Mike Beckerle <mbecke...@apache.org>
> wrote:
>
>> Very helpful thanks Mark. I can cite this thread of emails as support for
>> the BLOB feature to be added to DFDL v2.0.
>>
>> On Fri, Jan 10, 2025 at 10:46 AM Mark Kozak <mark.ko...@adeptus-cs.com>
>> wrote:
>>
>>> Yes, I agree, but…
>>>
>>> When I say ‘other image processing software’, I am not talking about
>>> Photoshop or other ‘standard’ commercial applications that require a
>>> well-formed image file like a JFIF. I have files that use various compression
>>> algorithms such as JPEG2000 for example. I can write that compressed pixel
>>> blob and use a JPEG 2000 library to decompress it to work with the actual
>>> pixel values outside the DFDL dataflow. Once decompressed, the processing
>>> could be python scripts or any other pixel processing software. It’s also
>>> helpful in examining blob payloads that are supposed to be image data but
>>> are not behaving as one might expect. I can use a variable to turn on image
>>> debug mode to get the image blob to a file for examination.
>>>
>>>
>>>
>>> So, yes, there are times I do both a and b.
>>>
>>> I hope that helps.
>>>
>>>
>>>
>>> -Mark
>>>
>>>
>>>
>>> *From:* Mike Beckerle <mbecke...@apache.org>
>>> *Sent:* Friday, January 10, 2025 10:23 AM
>>> *To:* users@daffodil.apache.org
>>> *Subject:* Re: BLOB feature - is it being used?
>>>
>>>
>>>
>>> Thanks Mark,
>>>
>>> I have a question, or rather, I really don't understand how one can parse
>>> image data field by field while also using BLOBs to process the image content.
>>>
>>> This is from the Wiki page describing the BLOB feature:
>>>
>>> *A variety of data formats such as for image and video files, consist of
>>> fields of what is effectively metadata, surrounding large blocks of data
>>> containing compressed image or video data.*
>>>
>>>
>>>
>>> *An important use case for DFDL is to expose this metadata for easy use,
>>> and to provide access to the large data via a streaming mechanism akin to
>>> opening a file, rather than including large chunks of a hexBinary string in
>>> the infoset, as is common today.*
>>>
>>> The above suggests BLOBs can be used to keep a giant array of pixel
>>> bytes out of memory. So far so good.
>>>
>>>
>>>
>>> But if you are trying to both
>>>
>>>
>>>
>>> (a) expose and inspect/sanitize the image metadata, and also
>>>
>>> (b) process the image (e.g., to remove steganography),
>>>
>>>
>>>
>>> then I don't see how this works. Standard image processing libraries are
>>> going to want the entire image "file", not just the pixel data bytes from
>>> somewhere down inside that file.  That implies that the BLOB isn't just the
>>> blob of pixel data, but rather the BLOB must be the entire image "file"
>>> extracted from within the surrounding data envelope. But that implies that
>>> you are not using a DFDL schema to parse the image field by field so as to
>>> inspect/sanitize the metadata fields.
>>>
>>>
>>>
>>> In other words, DFDL+BLOB extension will let you do (a) or (b) but not
>>> both.
>>>
>>>
>>>
>>> Do I have this right, or am I misunderstanding the use case?
>>>
>>>
>>>
>>> Thanks for any info
>>>
>>>
>>>
>>> -mike beckerle
>>>
>>> On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com>
>>> wrote:
>>>
>>> I have occasionally used this feature to get pixel data (other than NITF)
>>> out to a file so that it can be processed using other image processing
>>> software.
>>>
>>> -----Original Message-----
>>> From: Mike Beckerle <mbecke...@apache.org>
>>> Sent: Thursday, January 9, 2025 2:14 PM
>>> To: users@daffodil.apache.org
>>> Subject: BLOB feature - is it being used?
>>>
>>> Is anyone using the experimental BLOB feature in Daffodil?
>>> (https://s.apache.org/daffodil-blob-feature)
>>>
>>> If so, please reply, or you can email me directly.
>>>
>>> This BLOB feature was added as we thought it would be used for the
>>> pixels of images.
>>>
>>> I've not seen any questions about it or discussion since it got
>>> implemented.
>>>
>>> I do know that it is used in the NITF DFDL schema on GitHub, but the test
>>> data for that schema does *not* use that element at all, so nothing that
>>> is part of that schema exercises the feature.
>>>
>>> I ask because this extension to DFDL v1.0, if used, would be a strong
>>> candidate for inclusion in the next version of the DFDL specification
>>> (from OGF and ISO).
>>> But... if nobody is using the BLOB feature, that means other techniques
>>> are sufficient, and then there will be push back within the DFDL Workgroup
>>> against adding this feature to DFDL as part of the standard.
>>>
>>> Personally, I have used this idea for "blobs" of data:
>>>
>>> <element name="pixels" dfdl:lengthKind="explicit"
>>>          dfdl:length='{ ... the big blob length ...}'>
>>>   <complexType>
>>>     <sequence>
>>>       <element name="blob" dfdl:lengthKind="implicit">
>>>         <!-- this blob element allows a
>>>              dfdl:outputValueCalc='{ dfdl:contentLength(..../pixels/blob) }'
>>>              to work to capture the length when unparsing -->
>>>         <complexType>
>>>           <sequence>
>>>             <!-- Avoid giant lines. This is XML. Users *may* want to
>>>                  open it in a text editor.
>>>                  Note max size of blob is 100000100.
>>>             -->
>>>             <element name="a" type="xs:hexBinary" minOccurs="0"
>>>                      maxOccurs="1000000" dfdl:lengthKind="explicit"
>>>                      dfdl:length="100" dfdl:occursCountKind="implicit"/>
>>>             <element name="last" type="xs:hexBinary" minOccurs="0"
>>>                      maxOccurs="1" dfdl:lengthKind="delimited"/>
>>>           </sequence>
>>>         </complexType>
>>>       </element>
>>>     </sequence>
>>>   </complexType>
>>> </element>
>>>
>>> That combined with the use of EXI to avoid the XML text bloat seems like
>>> it would address most needs.
>>>
>>>
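For readers who want to see what the chunked hexBinary layout in the original message amounts to, here is a minimal Python sketch. It only mimics the shape of the resulting infoset (repeated 100-byte "a" elements followed by one "last" remainder); it is not Daffodil code, and the function names `split_blob` and `join_blob` are hypothetical. How Daffodil treats an empty remainder may differ from what is shown here.

```python
from typing import List, Tuple

CHUNK_SIZE = 100        # mirrors dfdl:length="100" on element "a"
MAX_CHUNKS = 1_000_000  # mirrors maxOccurs="1000000" on element "a"


def split_blob(blob: bytes) -> Tuple[List[str], str]:
    """Split a blob into full-size hex chunks ("a") and a hex remainder ("last")."""
    full = min(len(blob) // CHUNK_SIZE, MAX_CHUNKS)
    chunks = [
        blob[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE].hex().upper()
        for i in range(full)
    ]
    # Whatever does not fill a whole chunk goes into the remainder.
    last = blob[full * CHUNK_SIZE:].hex().upper()
    return chunks, last


def join_blob(chunks: List[str], last: str) -> bytes:
    """Reassemble the original bytes from the hex chunks and the remainder."""
    return b"".join(bytes.fromhex(c) for c in chunks) + bytes.fromhex(last)
```

Each chunk is at most 200 hex characters, which is what keeps the XML lines short enough to open in a text editor.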

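The workflow described in this thread (metadata stays in the infoset, pixel bytes go to a separate file for other tools) could be sketched roughly as follows in Python. This assumes the infoset is available as XML text and that the BLOB element's value is a `file:` URI pointing at the extracted bytes; the helper name `blob_path_from_infoset` and the element name passed to it are hypothetical and not part of any Daffodil API.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def blob_path_from_infoset(infoset_xml: str, element_name: str) -> Path:
    """Return the local path named by the first matching BLOB element."""
    root = ET.fromstring(infoset_xml)
    for elem in root.iter():
        # Compare on the local name so namespaced infosets also match.
        local = elem.tag.rsplit("}", 1)[-1]
        if local == element_name and elem.text and elem.text.strip():
            uri = urlparse(elem.text.strip())
            if uri.scheme != "file":
                raise ValueError(f"expected a file: URI, got {elem.text!r}")
            return Path(url2pathname(uri.path))
    raise KeyError(element_name)
```

The returned path can then be read in binary mode and handed to whatever decompression or pixel-processing code is appropriate, outside the DFDL dataflow.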