> > I don't know if it's possible to capture the DICOM format in DFDL
Hah, a research team generated DICOM DFDL schemas from LLMs:

https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-claude-3-5-haiku-20241022.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-claude-3-5-sonnet-20241022.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-deepseek-ai-deepseek-v3.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-gemini-1.5-flash.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-gpt-4-turbo.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-gpt-4o.xsd
https://github.com/narfindustries/llm-tests-langsec/blob/main/results/1.0/DICOM/dicom-meta-llama-llama-3.3-70b-instruct-turbo.xsd

The schema from Gemini particularly stands out ;)

On Fri, Jan 10, 2025 at 7:04 PM Claude Mamo <claude.m...@gmail.com> wrote:

> My 2 cents. I reckon this BLOB feature would be useful to have as part of
> the spec long-term. In healthcare integration, DICOM-to-FHIR integration is
> a use case I came across. I don't know if it's possible to capture
> the DICOM format in DFDL (it seems doable at first glance), but suppose it
> is; then I can easily imagine a situation where the integrator wants to
> lift the metadata from the DICOM file to create resources on a FHIR server
> and dump the pixel data somewhere else.
>
> Claude
>
> On Fri, Jan 10, 2025 at 4:50 PM Mike Beckerle <mbecke...@apache.org> wrote:
>
>> Very helpful, thanks Mark. I can cite this thread of emails as support for
>> adding the BLOB feature to DFDL v2.0.
>>
>> On Fri, Jan 10, 2025 at 10:46 AM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
>>
>>> Yes, I agree, but…
>>>
>>> When I say ‘other image processing software’, I am not talking about
>>> Photoshop or other ‘standard’ commercial applications that require a
>>> well-formed image file like a JFIF.
>>> I have files that use various compression algorithms, such as
>>> JPEG 2000. I can write out that compressed pixel blob and use a
>>> JPEG 2000 library to decompress it, so I can work with the actual pixel
>>> values outside the DFDL dataflow. Once decompressed, the processing could
>>> be Python scripts or any other pixel processing software. It’s also
>>> helpful for examining blob payloads that are supposed to be image data but
>>> are not behaving as one might expect. I can use a variable to turn on an
>>> image debug mode that writes the image blob to a file for examination.
>>>
>>> So, yes, there are times I do both (a) and (b).
>>>
>>> I hope that helps.
>>>
>>> -Mark
>>>
>>> *From:* Mike Beckerle <mbecke...@apache.org>
>>> *Sent:* Friday, January 10, 2025 10:23 AM
>>> *To:* users@daffodil.apache.org
>>> *Subject:* Re: BLOB feature - is it being used?
>>>
>>> Thanks Mark,
>>>
>>> I have a question, or rather, I don't really understand how you can
>>> parse the image data and also use BLOBs to process the image content.
>>>
>>> This is from the Wiki page describing the BLOB feature:
>>>
>>> *A variety of data formats such as for image and video files, consist of
>>> fields of what is effectively metadata, surrounding large blocks of data
>>> containing compressed image or video data.*
>>>
>>> *An important use case for DFDL is to expose this metadata for easy use,
>>> and to provide access to the large data via a streaming mechanism akin to
>>> opening a file, rather than including large chunks of a hexBinary string in
>>> the infoset, as is common today.*
>>>
>>> The above suggests BLOBs can be used to keep a giant array of pixel
>>> bytes out of memory. So far so good.
>>>
>>> But if you are trying to both
>>>
>>> (a) expose and inspect/sanitize the image metadata, and also
>>>
>>> (b) process the image (e.g., to remove steganography),
>>>
>>> then I don't see how this works.
>>> Standard image processing libraries are going to want the entire image
>>> "file", not just the pixel data bytes from somewhere down inside that
>>> file. That implies that the BLOB isn't just the blob of pixel data;
>>> rather, the BLOB must be the entire image "file" extracted from within
>>> the surrounding data envelope. But that implies that you are not using a
>>> DFDL schema to parse the image field by field so as to inspect/sanitize
>>> the metadata fields.
>>>
>>> In other words, the DFDL+BLOB extension will let you do (a) or (b) but
>>> not both.
>>>
>>> Do I have this right, or am I misunderstanding the use case?
>>>
>>> Thanks for any info
>>>
>>> -mike beckerle
>>>
>>> On Thu, Jan 9, 2025 at 4:02 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
>>>
>>> I have occasionally used this feature to get pixel data (other than NITF)
>>> out to a file so that it can be processed using other image processing
>>> software.
>>>
>>> -----Original Message-----
>>> From: Mike Beckerle <mbecke...@apache.org>
>>> Sent: Thursday, January 9, 2025 2:14 PM
>>> To: users@daffodil.apache.org
>>> Subject: BLOB feature - is it being used?
>>>
>>> Is anyone using the experimental BLOB feature in Daffodil?
>>> (https://s.apache.org/daffodil-blob-feature)
>>>
>>> If so, please reply, or you can email me directly.
>>>
>>> This BLOB feature was added as we thought it would be used for the
>>> pixels of images.
>>>
>>> I've not seen any questions about it or discussion since it got
>>> implemented.
>>>
>>> I do know that it is used in the NITF DFDL schema on GitHub, but the test
>>> data for that schema does *not* use that element at all, so nothing that
>>> is part of that schema exercises the feature.
>>>
>>> I ask because this extension to DFDL v1.0, if used, would be a strong
>>> candidate for inclusion in the next version of the DFDL specification
>>> (from OGF and ISO).
>>> But... if nobody is using the BLOB feature, that means other techniques
>>> are sufficient, and then there will be pushback within the DFDL Workgroup
>>> against adding this feature to DFDL as part of the standard.
>>>
>>> Personally, I have used this idea for "blobs" of data:
>>>
>>> <element name="pixels" dfdl:lengthKind="explicit"
>>>          dfdl:length='{ ... the big blob length ...}'>
>>>   <complexType>
>>>     <sequence>
>>>       <element name="blob" dfdl:lengthKind="implicit">
>>>         <!-- This blob element allows a
>>>              dfdl:outputValueCalc='{ dfdl:contentLength(..../pixels/blob, "bytes") }'
>>>              to work to capture the length when unparsing. -->
>>>         <complexType>
>>>           <sequence>
>>>             <!-- Avoid giant lines. This is XML. Users *may* want to
>>>                  open it in a text editor.
>>>                  Note max size of blob is 100000100.
>>>             -->
>>>             <element name="a" type="xs:hexBinary" minOccurs="0"
>>>                      maxOccurs="1000000" dfdl:lengthKind="explicit"
>>>                      dfdl:length="100" dfdl:occursCountKind="implicit"/>
>>>             <element name="last" type="xs:hexBinary"
>>>                      minOccurs="0" maxOccurs="1"
>>>                      dfdl:lengthKind="delimited"/>
>>>           </sequence>
>>>         </complexType>
>>>       </element>
>>>     </sequence>
>>>   </complexType>
>>> </element>
>>>
>>> That, combined with the use of EXI to avoid the XML text bloat, seems
>>> like it would address most needs.
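The chunked `hexBinary` approach in the schema fragment above leaves the pixel bytes in the infoset as a run of 100-byte `a` elements followed by an optional `last` element. A consumer that wants the raw bytes back, for example to hand compressed pixels to an external decoder as Mark describes, therefore has to concatenate them. The Python below is a minimal sketch of that step. It assumes an XML infoset whose element local names match the fragment (`pixels`, `blob`, `a`, `last`). `chunk_hex` and `build_infoset` are hypothetical helpers that mimic the intended layout for testing. They are not Daffodil APIs, and real Daffodil output (for instance, whether an empty `last` element appears) may differ.

```python
# Sketch: reassemble pixel bytes from the chunked-hexBinary infoset shape used
# by the "pixels"/"blob" schema fragment. chunk_hex() and build_infoset() are
# hypothetical test helpers that imitate that shape; they are not Daffodil APIs.
import xml.etree.ElementTree as ET

CHUNK_BYTES = 100       # dfdl:length="100" on element "a"
MAX_CHUNKS = 1_000_000  # maxOccurs="1000000" on element "a"


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def blob_bytes(infoset_xml: str) -> bytes:
    """Concatenate every <a> chunk and the optional <last> chunk under <blob>."""
    root = ET.fromstring(infoset_xml)
    blob = next((e for e in root.iter() if local_name(e.tag) == "blob"), None)
    if blob is None:
        raise ValueError("no <blob> element in infoset")
    parts = [
        bytes.fromhex((child.text or "").strip())
        for child in blob
        if local_name(child.tag) in ("a", "last")
    ]
    return b"".join(parts)


def chunk_hex(data: bytes) -> tuple[list[str], str]:
    """Split bytes into as many full 100-byte <a> chunks as allowed
    (hex-encoded); whatever remains becomes the <last> chunk."""
    n_full = min(len(data) // CHUNK_BYTES, MAX_CHUNKS)
    chunks = [
        data[i * CHUNK_BYTES:(i + 1) * CHUNK_BYTES].hex().upper()
        for i in range(n_full)
    ]
    last = data[n_full * CHUNK_BYTES:].hex().upper()
    return chunks, last


def build_infoset(data: bytes) -> str:
    """Build a minimal infoset string in that shape (for exercising blob_bytes)."""
    chunks, last = chunk_hex(data)
    body = "".join(f"<a>{c}</a>" for c in chunks)
    if last:
        body += f"<last>{last}</last>"
    return f"<pixels><blob>{body}</blob></pixels>"
```

Writing `blob_bytes(...)` to a file opened with `"wb"` yields the same kind of standalone pixel file that the BLOB feature produces directly. This approach, however, holds the whole infoset in memory, which is exactly the overhead the Wiki text quoted above says the BLOB feature is meant to avoid.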
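With the BLOB feature itself, the bytes never enter the infoset. Below is a minimal sketch of the consumer side of Mark's debugging workflow: resolve the BLOB location, then check the payload's leading bytes to tell a JP2 file from a raw JPEG 2000 codestream or a JPEG. It assumes the BLOB element's text in the XML infoset is a `file:` URI pointing at the extracted bytes; check the BLOB feature page linked in the thread (https://s.apache.org/daffodil-blob-feature) for the exact representation in your Daffodil version. The element name `pixels` used when testing is hypothetical. The magic numbers are the JP2 signature box, the codestream SOC/SIZ markers, and the JPEG SOI marker.

```python
# Sketch: locate and sniff a BLOB payload that a parse wrote to disk.
# Assumption: the BLOB element's text in the XML infoset is a file: URI that
# points at the extracted bytes. Element names passed in are hypothetical.
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

# Leading "magic" bytes of a few image encodings.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")  # JP2 file signature box
J2K_CODESTREAM = bytes.fromhex("FF4FFF51")  # raw JPEG 2000 codestream: SOC then SIZ
JPEG_SOI = bytes.fromhex("FFD8FF")  # JPEG/JFIF: SOI marker, then next marker's FF


def blob_path(infoset_xml: str, element_name: str) -> Path:
    """Return the local path held by the first element with this local name."""
    root = ET.fromstring(infoset_xml)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == element_name:
            uri = (elem.text or "").strip()
            parsed = urlparse(uri)
            if parsed.scheme != "file":
                raise ValueError(f"expected a file: URI, got {uri!r}")
            return Path(url2pathname(parsed.path))
    raise KeyError(element_name)


def sniff_image_payload(data: bytes) -> str:
    """Classify a payload by its leading bytes; 'unknown' if nothing matches."""
    if data.startswith(JP2_SIGNATURE):
        return "jp2"
    if data.startswith(J2K_CODESTREAM):
        return "j2k-codestream"
    if data.startswith(JPEG_SOI):
        return "jpeg"
    return "unknown"
```

A payload that sniffs as `j2k-codestream` is pixel data without a file wrapper. Tools that expect a complete image file may reject it, but JPEG 2000 libraries such as OpenJPEG can decode raw codestreams. This matches Mark's point that decompressing the blob does not require the original image "file".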