Mike Beckerle created DAFFODIL-2351:
---------------------------------------

             Summary: layer improvements to enable JPEG format
                 Key: DAFFODIL-2351
                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2351
             Project: Daffodil
          Issue Type: Bug
          Components: Back End
    Affects Versions: 2.6.0
            Reporter: Mike Beckerle
             Fix For: 3.0.0


The JPEG format has "Entropy Coded Segments" (ECS segments).

These are terminated by the byte pattern (marker) that begins the following 
JPEG segment, so we need the ability to isolate the ECS bytes by finding, but 
not consuming, the start of the next segment. 

Currently the only way to do this is with lengthKind='pattern' and a regex 
with lookahead. This is problematic because of how regex scanning is 
implemented: the data is scanned in a buffer that is gradually enlarged as 
needed. The buffer cannot be made large enough, so this approach will not work 
for JPEGs containing very large images (the JPEG2000 format has the same 
problem and holds even larger images). 

The ability to define a layer that contains data up to, but not including, a 
particular marker is needed. In JPEG the marker is a 2-byte sequence.
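A minimal sketch of the "up to, but not including, a marker" behavior follows. It is written in Python for illustration only; it is not Daffodil's layer API, and `MarkerBoundedReader` is a hypothetical name. It reads fixed-size chunks, carries over the last `len(marker) - 1` bytes so that a marker split across two reads is still found, and leaves the marker in `remainder` so the next segment's parser can consume it. Memory use is bounded by the chunk size rather than the segment size. The marker 0xFF 0xD9 (JPEG EOI) in the usage part is just an example.

```python
import io
from typing import BinaryIO, Iterator


class MarkerBoundedReader:
    """Hypothetical helper: yields the bytes of `src` up to, but not including,
    the first occurrence of `marker`, reading `chunk_size` bytes at a time."""

    def __init__(self, src: BinaryIO, marker: bytes, chunk_size: int = 4096):
        self.src = src
        self.marker = marker
        self.chunk_size = chunk_size
        # Bytes read from `src` but not consumed. Once the marker is found this
        # starts with the marker, so the next segment's parser can read it.
        self.remainder = b""

    def chunks(self) -> Iterator[bytes]:
        keep = len(self.marker) - 1  # tail bytes that might start a split marker
        pending = b""
        while True:
            data = self.src.read(self.chunk_size)
            buf = pending + data
            idx = buf.find(self.marker)
            if idx >= 0:
                if idx > 0:
                    yield buf[:idx]
                self.remainder = buf[idx:]
                return
            if not data:
                # End of input with no marker: the rest belongs to the segment.
                if buf:
                    yield buf
                return
            # Emit everything except the tail that could begin a marker.
            cut = max(len(buf) - keep, 0)
            if cut:
                yield buf[:cut]
            pending = buf[cut:]


if __name__ == "__main__":
    # 0xFF 0xD9 (JPEG EOI) is used here only as an example marker.
    src = io.BytesIO(b"\x12\xff\x00\x34\xff\xd9\xaa")
    reader = MarkerBoundedReader(src, b"\xff\xd9", chunk_size=3)
    # Joined here only to print it; a BLOB writer would take chunks one at a time.
    segment = b"".join(reader.chunks())
    print(segment.hex(), reader.remainder.hex(), src.read().hex())
```

Bytes after the marker that happened to be read in the same chunk stay in `remainder` together with the marker, so nothing past the segment is lost.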

In addition, for JPEG, these ECS segments are "byte stuffed". This is an 
escaping scheme in which any occurrence of the marker's first byte in the data 
is followed by an inserted zero byte, so that the pair cannot match the 
marker. The layer transform must remove this inserted zero from the data when 
parsing and re-insert it when unparsing. 
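The escaping can be sketched as a pair of transforms, again in Python and with hypothetical names (`Unstuffer`, `stuff_chunk`), assuming the JPEG convention that the marker's first byte is 0xFF. Unparsing can stuff each chunk independently, but parsing must remember whether the previous chunk ended in 0xFF, because the stuffed zero may arrive at the start of the next chunk. This sketch treats 0xFF followed by anything other than 0x00 as an error, on the assumption that the marker itself has already been excluded by the bounding layer.

```python
class Unstuffer:
    """Parse direction (hypothetical helper): drops the 0x00 that follows each
    `lead` byte. State is carried across chunks, so a lead byte at the end of
    one chunk and its stuffed zero at the start of the next are handled."""

    def __init__(self, lead: int = 0xFF):
        self.lead = lead
        self.after_lead = False  # previous byte seen was `lead`

    def feed(self, chunk: bytes) -> bytes:
        out = bytearray()
        for b in chunk:
            if self.after_lead:
                self.after_lead = False
                if b != 0x00:
                    raise ValueError(
                        f"0x{self.lead:02X} followed by 0x{b:02X}, expected 0x00")
                continue  # drop the stuffed zero byte
            out.append(b)
            self.after_lead = b == self.lead
        return bytes(out)


def stuff_chunk(chunk: bytes, lead: int = 0xFF) -> bytes:
    """Unparse direction (hypothetical helper): insert 0x00 after every `lead`
    byte. No state is needed because each lead byte is handled on its own."""
    return chunk.replace(bytes([lead]), bytes([lead, 0x00]))


if __name__ == "__main__":
    data = b"\x01\xff\x02\xff"
    stuffed = stuff_chunk(data)
    u = Unstuffer()
    # Split between 0xFF and its stuffed 0x00 to exercise the carried state.
    restored = u.feed(stuffed[:2]) + u.feed(stuffed[2:])
    print(stuffed.hex(), restored == data)
```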

Finally, the implementation of this feature must not require staging a copy of 
the entire contents of the ECS segment in an array, provided the ultimate 
destination of the bytes is a DFDL BLOB (an extension to DFDL v1.0). These 
layers need to allow the bytes of the ECS segment to be streamed out to an 
external BLOB (e.g., a BLOB file) without creating any object in Daffodil 
process memory as large as the whole ECS segment. 
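To illustrate the bounded-memory requirement, the hypothetical `copy_ecs_to_blob` function below (not Daffodil's API) combines marker detection and unstuffing in one pass and writes each processed chunk straight to a BLOB output stream such as a file. It assumes the two-byte marker with a 0xFF first byte described above. At most one input chunk and one output chunk are held at a time, whatever the segment length; the marker and any bytes read past it are returned unconsumed so the next segment can be parsed from them.

```python
import io
import tempfile
from typing import BinaryIO, Tuple


def copy_ecs_to_blob(src: BinaryIO, blob: BinaryIO, marker: bytes = b"\xff\xd9",
                     chunk_size: int = 64 * 1024) -> Tuple[int, bytes]:
    """Hypothetical helper: stream a byte-stuffed segment from `src` into `blob`,
    turning each marker[0] 0x00 pair into a single marker[0] data byte and
    stopping just before `marker`. Returns (bytes written, unconsumed bytes
    that start with the marker)."""
    lead, tag = marker[0], marker[1]
    written = 0
    saw_lead = False  # previous byte was `lead`, possibly in the previous chunk
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            raise ValueError("end of data before marker")
        out = bytearray()
        for i, b in enumerate(chunk):
            if saw_lead:
                saw_lead = False
                if b == 0x00:
                    out.append(lead)  # stuffed pair -> one data byte
                    continue
                if b == tag:
                    blob.write(out)
                    written += len(out)
                    # Hand back the marker (its lead byte may have been in the
                    # previous chunk) plus everything read after it.
                    return written, bytes([lead]) + chunk[i:]
                raise ValueError(f"0x{lead:02X} followed by 0x{b:02X}")
            if b == lead:
                saw_lead = True  # decided once the next byte is seen
            else:
                out.append(b)
        blob.write(out)  # flush this chunk before reading the next one
        written += len(out)


if __name__ == "__main__":
    src = io.BytesIO(b"\x01\xff\x00\x02\xff\xd9\xaa")
    with tempfile.TemporaryFile() as blob_file:
        n, rest = copy_ecs_to_blob(src, blob_file, chunk_size=2)
        blob_file.seek(0)
        print(n, blob_file.read().hex(), rest.hex())
```

The byte-at-a-time loop keeps the state machine easy to follow; a production layer would more likely search each chunk in bulk for the lead byte, with the same carried state at chunk boundaries.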

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
