Enough of us are doing data archaeology (i.e., digging around in the bits) with LSBF data these days that I created this utility.
lsbfdump is a command-line tool that creates a bit-level data dump for data that has dfdl:bitOrder="leastSignificantBitFirst". For example, this is what it outputs:

    01000110 01001100 01000101 01111111 | 0x00000000
    00000000 00000001 00000001 00000010 | 0x00000004
    00000000 00000000 00000000 00000000 | 0x00000008
    00000000 00000000 00000000 00000000 | 0x0000000C
    00000000 00111110 00000000 00000011 | 0x00000010
                      00000000 00000001 | 0x00000014

The address (in hex) is on the right. The bytes start on the right and increase moving left and downward. The least significant bit of each byte is on the right (as people usually write numbers).

This is not an official Apache release or anything like it. You have to clone the daffodil-extra repository (via 'git clone https://github.com/apache/daffodil-extra.git') and build your own, and it is not tagged. (The whole daffodil-extra repo on GitHub is for these sorts of unofficial side-pony projects and examples.)

But building this is easy, and it creates a small-ish native binary executable (less than 10MB in size) via the very cool *sbt native image* plugin (not to be confused with 'scala native', which I tried and it failed). The sbt native image plugin pulls down and uses GraalVM technology under the hood.

Caveat: I have only built this on Linux. I have not tried MS-Windows, but the sbt-native-image plugin and GraalVM documentation say that this will work. (I hope someone tries this, or maybe even contributes automated setup for the repo to auto-test this on Linux and Windows on every commit.)

See: https://github.com/apache/daffodil-extra/tree/main/lsbfdump

I hope you find this useful.

Interestingly, I used generative AI, specifically ChatGPT 4.0, to create the first draft of this, which I subsequently cleaned up by hand. It was a huge time saver. The rest of this message is about generative AI.
If you are interested in generative-AI tools for programming assistance, here's a link to the whole chat session I did to create this lsbfdump tool: Scala Binary Bytes Display <https://chat.openai.com/share/44944901-f1fb-4e53-8e87-042584ac61f5>.

If you have not used ChatGPT for coding before, you may find it of interest. I also recently tried Google Bard (updated just last week), and it now seems to generate useful and interesting code as well, and even includes reference citations to its sources of knowledge.

To whet your appetite, here are some prompts I gave ChatGPT 4.0. This was kind of like 'pair programming' where it was writing the code, and I was prompting it with what to change or add.

First...

    "Scala program to display bytes 4 at a time right to left ordering, as binary bits."

(creates and explains program)

Next...

    "Can you create all the files for a complete scala-native project that implements this taking a file name and an offset into that file as where to start displaying the bytes?"

(modifies program accordingly)

Next...

    "Two changes. One make the offset argument optional. Two, add an option to append the address, the address of the first byte (rightmost byte) of each line, on the right side of each line either in decimal or in hex."

(modifies program accordingly)

Next...

    "A few more changes. Add another optional argument which is the number of bytes to print with the default value of 128 if not provided. Change the --address to --noAddress and invert the sense so that you get the address displayed by default, and supplying --noAddress turns it off. Do not display the byte in hex before the bits. Do not display a dashed line between rows. Change the name of the scala object to LSBFDump"

(modifies program accordingly)

I had to ask it to correct some mistakes it made, which it did smoothly. At some point it was clear that the remaining changes were easier for me to just make myself than to ask the bot to make them, so I took the code and modified it by hand after that.
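For readers who would rather see the idea than read the whole chat, here is a minimal sketch of what the first prompt asks for: bytes shown four per line, first byte rightmost, with the hex address of that rightmost byte appended. This is not the code ChatGPT produced and not the code in the daffodil-extra repo; the object name LsbfDumpSketch and the helpers bits and dump are made up for illustration, and right-aligning a short final line is an assumption.

```scala
object LsbfDumpSketch {
  // Render one byte as 8 binary digits, least significant bit on the right.
  def bits(b: Byte): String =
    (7 to 0 by -1).map(i => (b >> i) & 1).mkString

  // Format `data` as lines of up to `width` bytes. Within a line the first
  // byte is rightmost, and the hex address of that byte follows the bits.
  // `start` is the (hypothetical) address to assign to the first byte.
  def dump(data: Array[Byte], start: Long = 0L, width: Int = 4): Seq[String] = {
    val lineWidth = width * 9 - 1 // 8 digits per byte plus single-space separators
    data.grouped(width).zipWithIndex.map { case (chunk, i) =>
      val body = chunk.reverse.map(bits).mkString(" ")
      // Assumption: a short final line is padded on the left so bytes stay right-aligned.
      val padded = (" " * (lineWidth - body.length)) + body
      f"$padded | 0x${start + i.toLong * width}%08X"
    }.toSeq
  }

  def main(args: Array[String]): Unit = {
    // First six bytes of the example dump: 0x7F 'E' 'L' 'F' 0x02 0x01.
    val sample = Array[Byte](0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01)
    dump(sample).foreach(println)
  }
}
```

Run on those six bytes, the first line it prints should match the first row of the example dump (01000110 01001100 01000101 01111111 | 0x00000000). File reading, the offset and length arguments, and the --noAddress switch are left out to keep the sketch short.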
Some things it did were very nice. I told it to "add a --help option which displays an explanation of usage" and then to "add the usage examples to the help text" and voila, it added them and created a nice help text:

    def printHelp(): Unit = {
      println(
        """Usage: LSBFDump --file <filename> [--offset <offset>] [--length <numBytes>] [--noAddress] [--help]
          |
          |<filename> : The file to read bytes from or '-' for standard input.
          |[offset] : The starting offset in the file (default is 0).
          |[length] : The number of bytes to display (default is 128).
          |--noAddress : Do not display the address of each byte line.
          |--help : Display this help information.
          |
          |Examples:
          |  Default usage (128 bytes from standard input, starting at offset 0, with addresses):
          |    ./LSBFDump --file -
          |
          |  With specific file, offset and byte count:
          |    ./LSBFDump --file filename --offset 10 --length 64
          |
          |  With --noAddress to hide addresses:
          |    ./LSBFDump --file filename --offset 10 --length 64 --noAddress
          |""".stripMargin)
    }

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com