Introducing DALEQ: An Open-Source Tool for Assessing Java Binary Equivalence

We’re excited to announce the release of DALEQ, a new open-source tool for 
analyzing and comparing Java binaries. DALEQ is designed to help developers, 
security researchers, and build engineers assess whether two .jar files built 
from the same source code are semantically equivalent, even when they’re not 
bitwise identical. This is particularly useful for comparing jars from Maven 
Central with jars produced by reproducible builds or by rebuild services 
like Oracle’s build-from-source or Google’s Assured OSS. Although tools like 
diff or hash-based checks can detect binary differences, they don’t explain why 
binaries differ, or whether those differences matter. Bytecode-level 
differences can be caused by changes in compilers or build pipelines and do 
not necessarily indicate a compromised build. DALEQ helps distinguish harmless 
variation from meaningful divergence.
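
To make the limits of hash-based checks concrete, here is a minimal Python 
sketch (not part of DALEQ) that compares two jars entry by entry using 
SHA-256. The helper names entry_digests, compare_jars and make_jar are 
hypothetical and exist only for this illustration. The sketch can report 
which entries differ bitwise, but it cannot say why they differ or whether 
the difference matters.

```python
import hashlib
import io
import zipfile


def entry_digests(jar_file):
    """Map each file entry of a jar (a zip archive) to the SHA-256 of its bytes.

    jar_file may be a path or a binary file-like object.
    """
    digests = {}
    with zipfile.ZipFile(jar_file) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            content = jar.read(info.filename)
            digests[info.filename] = hashlib.sha256(content).hexdigest()
    return digests


def compare_jars(jar_a, jar_b):
    """Classify entries as bitwise identical, different, or present in one jar only."""
    a, b = entry_digests(jar_a), entry_digests(jar_b)
    common = a.keys() & b.keys()
    return {
        "identical": sorted(n for n in common if a[n] == b[n]),
        "different": sorted(n for n in common if a[n] != b[n]),
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
    }


def make_jar(entries):
    """Build an in-memory jar from {entry name: bytes}; used here for demo data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        for name, content in entries.items():
            jar.writestr(name, content)
    buffer.seek(0)
    return buffer
```

Entries that land in the "different" bucket are the ones that need a 
semantic comparison; a digest mismatch alone does not indicate a problem.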

How DALEQ Works

DALEQ focuses on Java bytecode comparison, though it can also analyze resources 
and metadata in jars. At its core, DALEQ uses a datalog engine (Soufflé), the 
same kind of logic-based analysis engine used in systems like CodeQL, to 
normalize and compare bytecode structures. Key features include:

- Bytecode normalization to reduce irrelevant build differences
- Semantic diffing that identifies and explains non-equivalent instructions
- Provenance tracking: for equivalent files, DALEQ shows how equivalence was 
derived via datalog rules; for non-equivalent files, it provides bytecode-level 
diffs
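
The idea behind the features above, normalizing before comparing and 
recording which rules were applied, can be illustrated with a toy sketch. It 
is plain Python rather than datalog, it does not reflect DALEQ’s actual rules 
or data model, and every name in it (POOL_OPCODES, resolve_pool_refs, RULES, 
normalize, equivalent) is an assumption made for illustration. It models one 
source of harmless variation: two builds that order their constant pools 
differently, so instructions carry different indices although they refer to 
the same symbols.

```python
# Toy model only: a method body is a list of (opcode, operand) tuples and the
# constant pool is a list of symbolic names. Real class files are far richer.

# Opcodes whose operand is an index into the constant pool (a small subset).
POOL_OPCODES = {"ldc", "getstatic", "invokevirtual", "invokestatic"}


def resolve_pool_refs(code, pool):
    """Replace constant pool indices with the symbols they refer to."""
    return [(op, pool[arg]) if op in POOL_OPCODES else (op, arg)
            for op, arg in code]


# Ordered list of (rule name, rule function); hypothetical, one rule for brevity.
RULES = [("resolve-pool-refs", resolve_pool_refs)]


def normalize(code, pool):
    """Apply each rule in order; return the result and the rules that changed it."""
    applied = []
    for name, rule in RULES:
        rewritten = rule(code, pool)
        if rewritten != code:
            applied.append(name)
        code = rewritten
    return code, applied


def equivalent(build_a, build_b):
    """Compare two (code, pool) pairs after normalization.

    Returns (is_equivalent, names of rules applied to either side).
    """
    code_a, rules_a = normalize(*build_a)
    code_b, rules_b = normalize(*build_b)
    return code_a == code_b, sorted(set(rules_a) | set(rules_b))


if __name__ == "__main__":
    # Same method, but the two builds laid out the constant pool differently.
    build_a = ([("getstatic", 0), ("ldc", 1), ("invokevirtual", 2), ("return", None)],
               ["java/lang/System.out", "hello", "java/io/PrintStream.println"])
    build_b = ([("getstatic", 2), ("ldc", 0), ("invokevirtual", 1), ("return", None)],
               ["hello", "java/io/PrintStream.println", "java/lang/System.out"])
    print(equivalent(build_a, build_b))  # (True, ['resolve-pool-refs'])
```

The returned list of rule names plays the role of a very simple provenance 
record: it states which normalizations were needed to establish equivalence.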

DALEQ also verifies whether the underlying source code inputs are the same (or 
at least equivalent, tolerating some variations in comments and formatting) and 
includes integrations with existing tools like the standard javap disassembler. 
It supports extensibility through a plugin system.

Real-World Evaluation

DALEQ builds on our earlier research into levels of binary equivalence. We 
evaluated the tool using real-world .jar files from Oracle and Google, both of 
which independently rebuild Java packages from source. The results are 
encouraging: DALEQ was able to classify 85–90% of .class files that were not 
bitwise identical as still being semantically equivalent, with supporting 
provenance.

Learn More

You can try out DALEQ now on GitHub: https://github.com/binaryeq/daleq/

- A detailed technical paper describing DALEQ and our evaluation: 
https://arxiv.org/abs/2508.01530
- A technical paper describing the conceptual approach of levels of binary 
equivalence: https://arxiv.org/abs/2410.08427 (to be presented at ICSME’25, 
https://conf.researchr.org/home/icsme-2025)


Jens Dietrich (Associate Professor at Victoria University of Wellington)

Behnaz Hassanshahi (Principal Researcher and Tech Lead at Oracle, Oracle Labs 
Brisbane)
