I think we are talking about different things. You are talking about the DataStore saving space by deduplicating data and reducing redundancy in a general way. The DataStore can do this without additional information; I totally agree with you here. The hash it returns, for example, is also a way to reduce duplicates.
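To illustrate the hash point, here is a minimal sketch of a content-addressed store. ToyContentStore, its method names, and the choice of SHA-1 are assumptions made for illustration only; this is not Jackrabbit's actual DataStore code. The store sees only a stream, yet identical content still ends up stored once:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy store for illustration; NOT Jackrabbit's DataStore implementation.
class ToyContentStore {
    // Maps the hex content hash to the stored bytes.
    private final Map<String, byte[]> records = new HashMap<>();

    // Receives only a stream, with no hint about any previous version.
    // Returns the content hash, which serves as the record identifier.
    String addRecord(InputStream stream) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            byte[] content = buffer.toByteArray();

            // Assumption: SHA-1 as the content hash.
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder id = new StringBuilder();
            for (byte b : digest.digest(content)) {
                id.append(String.format("%02x", b));
            }

            // Identical content yields the same identifier, so it is kept only once.
            records.putIfAbsent(id.toString(), content);
            return id.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of distinct records actually stored.
    int recordCount() {
        return records.size();
    }
}
```

Adding the same bytes twice returns the same identifier and leaves one stored record. Two versions of a file that differ by a single byte, however, get unrelated identifiers and are both stored in full, because nothing in the stream says which record was the previous version.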
What I'm talking about is storing concrete diffs instead of whole files. This is not possible. The DataStore is called from Jackrabbit (addRecord) with some stream and has absolutely no idea what the original file was, so it can't determine the concrete diff. Sure, it can look for similar files (performance?) and make a diff (binaries?) against the most similar file found, but that may not be the diff to the previous file from the application's point of view.

Regards,
Robert

-----Original Message-----
From: Thomas Mueller [mailto:[email protected]]
Sent: Wednesday, 13 July 2011 11:37
To: [email protected]
Subject: Re: AW: AW: AW: Incremental/deduplicating versioning

Hi,

>The only possible way is to add the information (the DataStore does not
>have like the identifier/content of the previous version) to the stream
>(addRecord) from the application side.

I suggest to read http://en.wikipedia.org/wiki/Data_deduplication - specially "Depending on the type of deduplication, redundant files may be reduced, or even portions of files or other data that are similar can also be removed." and http://en.wikipedia.org/wiki/Rsync#Algorithm

Regards,
Thomas
