Re: Ev2: providing checksums of files Re: svn commit: r1241733 - /subversion/trunk/subversion/libsvn_delta/compat.c
On Thu, Feb 9, 2012 at 12:24 PM, Daniel Shahaf danie...@elego.de wrote: hwri...@apache.org wrote on Wed, Feb 08, 2012 at 01:55:55 -: Author: hwright Date: Wed Feb 8 01:55:54 2012 New Revision: 1241733 URL: http://svn.apache.org/viewvc?rev=1241733view=rev Log: Ev2 shims: Make sure that if we are providing content to alter_file(), we are also providing a checksum. * subversion/libsvn_delta/compat.c (path_checksum_args): Remove checksum member. (process_actions): Checksum the content file. (ev2_apply_textdelta): Don't checksum the output stream. Modified: subversion/trunk/subversion/libsvn_delta/compat.c Modified: subversion/trunk/subversion/libsvn_delta/compat.c URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_delta/compat.c?rev=1241733r1=1241732r2=1241733view=diff == --- subversion/trunk/subversion/libsvn_delta/compat.c (original) +++ subversion/trunk/subversion/libsvn_delta/compat.c Wed Feb 8 01:55:54 2012 @@ -203,7 +203,6 @@ struct path_checksum_args { const char *path; svn_revnum_t base_revision; - svn_checksum_t *checksum; }; static svn_error_t * @@ -340,9 +339,10 @@ process_actions(void *edit_baton, /* We can only set text on files. */ kind = svn_node_file; + SVN_ERR(svn_io_file_checksum2(checksum, pca-path, + svn_checksum_sha1, scratch_pool)); SVN_ERR(svn_stream_open_readonly(contents, pca-path, scratch_pool, scratch_pool)); So, you first open the file and read it through to checksum it, and then open it again to read it through as a stream. That's because right now add_file()/alter_file() require the checksum to be provided up front. Would it make sense to relax that requirement? On the one hand, many receivers don't need to know the checksum until after they've received the whole file (eg, the fs txn process and the pristine store); on the other hand, having the checksum will allow in some cases not transmitting the file over the wire at all, and will allow for error-checking the transmission in other cases. Agreed that the current approach taken by the shim is wasteful, though other attempts were a bit dodgy, and this one Just Works. If there are better ways to get this checksum, let's use 'em. I'm not certain about relaxing the requirement just yet. I'll have to think about the design implications. (I'll also note that we actually *do* have a checksum by this point, only it is the md5 provided by close_file(), and Ev2 uses sha1s exclusively, so we have to recalculate. I suspect this will be a somewhat common issue while we use a mixed shim environment.) -Hyrum -- uberSVN: Apache Subversion Made Easy http://www.uberSVN.com/
Re: Ev2: providing checksums of files Re: svn commit: r1241733 - /subversion/trunk/subversion/libsvn_delta/compat.c
Hyrum K Wright wrote on Thu, Feb 09, 2012 at 12:54:50 -0600: (I'll also note that we actually *do* have a checksum by this point, only it is the md5 provided by close_file(), and Ev2 uses sha1s exclusively, so we have to recalculate. I suspect this will be a Only sha1? It allows neither md5 nor sha256? Why? somewhat common issue while we use a mixed shim environment.)
Re: Ev2: providing checksums of files Re: svn commit: r1241733 - /subversion/trunk/subversion/libsvn_delta/compat.c
On Thu, Feb 9, 2012 at 1:05 PM, Daniel Shahaf danie...@elego.de wrote: Hyrum K Wright wrote on Thu, Feb 09, 2012 at 12:54:50 -0600: (I'll also note that we actually *do* have a checksum by this point, only it is the md5 provided by close_file(), and Ev2 uses sha1s exclusively, so we have to recalculate. I suspect this will be a Only sha1? It allows neither md5 nor sha256? Why? Convention would be my initial response. It makes much more sense to standardize checksum kinds across the system than to have a mixed environment. Any checksum can help detect corruption, but being able to answer the question is this content the same as that content? is much more difficult in a mixed-checksum environment. sha1 felt like a reasonable choice[1]. I could be overestimating the negatives, though. What reasons do you suppose we *should* allow arbitrary checksum types? -Hyrum [1] Though, given the timeline of Ev2 and the proposed timeline for sha3, I'm really interested in the possibility of using sha3 when standardized, throughout *all* of Subversion. (But that's a completely separate discussion.) -- uberSVN: Apache Subversion Made Easy http://www.uberSVN.com/
Re: Ev2: providing checksums of files Re: svn commit: r1241733 - /subversion/trunk/subversion/libsvn_delta/compat.c
Hyrum K Wright wrote on Thu, Feb 09, 2012 at 13:16:13 -0600: On Thu, Feb 9, 2012 at 1:05 PM, Daniel Shahaf danie...@elego.de wrote: Hyrum K Wright wrote on Thu, Feb 09, 2012 at 12:54:50 -0600: (I'll also note that we actually *do* have a checksum by this point, only it is the md5 provided by close_file(), and Ev2 uses sha1s exclusively, so we have to recalculate. I suspect this will be a Only sha1? It allows neither md5 nor sha256? Why? Convention would be my initial response. It makes much more sense to standardize checksum kinds across the system than to have a mixed environment. Any checksum can help detect corruption, but being able to answer the question is this content the same as that content? is much more difficult in a mixed-checksum environment. sha1 felt like a It's hardly impossible. It shouldn't take much effort to whip up an SQLite table that maps sha1's to sha3's (both in the wc layer and the FS layer). reasonable choice[1]. I could be overestimating the negatives, though. What reasons do you suppose we *should* allow arbitrary checksum types? I didn't say arbitrary; but hard-wiring a specific checksum algorithm into our APIs seems wrong. Suppose Ev2 is in 1.8 and then 1.9 clients want to use sha4 with 1.9 servers. Does the editor interface allow for that? I'm assuming we could mandate sha1 support but allow driver/receiver to negotiate on using a different algorithm. -Hyrum [1] Though, given the timeline of Ev2 and the proposed timeline for sha3, I'm really interested in the possibility of using sha3 when standardized, throughout *all* of Subversion. (But that's a completely separate discussion.) -- uberSVN: Apache Subversion Made Easy http://www.uberSVN.com/