Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages
On Wed, Sep 21, 2016 at 8:28 PM, Jakub Narębski wrote:
> On 21.09.2016 at 20:04, Ævar Arnfjörð Bjarmason wrote:
>> It would make some code like git_print_log() a bit more complex /
>> fragile, since it would have to work on multi-line strings, but
>> anything that needed to do a regex match / replacement would be much
>> faster.
>
> Would it? Did you perform any synthetic micro-benchmark?

No, just experience. With the caveat that this may not matter at all
in this context, C-like code in Perl is slow; if you can offload
things to one big regex operation it's usually faster.

>>
>> But OTOH I think perhaps we're worrying about nothing when it comes to
>> the performance. I haven't been able to make gitweb display more than
>> a 100 or so commits at a time (haven't found where exactly in the code
>> these limits are), any munging we do on the log messages would have to
>> be pretty damn slow to matter.
>
> sub git_log_generic {
>
>         # [...]
>
>         my @commitlist =
>                 parse_commits($commit_hash, 101, (100 * $page),
>                               defined $file_name ? ($file_name,
>                               "--full-history") : ());
>
> Here you have it (it probably should be a constant; this number can be
> found in a few other places).

Thanks!
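The micro-benchmark asked about above could be sketched roughly as
follows, using the core Benchmark module. The log data is synthetic, the
helper names are hypothetical, and the <a> markup is only a placeholder
for what gitweb emits, so any numbers it prints say something about this
toy case only, not about gitweb itself.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input (an assumption): 2000 made-up log lines, each with one
# abbreviated object name in it.
my @lines = map { "line $_ mentions commit dce9648a and nothing else" } 1 .. 2000;
my $blob  = join "\n", @lines;

# Hypothetical stand-in for the linking step, applied line by line.
sub link_per_line {
    my @out = @lines;
    s{\b([0-9a-fA-F]{7,40})\b}{<a>$1</a>}g for @out;
    return join "\n", @out;
}

# The same substitution applied once to the whole multi-line string.
sub link_one_string {
    (my $out = $blob) =~ s{\b([0-9a-fA-F]{7,40})\b}{<a>$1</a>}g;
    return $out;
}

# Both variants must agree before their timings are worth comparing.
die "results differ\n" unless link_per_line() eq link_one_string();

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    per_line   => \&link_per_line,
    one_string => \&link_one_string,
});
```

The copy of @lines inside link_per_line is part of what gets timed, so a
fairer comparison for a real code path would have to decide where the
split and join costs belong.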
Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages
On 21.09.2016 at 20:04, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Sep 21, 2016 at 6:26 PM, Jakub Narębski wrote:
>
>> P.S. I have reworking of commit message parsing and enhancement in my
>> long, long and dated gitweb TODO list :-(
>
> Anything specific you could share?

Some of the TODO items I would have to recover from backups, as the
computer on which I did the majority of gitweb development has since
died (from old age).

The list includes:
- implement caching of gitweb output
- revamp handling of encoding (UTF-8 with fallback encoding)
- split gitweb into modules, while maintaining ease of install
- refactor handling of diffs
- better handling of config files
- document URI structure, perhaps revamp URI parsing and generation
- make commit message transformation generic (see below)

>
> One thing that would be a lot faster in Perl is if we didn't have to
> pass the log around as split-up lines and could just operate on it as
> one big string.

Well, there are a few transformations that a commit message undergoes
in gitweb, including linking SHA1s, optional linking of bug numbers to
a bug tracker, and syntax highlighting of signoff lines (trailer lines).

I would like to have this cleaned up and refactored. With all those
transformations we would need to keep track of which parts are already
HTML and which are not and still need escaping (note: URI escape !=
HTML escape).

>
> It would make some code like git_print_log() a bit more complex /
> fragile, since it would have to work on multi-line strings, but
> anything that needed to do a regex match / replacement would be much
> faster.

Would it? Did you perform any synthetic micro-benchmark?

>
> But OTOH I think perhaps we're worrying about nothing when it comes to
> the performance. I haven't been able to make gitweb display more than
> a 100 or so commits at a time (haven't found where exactly in the code
> these limits are), any munging we do on the log messages would have to
> be pretty damn slow to matter.

  sub git_log_generic {

          # [...]

          my @commitlist =
                  parse_commits($commit_hash, 101, (100 * $page),
                                defined $file_name ? ($file_name, "--full-history") : ());

Here you have it (it probably should be a constant; this number can be
found in a few other places).

Best,
-- 
Jakub Narębski
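The "it probably should be a constant" remark about the 101 and
100 * $page arguments could look something like the sketch below. The
names COMMITS_PER_PAGE, page_window and has_next_page are hypothetical,
and the reading that the extra commit is fetched to detect whether a
next page exists is an assumption; the quoted snippet does not say so.

```perl
use strict;
use warnings;

# Hypothetical name for the page size that is written out as 100 / 101
# in the snippet quoted above.
use constant COMMITS_PER_PAGE => 100;

# Hypothetical helper: the (max_count, skip) pair to hand to
# parse_commits() for a zero-based page number.
sub page_window {
    my ($page) = @_;
    return (COMMITS_PER_PAGE + 1, COMMITS_PER_PAGE * $page);
}

# Assumption: one commit more than a page is requested so that the caller
# can tell whether a further page exists.
sub has_next_page {
    my @commitlist = @_;
    return @commitlist > COMMITS_PER_PAGE;
}

my ($max_count, $skip) = page_window(2);
print "max_count=$max_count skip=$skip\n";   # max_count=101 skip=200
```

With such a constant the other places where the number appears could
refer to it by name instead of repeating the literal.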
Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages
On Wed, Sep 21, 2016 at 6:26 PM, Jakub Narębski wrote:

> P.S. I have reworking of commit message parsing and enhancement in my
> long, long and dated gitweb TODO list :-(

Anything specific you could share?

One thing that would be a lot faster in Perl is if we didn't have to
pass the log around as split-up lines and could just operate on it as
one big string.

It would make some code like git_print_log() a bit more complex /
fragile, since it would have to work on multi-line strings, but
anything that needed to do a regex match / replacement would be much
faster.

But OTOH I think perhaps we're worrying about nothing when it comes to
the performance. I haven't been able to make gitweb display more than
a 100 or so commits at a time (haven't found where exactly in the code
these limits are), any munging we do on the log messages would have to
be pretty damn slow to matter.

> P.P.S. Kay Sievers no longer works on gitweb, and I think no longer
> works at SuSE but at RedHat.

Yup, been getting bounces from his address.
Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages
On 21.09.2016 at 13:44, Ævar Arnfjörð Bjarmason wrote:

> Subject: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

This is a modification of an existing feature, not the new feature the
subject makes it sound like. I think the following title / subject
would be better:

  Subject: [PATCH 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

>
> Change the minimum length of a commit we'll link to from 8 to 7.

I think it would read better as:

  Change the minimum length of an abbreviated object identifier in the
  commit message that gitweb tries to turn into a link, from 8 hexchars
  to 7.

>
> This arbitrary minimum length of 8 was introduced in
> v1.4.4.2-151-gbfe2191, but as seen in e.g. v1.7.4-1-gdce9648 the
> default abbreviation length is 7.

Right. I wonder why it was 8 in gitweb...

>
> It's still possible to reference SHA1s down to 4 characters in length,
> see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
> git actually produce that, so I doubt anyone is putting that into log
> messages in practice, but people definitely do put 7 character SHA1s
> into log messages.

There is an additional problem: the shorter the SHA1 abbreviation we
try to match, the greater the chance of false positives, that is,
words that only look like a (shortened) SHA-1.

For 7 characters there is at least one word that can be mistaken for a
SHA1 abbrev, namely 'deedeed' (hopefully rare in commit messages).
For 6 characters we have 'accede', 'beaded', 'decade' (!), 'deface',
'facade' (!!), and possibly more (and of course all 7 character
hexdigit words).

Also, the number of digits provided as an optional parameter to the
--abbrev or --abbrev-commit options is only a minimum number of
hexdigits: Git will use as many as are needed for the abbreviated
SHA-1 to be unambiguous at the current time.

I think allowing 7-character shortened SHA-1s, which is what Git
produces for smaller repositories by default, is (might be?) a good
idea. Thanks for the patch.
>
> I think it's fairly dubious to link to things matching [0-9a-fA-F]
> here as opposed to just [0-9a-f], that dates back to the initial
> version of gitweb from 161332a. Git will accept all-caps SHA1s, but
> didn't ever produce them as far as I can tell.

All right, thanks for the reminder.

Signoff?

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 9473daf..101dbc0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>          my $line = shift;
>
>          $line = esc_html($line, -nbsp=>1);
> -        $line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> +        $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>                  $cgi->a({-href => href(action=>"object", hash=>$1),
>                                          -class => "text"}, $1);
>          }eg;
>

Nice and simple.

P.S. I have reworking of commit message parsing and enhancement in my
long, long and dated gitweb TODO list :-(

P.P.S. Kay Sievers no longer works on gitweb, and I think no longer
works at SuSE but at RedHat.

Best,
-- 
Jakub Narębski
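To make the false-positive concern concrete, here is a small standalone
sketch that reuses the character class from the patch but only collects
the matching tokens instead of building links. The helper name
sha1_candidates and the sample message are made up for illustration and
are not part of gitweb.

```perl
use strict;
use warnings;

# Hypothetical helper: return every token in $line that a pattern with
# the given minimum length would treat as an abbreviated object name.
sub sha1_candidates {
    my ($line, $min_len) = @_;
    $min_len = 7 unless defined $min_len;
    # In list context, /g with one capture group returns all captures.
    return $line =~ /\b([0-9a-fA-F]{$min_len,40})\b/g;
}

# Made-up commit message mixing real-looking abbreviations with plain words.
my $msg = 'Revert dce9648, the facade around bfe2191 deedeed nothing';

print join(', ', sha1_candidates($msg, 7)), "\n";   # dce9648, bfe2191, deedeed
print join(', ', sha1_candidates($msg, 6)), "\n";   # dce9648, facade, bfe2191, deedeed
```

Checking each candidate against the repository before linking would
remove such matches, at the cost of extra lookups; whether that is worth
it is a separate question from the minimum length.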