Re: Upgrade to gitweb-1.8.3.1-20 on centos7 breaks git access
If I remove /etc/httpd/conf.d/git.conf then I just get blank page in the browser. the ssl_error log shows [Sat Aug 17 16:17:08.926093 2019] [cgi:error] [pid 18299] [client x.x.x.x:25050] AH01215: Request not supported: '/export/gitrepos/ABC/' On 8/17/2019 4:03 PM, Gaiseric Vandal wrote: I recently applied the latest patches on centos7, including gitweb-1.8.3.1-20. When I try to access git repos (either via web browser or git command line.) Repos are configured to require LDAP authentication. I should be able to access repo ABC via https://myserver.domain.com/git/ABC I get the correct user login prompt but then , after logging, get following error The requested URL /git/ABC/ was not found on this server. I don't think any of the config files got changed. My partial config is # cat /etc/httpd/conf.d/git.conf Alias /git /var/www/git Options +ExecCGI AddHandler cgi-script .cgi DirectoryIndex gitweb.cgi # # cat /etc/httpd/conf.d/gitrepos.conf SetEnv GIT_PROJECT_ROOT /export/gitrepos SetEnv GIT_HTTP_EXPORT_ALL ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/ RewriteEngine on RewriteCond %{QUERY_STRING} service=git-receive-pack [OR] RewriteCond %{REQUEST_URI} /git-receive-pack$ RewriteRule ^/git/ - [E=AUTHREQUIRED] AuthType Basic AuthBasicProvider ldap AuthLDAPURL "ldaps://xxx)" AuthLDAPBindDN "uid=" AuthLDAPBindPassword xxx AuthName "Git Access" Require user x x x Order deny,allow Deny from env=AUTHREQUIRED Satisfy any AuthType Basic AuthBasicProvider ldap AuthLDAPURL "ldaps://xxx)" AuthLDAPBindDN "uid=" AuthLDAPBindPassword xxx AuthName "Git Access ABC" Require user x x x Order deny,allow The /var/www/git/gitweb.cgi file has the default settings. ... our $GIT = "/usr/bin/git"; # absolute fs-path which will be prepended to the project path our $projectroot = "/var/lib/git"; #our $projectroot = "/export/gitrepos"; ... If I set $projectroot to point to /export/gitrepos, I can see a list of projects when I browse to https://myserver.domain.com/git- however I wanted to minimize how much information was exposed with no authentication. And it didn't fix the access problem. The "git" command is in the default OS path. changing our $GIT = "/usr/bin/git"; to our $GIT = "git"; did not help. The only thing in the httpd error logs that looks relevant is [Sat Aug 17 15:39:39.826265 2019] [core:info] [pid 15870] [client x] AH00128: File does not exist: /var/www/git/ABC/ [Sat Aug 17 15:39:44.831598 2019] [ssl:info] [pid 15870] (70007)The timeout specified has expired: [client x] AH01991: SSL input filter read failed. selinux is disabled. Since I am getting an authentication prompt with "Git Access ABC" in the dialog box it seems clear that the project root is being picked up from /etc/httpd/conf.d/gitrepos.conf and that the /usr/libexec/git-core/git-http-backend script is being run. Appreciate any advice. Thanks
Upgrade to gitweb-1.8.3.1-20 on centos7 breaks git access
I recently applied the latest patches on centos7, including gitweb-1.8.3.1-20. When I try to access git repos (either via web browser or git command line.) Repos are configured to require LDAP authentication. I should be able to access repo ABC via https://myserver.domain.com/git/ABC I get the correct user login prompt but then , after logging, get following error The requested URL /git/ABC/ was not found on this server. I don't think any of the config files got changed. My partial config is # cat /etc/httpd/conf.d/git.conf Alias /git /var/www/git Options +ExecCGI AddHandler cgi-script .cgi DirectoryIndex gitweb.cgi # # cat /etc/httpd/conf.d/gitrepos.conf SetEnv GIT_PROJECT_ROOT /export/gitrepos SetEnv GIT_HTTP_EXPORT_ALL ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/ RewriteEngine on RewriteCond %{QUERY_STRING} service=git-receive-pack [OR] RewriteCond %{REQUEST_URI} /git-receive-pack$ RewriteRule ^/git/ - [E=AUTHREQUIRED] AuthType Basic AuthBasicProvider ldap AuthLDAPURL "ldaps://xxx)" AuthLDAPBindDN "uid=" AuthLDAPBindPassword xxx AuthName "Git Access" Require user x x x Order deny,allow Deny from env=AUTHREQUIRED Satisfy any AuthType Basic AuthBasicProvider ldap AuthLDAPURL "ldaps://xxx)" AuthLDAPBindDN "uid=" AuthLDAPBindPassword xxx AuthName "Git Access ABC" Require user x x x Order deny,allow The /var/www/git/gitweb.cgi file has the default settings. ... our $GIT = "/usr/bin/git"; # absolute fs-path which will be prepended to the project path our $projectroot = "/var/lib/git"; #our $projectroot = "/export/gitrepos"; ... If I set $projectroot to point to /export/gitrepos, I can see a list of projects when I browse to https://myserver.domain.com/git- however I wanted to minimize how much information was exposed with no authentication. And it didn't fix the access problem. The "git" command is in the default OS path. changing our $GIT = "/usr/bin/git"; to our $GIT = "git"; did not help. The only thing in the httpd error logs that looks relevant is [Sat Aug 17 15:39:39.826265 2019] [core:info] [pid 15870] [client x] AH00128: File does not exist: /var/www/git/ABC/ [Sat Aug 17 15:39:44.831598 2019] [ssl:info] [pid 15870] (70007)The timeout specified has expired: [client x] AH01991: SSL input filter read failed. selinux is disabled. Since I am getting an authentication prompt with "Git Access ABC" in the dialog box it seems clear that the project root is being picked up from /etc/httpd/conf.d/gitrepos.conf and that the /usr/libexec/git-core/git-http-backend script is being run. Appreciate any advice. Thanks
Re: How to get gitweb setup with nginx + uwsgi?
Hi, On Mon, Aug 12, 2019, at 12:16 PM, Konstantin Ryabitsev wrote: > We have it working in a similar configuration, but with CGit instead of > GitWeb. Unless you have specific requirements to run GitWeb, I recommend > you use CGit for your web frontend, as it offers many features GitWeb > doesn't, such as caching -- plus it works really well with uwsgi > (git.kernel.org runs in that configuration). Didn't realize that! I'd picked gitweb in the first place as I figured, being part of git, that kernel project would be using it. Clearly I'm wrong. Switching to CGit now. Thanks for the suggestion. I don't use Arch (Opensuse User), but am trying to follow their CGit + UWSGI + Gitolite docs: https://wiki.archlinux.org/index.php/Cgit#Using_uwsgi Since my distro pkgs have different paths, systemd setup, etc. having some challenges getting a everything-working-together config setup. That'll just take some reading and staring ! ;-) If you know of any other good examples , please share :-) TA!
Re: How to get gitweb setup with nginx + uwsgi?
On Mon, Aug 12, 2019 at 08:09:22AM -0700, ko...@mailc.net wrote: Hi all. I'm setting up a local Git server, with Gitweb + Gitolite. We have it working in a similar configuration, but with CGit instead of GitWeb. Unless you have specific requirements to run GitWeb, I recommend you use CGit for your web frontend, as it offers many features GitWeb doesn't, such as caching -- plus it works really well with uwsgi (git.kernel.org runs in that configuration). Best, -K
How to get gitweb setup with nginx + uwsgi?
Hi all. I'm setting up a local Git server, with Gitweb + Gitolite. The gitolite wrapper is installed & working. Now I'm working on the Gitweb frontend. I run Nginx as my webserver. Usually with PHP, using fpm. Gitweb's gitweb.cgi looks like it needs perl CGI. For perl cgi I'm trying to get it working with UWSGI, https://uwsgi-docs.readthedocs.io/en/latest/Nginx.html https://nginx.org/en/docs/http/ngx_http_uwsgi_module.html#example I installed git --version git version 2.22.0 ls -al /usr/share/gitweb/gitweb.cgi -rwxr-xr-x 1 root root 247K Jul 24 05:27 /usr/share/gitweb/gitweb.cgi grep "\$version =" /usr/share/gitweb/gitweb.cgi our $version = "2.22.0"; nginx -v nginx version: nginx/1.17.1 uwsgi --version 2.0.18 I set up the nginx vhost server { listen 127.0.0.1:60080 http2; root /usr/share/gitweb; index gitweb.cgi; location / { try_files $uri $uri/ @gitweb; } location @gitweb { root /usr/share/gitweb; include uwsgi_params; gzip off; uwsgi_param UWSGI_SCRIPT gitweb; uwsgi_param GITWEB_CONFIG /etc/gitweb/gitweb.conf; uwsgi_pass unix:/run/uwsgi/uwsgi.sock; uwsgi_modifier1 5; } } and the uwsgi server /etc/uwsgi/uwsgi.ini [uwsgi] strict = 1 master = true processes = 2 binary-path = /usr/sbin/uwsgi plugin-dir = /usr/lib64/uwsgi logto = /var/log/uwsgi/uwsgi.log uid = wwwrun gid = www umask = 022 uwsgi-socket = /run/uwsgi/uwsgi.sock chmod-socket = 660 chown-socket = wwwrun:www plugins = http,psgi chdir = /usr/share/gitweb psgi = gitweb.cgi nginx & uwsgi services are both running ps aux | egrep "nginx|uwsgi" wwwrun 17463 0.0 0.1 89468 23704 ?Ss 07:03 0:00 /usr/sbin/uwsgi --autoload --ini /etc/uwsgi/uwsgi.ini wwwrun 17465 0.0 0.1 97664 17184 ?Sl 07:03 0:00 /usr/sbin/uwsgi --autoload --ini /etc/uwsgi/uwsgi.ini wwwrun 17468 0.0 0.1 97664 17184 ?Sl 07:03 0:00 /usr/sbin/uwsgi --autoload --ini /etc/uwsgi/uwsgi.ini root 18006 0.0 0.0 211264 4276 ?Ss 07:10 0:00 nginx: master process /opt/nginx/sbin/nginx -c /etc/nginx/nginx.conf -g pid /run/nginx.pid; wwwrun 18007 0.0 0.0 211416 5492 ?S07:10 0:00 nginx: worker process wwwrun 18008 0.0 0.0 212068 10300 ?S07:10 0:00 nginx: worker process wwwrun 18009 0.0 0.0 211416 5492 ?S07:10 0:00 nginx: worker process wwwrun 18011 0.0 0.0 211416 5492 ?S07:10 0:00 nginx: worker process wwwrun 18012 0.0 0.0 211452 5052 ?S07:10 0:00 nginx: cache manager process ls -al /run/uwsgi/uwsgi.sock srw-rw 1 wwwrun www 0 Aug 12 07:03 /run/uwsgi/uwsgi.sock= when I go to the site http://127.0.0.1:60080/ I just get the script listing in the browser #!/usr/bin/perl # gitweb - simple web interface to track changes in git repositories # # (C) 2005-2006, Kay Sievers # (C) 2005, Christian Gierke # # This program is licensed under the GPLv2 use 5.008; use strict; use warnings; ... no errors anywhere, just the script display. I'm missing something basic since it's not running the script. :-/ Anyone have any experience with gitweb + uwsgi on nginx? Or know a good working example? Thanks!
[PATCH] doc: don't use git.kernel.org as example gitweb URL
git.kernel.org uses cgit, not gitweb, these days: $ w3m -dump 'http://git.kernel.org/?p=git/git.git;a=tree;f=gitweb' | grep -w generated generated by cgit 1.2-0.3.lf.el7 (git 2.18.0) at 2019-06-22 16:14:38 + Signed-off-by: Jakub Wilk --- Documentation/gitweb.txt | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/Documentation/gitweb.txt b/Documentation/gitweb.txt index c7436098c9..3cc9b034c4 100644 --- a/Documentation/gitweb.txt +++ b/Documentation/gitweb.txt @@ -28,8 +28,7 @@ Gitweb provides a web interface to Git repositories. Its features include: revisions one at a time, viewing the history of the repository. * Finding commits which commit messages matches given search term. -See http://git.kernel.org/?p=git/git.git;a=tree;f=gitweb[] or -http://repo.or.cz/w/git.git/tree/HEAD:/gitweb/[] for gitweb source code, +See http://repo.or.cz/w/git.git/tree/HEAD:/gitweb/[] for gitweb source code, browsed using gitweb itself. -- 2.20.1
Re: on fedora, "man gitweb" exists but actual gitweb command is missing
On Sat, 23 Feb 2019, Todd Zullinger wrote: > Hi, > > Robert P. J. Day wrote: > > > > not so much a git issue as what looks like a fedora packaging > > issue. > > Yeah, it's just a minor packaging issue. The gitweb manpages are > included in the main git package rather than in the gitweb package > with the rest of the gitweb files. I'll fix that for future > releases and when f29 is updated to 2.21 it will pick that up¹. ... snip ... ah, eggcellent, so i can remove this from my TODO list, thanks. rday -- Robert P. J. Day Ottawa, Ontario, CANADA http://crashcourse.ca/dokuwiki Twitter: http://twitter.com/rpjday LinkedIn: http://ca.linkedin.com/in/rpjday
Re: on fedora, "man gitweb" exists but actual gitweb command is missing
Hi, Robert P. J. Day wrote: > > not so much a git issue as what looks like a fedora packaging issue. Yeah, it's just a minor packaging issue. The gitweb manpages are included in the main git package rather than in the gitweb package with the rest of the gitweb files. I'll fix that for future releases and when f29 is updated to 2.21 it will pick that up¹. Since gitweb requires git, you'd be sure to have the documentation after installing gitweb. If we made it possible to install gitweb without getting the documentation, that would be more annoying. :) ¹ https://src.fedoraproject.org/fork/tmz/rpms/git/c/0d9ad786 > it took only a few seconds to determine that fedora bundles that > functionality in two separate packages which are not dependencies of > "git": "gitweb" and "git-instaweb" (output abbreviated): > > $ sudo dnf install git-instaweb > ... > Installing: >git-instaweb > Installing dependencies: >gitweb >perl-CGI > > and now "git-instaweb" works fine. > > the question is, is it not inconsistent for fedora's basic "git" > package to include the man page for gitweb, without including the > corresponding functionality? is this something i should submit a > fedora bugzilla report for? or am i misunderstanding something? If I hadn't seen this thread, then a report in the fedora bugzilla would be the way to go to ensure one of the fedora git maintainers noticed it. Thanks, -- Todd
on fedora, "man gitweb" exists but actual gitweb command is missing
not so much a git issue as what looks like a fedora packaging issue. on my updated fedora 29 system, "man gitweb" works: GITWEB(1)Git Manual GITWEB(1) NAME gitweb - Git web interface (web frontend to Git repositories) SYNOPSIS To get started with gitweb, run git-instaweb(1) from a Git repository. This would configure and start your web server, and run web browser pointing to gitweb. ... snip ... and the corresponding man page source file clearly belongs to the "git" package: $ rpm -qf /usr/share/man/man1/gitweb.1.gz git-2.20.1-1.fc29.x86_64 $ but there is no "git instaweb" command: $ git instaweb git: 'instaweb' is not a git command. See 'git --help'. $ it took only a few seconds to determine that fedora bundles that functionality in two separate packages which are not dependencies of "git": "gitweb" and "git-instaweb" (output abbreviated): $ sudo dnf install git-instaweb ... Installing: git-instaweb Installing dependencies: gitweb perl-CGI and now "git-instaweb" works fine. the question is, is it not inconsistent for fedora's basic "git" package to include the man page for gitweb, without including the corresponding functionality? is this something i should submit a fedora bugzilla report for? or am i misunderstanding something? rday -- Robert P. J. Day Ottawa, Ontario, CANADA http://crashcourse.ca/dokuwiki Twitter: http://twitter.com/rpjday LinkedIn: http://ca.linkedin.com/in/rpjday
[PATCH v2 35/35] gitweb: make hash size independent
Gitweb has several hard-coded 40 values throughout it to check for values that are passed in or acquired from Git. To simplify the code, introduce a regex variable that matches either exactly 40 or exactly 64 hex characters, and use this variable anywhere we would have previously hard-coded a 40 in a regex. Add some helper functions which allow us to write tighter regexes that match exactly the number of hex characters we're expecting. Similarly, switch the code that looks for deleted diffinfo information to look for either 40 or 64 zeros, and update one piece of code to use this function. Finally, when formatting a log line, allow an abbreviated describe output to contain up to 64 characters. Helped-by: Ævar Arnfjörð Bjarmason Signed-off-by: brian m. carlson --- gitweb/gitweb.perl | 97 -- 1 file changed, 67 insertions(+), 30 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2594a4badb..7fef19fe59 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -788,6 +788,38 @@ sub check_loadavg { # == # input validation and dispatch +# Various hash size-related values. +my $sha1_len = 40; +my $sha256_extra_len = 24; +my $sha256_len = $sha1_len + $sha256_extra_len; + +# A regex matching $len hex characters. $len may be a range (e.g. 7,64). +sub oid_nlen_regex { + my $len = shift; + my $hchr = qr/[0-9a-fA-F]/; + return qr/(?:(?:$hchr){$len})/; +} + +# A regex matching two sets of $nlen hex characters, prefixed by the literal +# string $prefix and with the literal string $infix between them. +sub oid_nlen_prefix_infix_regex { + my $nlen = shift; + my $prefix = shift; + my $infix = shift; + + my $rx = oid_nlen_regex($nlen); + + return qr/^\Q$prefix\E$rx\Q$infix\E$rx$/; +} + +# A regex matching a valid object ID. +our $oid_regex; +{ + my $x = oid_nlen_regex($sha1_len); + my $y = oid_nlen_regex($sha256_extra_len); + $oid_regex = qr/(?:$x(?:$y)?)/; +} + # input parameters can be collected from a variety of sources (presently, CGI # and PATH_INFO), so we define an %input_params hash that collects them all # together during validation: this allows subsequent uses (e.g. href()) to be @@ -1516,7 +1548,7 @@ sub is_valid_refname { return undef unless defined $input; # textual hashes are O.K. - if ($input =~ m/^[0-9a-fA-F]{40}$/) { + if ($input =~ m/^$oid_regex$/) { return 1; } # it must be correct pathname @@ -2028,6 +2060,9 @@ sub file_type_long { sub format_log_line_html { my $line = shift; + # Potentially abbreviated OID. + my $regex = oid_nlen_regex("7,64"); + $line = esc_html($line, -nbsp=>1); $line =~ s{ \b @@ -2037,10 +2072,10 @@ sub format_log_line_html { (?'; } # match - if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) { + if ($line =~ oid_nlen_prefix_infix_regex($sha1_len, "index ", ",") | + $line =~ oid_nlen_prefix_infix_regex($sha256_len, "index ", ",")) { # can match only for combined diff $line = 'index '; for (my $i = 0; $i < $diffinfo->{'nparents'}; $i++) { @@ -2308,7 +2344,8 @@ sub format_extended_diff_header_line { $line .= '0' x 7; } - } elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) { + } elsif ($line =~ oid_nlen_prefix_infix_regex($sha1_len, "index ", "..") | +$line =~ oid_nlen_prefix_infix_regex($sha256_len, "index ", "..")) { # can match only for ordinary diff my ($from_link, $to_link); if ($from->{'href'}) { @@ -2834,7 +2871,7 @@ sub git_get_hash_by_path { } #'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa panic.c' - $line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/; + $line =~ m/^([0-9]+) (.+) ($oid_regex)\t/; if (defined $type && $type ne $2) { # type doesn't match return undef; @@ -,7 +3370,7 @@ sub git_get_references { while (my $line = <$fd>) { chomp $line; - if ($line =~ m!^([0-9a-fA-F]{40})\srefs/($type.*)$!) { + if ($line =~ m!^($oid_regex)\srefs/($type.*)$!) { if (defined $refs{$1}) { push @{$refs{$1}}, $2; } else { @@ -3407,7 +3444,7 @@ sub parse_tag { $tag{'id'} = $tag_id; while (my $line = <$fd>) { chomp $line; - if ($line =~ m/^object ([0-9a-fA-F]{40})$/) { + if (
Re: [PATCH 31/31] gitweb: make hash size independent
On Tue, Feb 12 2019, brian m. carlson wrote: > Gitweb has several hard-coded 40 values throughout it to check for > values that are passed in or acquired from Git. To simplify the code, > introduce a regex variable that matches either exactly 40 or exactly 64 > hex characters, and use this variable anywhere we would have previously > hard-coded a 40 in a regex. > > Similarly, switch the code that looks for deleted diffinfo information > to look for either 40 or 64 zeros, and update one piece of code to use > this function. Finally, when formatting a log line, allow an > abbreviated describe output to contain up to 64 characters. This might be going a bit overboard but I tried this with a variant where... > +# A regex matching a valid object ID. > +our $oid_regex = qr/(?:[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})?)/; > + Instead of this dense regex I did: my $sha1_len = 40; my $sha256_extra_len = 24; my $sha256_len = $sha1_len + $sha256_extra_len; sub oid_nlen_regex { my $len = shift; my $hchr = qr/[0-9a-fA-F]/; return qr/(?:(?:$hchr){$len})/ } our $oid_regex; { my $x = oid_nlen_regex($sha1_len); my $y = oid_nlen_regex($sha256_extra_len); $oid_regex = qr/(?:$x(?:$y)?)/ } Then most of the rest of this is the same, e.g.: > - if ($input =~ m/^[0-9a-fA-F]{40}$/) { But... > @@ -2037,10 +2040,10 @@ sub format_log_line_html { > (? [A-Za-z0-9.-]+ > (?!\.) # refs can't end with ".", see check_refname_format() > --g[0-9a-fA-F]{7,40} > +-g[0-9a-fA-F]{7,64} > | > # Just a normal looking Git SHA1 > -[0-9a-fA-F]{7,40} > +[0-9a-fA-F]{7,64} > ) > \b > }{ E.g. here we can do call oid_nlen_regex("7,64") to produce this blurb. > - if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) { > + if ($line =~ m/^index $oid_regex,$oid_regex/) { > - } elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) { > + } elsif ($line =~ m/^index $oid_regex..$oid_regex/) { And here, maybe nobody cares, but we now implicitly accept mixed SHA-1 & SHA-256 input. Whereas we could have a helper on top of the above code like: sub oid_nlen_prefix_infix_regex { my $nlen = shift; my $prefix = shift; my $infix = shift; my $rx = oid_nlen_regex($nlen); return qr/^\Q$prefix\E$rx\Q$infix\E$rx$/; } And then e.g.: } elsif ($line =~ oid_nlen_prefix_infix_regex($sha1_len, "index ", "..") || $line =~ oid_nlen_prefix_infix_regex($sha256_len, "index ", "..")) { So only accept SHA1..SHA1 or SHA256..SHA256, not SHA1..SHA256 or SHA256..SHA1.
[PATCH 31/31] gitweb: make hash size independent
Gitweb has several hard-coded 40 values throughout it to check for values that are passed in or acquired from Git. To simplify the code, introduce a regex variable that matches either exactly 40 or exactly 64 hex characters, and use this variable anywhere we would have previously hard-coded a 40 in a regex. Similarly, switch the code that looks for deleted diffinfo information to look for either 40 or 64 zeros, and update one piece of code to use this function. Finally, when formatting a log line, allow an abbreviated describe output to contain up to 64 characters. Signed-off-by: brian m. carlson --- gitweb/gitweb.perl | 63 -- 1 file changed, 33 insertions(+), 30 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2594a4badb..aec8ca3256 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -788,6 +788,9 @@ sub check_loadavg { # == # input validation and dispatch +# A regex matching a valid object ID. +our $oid_regex = qr/(?:[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})?)/; + # input parameters can be collected from a variety of sources (presently, CGI # and PATH_INFO), so we define an %input_params hash that collects them all # together during validation: this allows subsequent uses (e.g. href()) to be @@ -1516,7 +1519,7 @@ sub is_valid_refname { return undef unless defined $input; # textual hashes are O.K. - if ($input =~ m/^[0-9a-fA-F]{40}$/) { + if ($input =~ m/^$oid_regex$/) { return 1; } # it must be correct pathname @@ -2037,10 +2040,10 @@ sub format_log_line_html { (?'; } # match - if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) { + if ($line =~ m/^index $oid_regex,$oid_regex/) { # can match only for combined diff $line = 'index '; for (my $i = 0; $i < $diffinfo->{'nparents'}; $i++) { @@ -2308,7 +2311,7 @@ sub format_extended_diff_header_line { $line .= '0' x 7; } - } elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) { + } elsif ($line =~ m/^index $oid_regex..$oid_regex/) { # can match only for ordinary diff my ($from_link, $to_link); if ($from->{'href'}) { @@ -2834,7 +2837,7 @@ sub git_get_hash_by_path { } #'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa panic.c' - $line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/; + $line =~ m/^([0-9]+) (.+) ($oid_regex)\t/; if (defined $type && $type ne $2) { # type doesn't match return undef; @@ -,7 +3336,7 @@ sub git_get_references { while (my $line = <$fd>) { chomp $line; - if ($line =~ m!^([0-9a-fA-F]{40})\srefs/($type.*)$!) { + if ($line =~ m!^($oid_regex)\srefs/($type.*)$!) { if (defined $refs{$1}) { push @{$refs{$1}}, $2; } else { @@ -3407,7 +3410,7 @@ sub parse_tag { $tag{'id'} = $tag_id; while (my $line = <$fd>) { chomp $line; - if ($line =~ m/^object ([0-9a-fA-F]{40})$/) { + if ($line =~ m/^object ($oid_regex)$/) { $tag{'object'} = $1; } elsif ($line =~ m/^type (.+)$/) { $tag{'type'} = $1; @@ -3451,15 +3454,15 @@ sub parse_commit_text { } my $header = shift @commit_lines; - if ($header !~ m/^[0-9a-fA-F]{40}/) { + if ($header !~ m/^$oid_regex/) { return; } ($co{'id'}, my @parents) = split ' ', $header; while (my $line = shift @commit_lines) { last if $line eq "\n"; - if ($line =~ m/^tree ([0-9a-fA-F]{40})$/) { + if ($line =~ m/^tree ($oid_regex)$/) { $co{'tree'} = $1; - } elsif ((!defined $withparents) && ($line =~ m/^parent ([0-9a-fA-F]{40})$/)) { + } elsif ((!defined $withparents) && ($line =~ m/^parent ($oid_regex)$/)) { push @parents, $1; } elsif ($line =~ m/^author (.*) ([0-9]+) (.*)$/) { $co{'author'} = to_utf8($1); @@ -3591,7 +3594,7 @@ sub parse_difftree_raw_line { # ':100644 100644 03b218260e99b78c6df0ed378e59ed9205ccc96d 3b93d5e7cc7f7dd4ebed13a5cc1a4ad976fc94d8 M ls-files.c' # ':100644 100644 7f9281985086971d3877aca27704f2aaf9c448ce bc190ebc71bbd923f2b728e505408f5e54bd073a M rev-tree.c' - if ($line =~ m/^:([0-7]{6}) ([0-7]{6}) ([0-9a-fA-F]{40}) (
[PATCH] gitweb: correctly store previous rev in javascript-actions mode
From: Robert Luberda Date: Sun, 16 Mar 2014 22:57:19 +0100 Without this change, the setting $feature{'javascript-actions'}{'default'} = [1]; in gitweb.conf breaks gitweb's blame page: clicking on line numbers displayed in the second column on the page has no effect. For comparison, with javascript-actions disabled, clicking on line numbers loads the previous version of the line. Addresses https://bugs.debian.org/741883. Signed-off-by: Jonathan Nieder --- Hi Robert, Years ago, you sent this obviously correct patch to the link mentioned above, but it got lost in the noise. Sorry about that. Hopefully late is better than never. May we forge your sign-off? See https://www.kernel.org/pub/software/scm/git/docs/SubmittingPatches.html#sign-off for more details about what this means. Jakub et al, any thoughts? I don't see any unit tests in gitweb/static that could avoid this regressing --- am I missing some, or if not any hints for someone who would want to add a test framework? Thanks, Jonathan gitweb/static/js/blame_incremental.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gitweb/static/js/blame_incremental.js b/gitweb/static/js/blame_incremental.js index db6eb50584..e100d8206b 100644 --- a/gitweb/static/js/blame_incremental.js +++ b/gitweb/static/js/blame_incremental.js @@ -484,7 +484,7 @@ function processBlameLines(lines) { case 'previous': curCommit.nprevious++; // store only first 'previous' header - if (!'previous' in curCommit) { + if (!('previous' in curCommit)) { var parts = data.split(' ', 2); curCommit.previous= parts[0]; curCommit.file_parent = unquote(parts[1]); -- 2.20.0.405.gbc1bbc6f85
Re: gitweb: local configuration not found
Hello! > Yeah, it does look indirect. Despite what you said, it also would > support users giving an absolute path via GITWEB_CONFIG. > > With "use File::Spec", perhaps something like this? Yes, this looks right. Martin
Re: gitweb: local configuration not found
Ævar Arnfjörð Bjarmason writes: >> Documentation says "If you are absolutely certain that you want your >> script to load and execute a file from the current directory, then use >> a ./ prefix". We can do that, like so: >> >> diff --git i/gitweb/Makefile w/gitweb/Makefile >> index cd194d057f..3160b6cc5d 100644 >> --- i/gitweb/Makefile >> +++ w/gitweb/Makefile >> @@ -18,7 +18,7 @@ RM ?= rm -f >> INSTALL ?= install >> >> # default configuration for gitweb >> -GITWEB_CONFIG = gitweb_config.perl >> +GITWEB_CONFIG = ./gitweb_config.perl >> GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf >> GITWEB_CONFIG_COMMON = /etc/gitweb-common.conf >> GITWEB_HOME_LINK_STR = projects >> >> but that does not help if someone overrides GITWEB_CONFIG, and besides, >> it would be nicer to avoid the possibility of an @INC search altogether. >> ... > Just: > > local @INC = '.'; > do 'FILE.pl'; > > Would do the same thing, but seems like a more indirect way to do it if > all we want is ./ anyway. Yeah, it does look indirect. Despite what you said, it also would support users giving an absolute path via GITWEB_CONFIG. With "use File::Spec", perhaps something like this? gitweb/gitweb.perl | 4 1 file changed, 4 insertions(+) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2594a4badb..239e7cbc25 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -719,6 +719,10 @@ sub filter_and_validate_refs { sub read_config_file { my $filename = shift; return unless defined $filename; + + $filename = File::Spec->catfile(".", $filename) + unless File::Spec->file_name_is_absolute($filename); + # die if there are errors parsing config file if (-e $filename) { do $filename;
Re: gitweb: local configuration not found
On Wed, Dec 05 2018, Jonathan Nieder wrote: > Martin Mares wrote[1]: > >> After upgrade to Stretch, gitweb no longer finds the configuration file >> "gitweb_config.perl" in the current directory. However, "man gitweb" still >> mentions this as one of the possible locations of the config file (and >> indeed a useful one when using multiple instances of gitweb). >> >> It was probably broken by Perl dropping "." from the default search path >> for security reasons. > > Indeed, perldelta(1) tells me that in 5.24.1 (and 5.26, etc), > > Core modules and tools no longer search "." for optional modules > > gitweb.perl contains > > sub read_config_file { > my $filename = shift; > return unless defined $filename; > # die if there are errors parsing config file > if (-e $filename) { > do $filename; > > which implies an @INC search but it is silly --- as the "-e" test > illustrates, this never intended to search @INC. > > Documentation says "If you are absolutely certain that you want your > script to load and execute a file from the current directory, then use > a ./ prefix". We can do that, like so: > > diff --git i/gitweb/Makefile w/gitweb/Makefile > index cd194d057f..3160b6cc5d 100644 > --- i/gitweb/Makefile > +++ w/gitweb/Makefile > @@ -18,7 +18,7 @@ RM ?= rm -f > INSTALL ?= install > > # default configuration for gitweb > -GITWEB_CONFIG = gitweb_config.perl > +GITWEB_CONFIG = ./gitweb_config.perl > GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf > GITWEB_CONFIG_COMMON = /etc/gitweb-common.conf > GITWEB_HOME_LINK_STR = projects > > but that does not help if someone overrides GITWEB_CONFIG, and besides, > it would be nicer to avoid the possibility of an @INC search altogether. > Another alternative would be to use > > local @INC = ('.'); > > Would that be better? > > Advice from someone more versed than I am in perl would be very welcome > (hence the cc to Ævar). It seems most sensible to follow the ./FILE.pl advice per https://metacpan.org/pod/distribution/perl/pod/perl5260delta.pod#Removal-of-the-current-directory-(%22.%22)-from-@INC Just: local @INC = '.'; do 'FILE.pl'; Would do the same thing, but seems like a more indirect way to do it if all we want is ./ anyway. FWIW to be pedantically bug-compatible with the old version (we should not do this) it's: local @INC = (@INC, "."); do 'FILE.pl'; I.e. before our behavior was implicitly to check whether we had a local FILE.pl, then loop through all of @INC to see if we found it there, and finally come back to the file we did the -e check for.
gitweb: local configuration not found
Martin Mares wrote[1]: > After upgrade to Stretch, gitweb no longer finds the configuration file > "gitweb_config.perl" in the current directory. However, "man gitweb" still > mentions this as one of the possible locations of the config file (and > indeed a useful one when using multiple instances of gitweb). > > It was probably broken by Perl dropping "." from the default search path > for security reasons. Indeed, perldelta(1) tells me that in 5.24.1 (and 5.26, etc), Core modules and tools no longer search "." for optional modules gitweb.perl contains sub read_config_file { my $filename = shift; return unless defined $filename; # die if there are errors parsing config file if (-e $filename) { do $filename; which implies an @INC search but it is silly --- as the "-e" test illustrates, this never intended to search @INC. Documentation says "If you are absolutely certain that you want your script to load and execute a file from the current directory, then use a ./ prefix". We can do that, like so: diff --git i/gitweb/Makefile w/gitweb/Makefile index cd194d057f..3160b6cc5d 100644 --- i/gitweb/Makefile +++ w/gitweb/Makefile @@ -18,7 +18,7 @@ RM ?= rm -f INSTALL ?= install # default configuration for gitweb -GITWEB_CONFIG = gitweb_config.perl +GITWEB_CONFIG = ./gitweb_config.perl GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf GITWEB_CONFIG_COMMON = /etc/gitweb-common.conf GITWEB_HOME_LINK_STR = projects but that does not help if someone overrides GITWEB_CONFIG, and besides, it would be nicer to avoid the possibility of an @INC search altogether. Another alternative would be to use local @INC = ('.'); Would that be better? Advice from someone more versed than I am in perl would be very welcome (hence the cc to Ævar). Thanks for reporting and hope that helps, Jonathan [1] https://bugs.debian.org/915632
[PATCH 28/78] config.txt: move gitweb.* to a separate file
Signed-off-by: Nguyễn Thái Ngọc Duy --- Documentation/config.txt| 17 + Documentation/config/gitweb.txt | 16 2 files changed, 17 insertions(+), 16 deletions(-) create mode 100644 Documentation/config/gitweb.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index 6f4a66b3f2..99ac8fc8ec 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -337,22 +337,7 @@ include::config/gc.txt[] include::config/gitcvs.txt[] -gitweb.category:: -gitweb.description:: -gitweb.owner:: -gitweb.url:: - See linkgit:gitweb[1] for description. - -gitweb.avatar:: -gitweb.blame:: -gitweb.grep:: -gitweb.highlight:: -gitweb.patches:: -gitweb.pickaxe:: -gitweb.remote_heads:: -gitweb.showSizes:: -gitweb.snapshot:: - See linkgit:gitweb.conf[5] for description. +include::config/gitweb.txt[] grep.lineNumber:: If set to true, enable `-n` option by default. diff --git a/Documentation/config/gitweb.txt b/Documentation/config/gitweb.txt new file mode 100644 index 00..1b51475108 --- /dev/null +++ b/Documentation/config/gitweb.txt @@ -0,0 +1,16 @@ +gitweb.category:: +gitweb.description:: +gitweb.owner:: +gitweb.url:: + See linkgit:gitweb[1] for description. + +gitweb.avatar:: +gitweb.blame:: +gitweb.grep:: +gitweb.highlight:: +gitweb.patches:: +gitweb.pickaxe:: +gitweb.remote_heads:: +gitweb.showSizes:: +gitweb.snapshot:: + See linkgit:gitweb.conf[5] for description. -- 2.19.1.647.g708186aaf9
[PATCH 22/59] config.txt: move gitweb.* to a separate file
Signed-off-by: Nguyễn Thái Ngọc Duy --- Documentation/config.txt| 17 + Documentation/gitweb-config.txt | 16 2 files changed, 17 insertions(+), 16 deletions(-) create mode 100644 Documentation/gitweb-config.txt diff --git a/Documentation/config.txt b/Documentation/config.txt index ebef3c867a..6898c9f567 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -337,22 +337,7 @@ include::gc-config.txt[] include::gitcvs-config.txt[] -gitweb.category:: -gitweb.description:: -gitweb.owner:: -gitweb.url:: - See linkgit:gitweb[1] for description. - -gitweb.avatar:: -gitweb.blame:: -gitweb.grep:: -gitweb.highlight:: -gitweb.patches:: -gitweb.pickaxe:: -gitweb.remote_heads:: -gitweb.showSizes:: -gitweb.snapshot:: - See linkgit:gitweb.conf[5] for description. +include::gitweb-config.txt[] grep.lineNumber:: If set to true, enable `-n` option by default. diff --git a/Documentation/gitweb-config.txt b/Documentation/gitweb-config.txt new file mode 100644 index 00..1b51475108 --- /dev/null +++ b/Documentation/gitweb-config.txt @@ -0,0 +1,16 @@ +gitweb.category:: +gitweb.description:: +gitweb.owner:: +gitweb.url:: + See linkgit:gitweb[1] for description. + +gitweb.avatar:: +gitweb.blame:: +gitweb.grep:: +gitweb.highlight:: +gitweb.patches:: +gitweb.pickaxe:: +gitweb.remote_heads:: +gitweb.showSizes:: +gitweb.snapshot:: + See linkgit:gitweb.conf[5] for description. -- 2.19.1.647.g708186aaf9
Re: gitweb: HTML output is not always encoded in UTF-8 when using --fastcgi.
Le sam. 08 sept. 2018 à 19:15:32 +0200, Julien Moutinho a écrit : > As a quick workaround this hotpatch can even be put in $GITWEB_CONFIG > by removing the `local` before `*FCGI::Stream::PRINT`. Turns out to require more care than that, due to $per_request_config reloading $GITWEB_CONFIG at each request, overwriting FCGI::Stream::PRINT multiple times, messing the encoding. This seems to work(around): | if ($first_request) { | my $enc = Encode::find_encoding('UTF-8'); | my $org = \&FCGI::Stream::PRINT; | no warnings 'redefine'; | *FCGI::Stream::PRINT = sub { | my @OUTPUT = @_; | for (my $i = 1; $i < @_; $i++) { | $OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC); | } | @_ = @OUTPUT; | goto $org; | }; | }
gitweb: HTML output is not always encoded in UTF-8 when using --fastcgi.
Description === An old (2011) description of the problem is here: https://stackoverflow.com/questions/7285215/nginx-fastcgi-utf-8-encoding-output-iso-8859-1-instead-of-utf8#answer-18149487 Basically, gitweb's HTML output is not always encoded in UTF-8 when using --fastcgi. Reproduction gitweb v2.18.0 perl v5.28.0 | echo Système >test.git/description According to the 2011 problem report, the problem only appears when using gitweb.cgi --fastcgi not when gitweb.cgi is spawned by fcgiwrap. And apparently, the text must not contain one character which cannot be correctly converted to ISO-8859-1, or an UTF-8 encoding is done (not sure by what); which made this bug harder to spot. Explanation === According to Christian Hansen (chansen), the problem is that: > FCGI streams are implemented using the older stream API, > TIEHANDLE. Applying PerlIO layers using binmode() has no effect. https://stackoverflow.com/questions/5005104/how-to-force-fastcgi-to-encode-form-data-as-utf-8-as-cgi-pm-has-option#answer-7097698 https://perldoc.perl.org/perltie.html Indeed: > FCGI.pm isn't Unicode aware, > only characters within the range 0x00-0xFF are supported. https://metacpan.org/pod/FCGI#LIMITATIONS But, as stated in gitweb's to_utf8(): > gitweb writes out in utf-8 > thanks to "binmode STDOUT, ':utf8'" at beginning" Correction == Christian Hansen suggested that: "The proper solution would be to encode your data before outputting it, but if thats not an option I can offer this hotpatch:" | my $enc = Encode::find_encoding('UTF-8'); | my $org = \&FCGI::Stream::PRINT; | no warnings 'redefine'; | local *FCGI::Stream::PRINT = sub { | my @OUTPUT = @_; | for (my $i = 1; $i < @_; $i++) { | $OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC); | } | @_ = @OUTPUT; | goto $org; | }; As a quick workaround this hotpatch can even be put in $GITWEB_CONFIG by removing the `local` before `*FCGI::Stream::PRINT`.
Re: gitweb and Levenshtein
Hi David, On Thu, 12 Jul 2018, David Brown wrote: > Howdy, I want to hack the getweb_make_index.perl script to create a string > search using: https://github.com/git/git/blob/master/levenshtein.c. > > How do i reference the compiled code? > > I would like to call this routine using Java and maybe Perl. The code to compute the Levenshtein distance is not exposed in Git's command-line interface. Therefore, there is currently no way to call it directly from Java or Perl. The best you could do for now is to convert the code to Java or Perl. Ciao, Johannes
gitweb and Levenshtein
Howdy, I want to hack the getweb_make_index.perl script to create a string search using: https://github.com/git/git/blob/master/levenshtein.c. How do i reference the compiled code? I would like to call this routine using Java and maybe Perl. Please advise. Thanks. Regards,
Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string
Jakub Narebski writes: > I think the problem is not with aligning, otherwise we would simply get > bad aling, and not visible corruption. The ACTUAL PROBLEM is most > probably because of concatenating strings marked as UTF-8 and strings > not marked as UTF-8. Strange things happen then in Perl, unfortunetaly. Sounds quite right---the patch needs to be retitled, if the bug is not about "measuring offset", which I think is what fooled me when I sent my response.
Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string
> One solution would be to force conversion to UTF-8 on input via "open" > pragma (e.g. "use open ':encoding(UTF-8)';"). But there is no > UTF-8-with_fallback encoding available - we would have to write one, and > install it as module (or fake it via Perl trickery). This mechanism is > almost the same to what we currently use in gitwbe. Yes, I tried using `Encode::Guess` with "open" pragma, but no luck. https://perldoc.perl.org/Encode/Guess.html I'm also afraid of "open" pragma does not work properly while using git_blame_common(). Let's say someone using non-ASCII characters in his/her name, committing non-UTF8 encoded characters. git-blame will combine them in the same line. Following is an example: $ git blame dummy | xxd : 3461 6464 3565 6331 2028 e585 90e5 b3b6 4add5ec1 (.. 0010: 20e6 96b0 2032 3031 382d 3035 2d30 3320 ... 2018-05-03 0020: 3232 3a34 383a 3432 202b 3039 3030 2031 22:48:42 +0900 1 0030: 2920 8367 8389 8343 0a ) .g...C. * e585 90e5 b3b6 20e6 96b0 : my name, encoded with UTF-8 * 8367 8389 8343 : "トライ" encoded with Shift_JIS It means I need to split each lines of git-blame output at the very beginning, then convert the first-half as UTF-8 and the second-half as Shift_JIS. Sincerely, -- Shin Kojima
Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string
Junio C Hamano writes: > Shin Kojima writes: > >> Offset positions should not be counted by byte length, but by actual >> character length. >> ... >> # escape tabs (convert tabs to spaces) >> sub untabify { >> -my $line = shift; >> +my $line = to_utf8(shift); >> >> while ((my $pos = index($line, "\t")) != -1) { >> if (my $count = (8 - ($pos % 8))) { > > Some codepaths in the resulting codeflow look even hackier than they > already are. For example, format_rem_add_lines_pair() calls > untabify() and then feeds its result to esc_html(). The first thing > done in esc_html() is to call to_utf8(). I know that to_utf8() > cheats and leaves the incoming string as-is if it is already UTF-8, > so this may be a safe conversion, but ideally we should be able to > say "function X takes non-UTF8 and works on it", "function Y takes > UTF8 and works on it", and "function Z takes non-UTF8 and gives UTF8 > data back" for each functions clearly, not "function W can take > either UTF8 or any other garbage and tries to return UTF8". The problem with handling encoding in sane way, that is encode it on input (to UTF-8), and decode on output (to plain text or HTML) is the $fallback_encoding. Gitweb assumes that everything uses UTF-8 encoding. If the source is not in UTF-8, but for example uses latin-1 encoding, then there we could stumble upon byte sequences which are not valid UTF-8. If that happens, then gitweb tries to convert it to UTF-8 using $fallback_encoding. If $fallback_encoding is single-byte encoding, like latin-1, where any byte sequence is valid, then that's all. If there is an error during conversion to UTF-8, then Unicode REPLACEMENT CHARACTER, code point U+FFFD, is used. But there are places where gitweb outputs plain text; the intention is to use source data as is - to have it as one would have in the console. Some input paths are common for plain text and HTML output; because of that problem the data is not converted to UTF-8 on input. The to_utf8() function tries to be clever, and do not convert alredy converted data. > Also, does it even "fix" the problem to use to_utf8() here in the > first place? Untabify is about aligning the character after a HT to > multiple of 8 display position, so we'd want measure display width, > which is not the same as either byte count or char count. I think the problem is not with aligning, otherwise we would simply get bad aling, and not visible corruption. The ACTUAL PROBLEM is most probably because of concatenating strings marked as UTF-8 and strings not marked as UTF-8. Strange things happen then in Perl, unfortunetaly. One solution would be to force conversion to UTF-8 on input via "open" pragma (e.g. "use open ':encoding(UTF-8)';"). But there is no UTF-8-with_fallback encoding available - we would have to write one, and install it as module (or fake it via Perl trickery). This mechanism is almost the same to what we currently use in gitwbe. Another would be to use the trick that Perl 6 uses when encountering byte sequence that is invalid UTF-8 - encode it using private plane in Unicode using UTF-8, thus achieving lossless conversion / decoding. But this also as far as I know is not available from CPAN, so we would have to implement it ourself. Best, -- Jakub Narębski
Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string
> ideally we should be able to say "function X takes non-UTF8 and > works on it", "function Y takes UTF8 and works on it", and "function > Z takes non-UTF8 and gives UTF8 data back" for each functions > clearly, not "function W can take either UTF8 or any other garbage > and tries to return UTF8". Yes, I totally agree with that. > Some codepaths in the resulting codeflow look even harder than they > already are. For example, format_rem_add_lines_pair() calls > untabify() and then feeds its result to esc_html(). Honestly, I discovered this problem exactly from format_rem_add_lines_pair(). In my environment($fallback_encoding = 'cp932'), some commitdiff shows garbled text only inside color-words portions. I added a reproduce process at the end of this message. After my investigation, I thought format_rem_add_lines_pair() tries to use split()/index()/substr()/etc against raw blob contents and that produces funny characters. These builtin functions should be used to decoded string. untabify() looks proper place for me to decode blob contents beforehand, as it definitely is not to be used for binary contests like images and compressed snapshot files. I'm sure using to_utf8() in untabify() fixes this problem. In fact, there is also another similar problem in blame function that assumes blob contents as if utf8 encoded: binmode $fd, ':utf8'; I personally consider text blob contents should be decoded as soon as possible and esc_html() should not contain to_utf8(), but the codeflow is slightly vast and I couldn't eliminate the possibility of calls from somewhere else that does not care character encodings. So yes, I would appreciate hearing your thoughts. > Also, does it even "fix" the problem to use to_utf8() here in the > first place? Untabify is about aligning the character after a HT to > multiple of 8 display position, so we'd want measure display width, > which is not the same as either byte count or char count. Following is a reproduce process: $ git --version git version 2.17.0 $ mkdir test $ cd test $ git init $ echo 'モバイル' | iconv -f UTF-8 -t Shift_JIS > dummy $ git add . $ git commit -m 'init' $ echo 'インスタント' | iconv -f UTF-8 -t Shift_JIS > dummy $ git commit -am 'change' $ git instaweb $ echo 'our $fallback_encoding = "cp932";' >> .git/gitweb/gitweb_config.perl $ w3m -dump 'http://127.0.0.1:1234/?p=.git;a=commitdiff' What I got: gitprojects / .git / commitdiff [commit ] ? search: [] [ ]re summary | shortlog | log | commit | commitdiff | tree raw | patch | inline | side by side (parent: 79e26fe) change master authorShin Kojima Wed, 2 May 2018 10:55:01 + (19:55 +0900) committer Shin Kojima Wed, 2 May 2018 10:55:01 + (19:55 +0900) dummy patch | blob | history diff --git a/dummy b/dummy index ac37f38..31fb96a 100644 (file) --- a/dummy +++ b/dummy @@ -1 +1 @@ -coイル +Cンスタント Unnamed repository; edit this file 'description' to name the repository. RSS Atom What I expected: gitprojects / .git / commitdiff [commit ] ? search: [] [ ]re summary | shortlog | log | commit | commitdiff | tree raw | patch | inline | side by side (parent: 79e26fe) change master authorShin Kojima Wed, 2 May 2018 10:55:01 + (19:55 +0900) committer Shin Kojima Wed, 2 May 2018 10:55:01 + (19:55 +0900) dummy patch | blob | history diff --git a/dummy b/dummy index ac37f38..31fb96a 100644 (file) --- a/dummy +++ b/dummy @@ -1 +1 @@ -モバイル +インスタント Unnamed repository; edit this file 'description' to name the repository. RSS Atom -- Shin Kojima
Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string
Shin Kojima writes: > Offset positions should not be counted by byte length, but by actual > character length. > ... > # escape tabs (convert tabs to spaces) > sub untabify { > - my $line = shift; > + my $line = to_utf8(shift); > > while ((my $pos = index($line, "\t")) != -1) { > if (my $count = (8 - ($pos % 8))) { Some codepaths in the resulting codeflow look even hackier than they already are. For example, format_rem_add_lines_pair() calls untabify() and then feeds its result to esc_html(). The first thing done in esc_html() is to call to_utf8(). I know that to_utf8() cheats and leaves the incoming string as-is if it is already UTF-8, so this may be a safe conversion, but ideally we should be able to say "function X takes non-UTF8 and works on it", "function Y takes UTF8 and works on it", and "function Z takes non-UTF8 and gives UTF8 data back" for each functions clearly, not "function W can take either UTF8 or any other garbage and tries to return UTF8". Also, does it even "fix" the problem to use to_utf8() here in the first place? Untabify is about aligning the character after a HT to multiple of 8 display position, so we'd want measure display width, which is not the same as either byte count or char count.
[PATCH] gitweb: Measure offsets against UTF-8 flagged string
Offset positions should not be counted by byte length, but by actual character length. >5183 # We need to untabify lines before split()'ing them; >5184 # otherwise offsets would be invalid. Horizontal tab is not the only case we need to consider. Please excuse me for using your name here, but the following URL can not find "match" occurances while using `git-instaweb` on the git repository. http://127.0.0.1:1234/?p=.git&a=search&h=HEAD&st=grep&s=Nar%C4%99bski Signed-off-by: Shin Kojima --- gitweb/gitweb.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2594a4bad..a5a9093a1 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -1697,7 +1697,7 @@ sub unquote { # escape tabs (convert tabs to spaces) sub untabify { - my $line = shift; + my $line = to_utf8(shift); while ((my $pos = index($line, "\t")) != -1) { if (my $count = (8 - ($pos % 8))) { -- 2.17.0
[ANNOUNCE] Gitwin: Git Server for Windows with SSH/HTTP(S) transport and Gitweb
Hi all, This is a ONE-TIME announcement of Gitwin - a Git Server for Windows: Gitwin is a packaging of Git, OpenSSH, Nginx and many other related tools to make it a ready-to-use solution as a secure git repository on Windows. It supports SSH and HTTP(S) transports as well as Gitweb with Highlighter and Gravatar support. A free edition is available. Please visit https://itefix.net/gitwin for more information Kind regards Tevfik Karagulle Itefix.net
Re: Socket activation for GitWeb FastCGI with systemd?
03.04.2018, 23:04, "Jacob Keller" : > On Tue, Apr 3, 2018 at 11:53 AM, Alex Ivanov wrote: >> Hi. >> I want to use systemd as fastcgi spawner for gitweb + nginx. >> The traffic is low and number of users is limited + traversal bots. For >> that reason I've decided to use following mimimal services >> >> gitweb.socket >> [Unit] >> Description=GitWeb Socket >> >> [Socket] >> ListenStream=/run/gitweb.sock >> Accept=false >> >> [Install] >> WantedBy=sockets.target >> >> gitweb.service >> [Unit] >> Description=GitWeb Service >> >> [Service] >> Type=simple >> ExecStart=/path/to/gitweb.cgi --fcgi >> StandardInput=socket >> >> However this scheme is not resistant to simple DDOS. >> E.g. traversal bots often kill the service by opening non existing path >> (e.g http://host/?p=repo;a=blob;f=nonexisting/path;hb=HEAD showing in >> browser 404 - Cannot find file) many times consecutively, which leads to >> Apr 03 21:32:10 host systemd[1]: gitweb.service: Start request repeated too >> quickly. >> Apr 03 21:32:10 host systemd[1]: gitweb.service: Failed with result >> 'start-limit-hit'. >> Apr 03 21:32:10 host systemd[1]: Failed to start GitWeb service. >> and 502 Bad Gateway in browser. I believe the reason is that gitweb.service >> dies on failure and if it happens too often, systemd declines to restart the >> service due to start limit hit. >> So my question is how to correct systemd services for GitWeb to be >> resistant to such issue? I prefer to use single process to process all >> clients. >> Thanks. > > This sounds like a systemd specific question that might get a better > answer from the systemd mailing list. Thanks I will try that too. > > That being said, I believe if in this case gitweb is dying due to the > path not existing? You might be able to configure systemd to > understand that the particular exit code for when the path doesn't > exist is a "valid" exit, and not a failure case.. I will try to do that, but I'm afraid that there may be other ways to remotely abuse the service. > > I'm not entirely understanding your goal.. you want each request to > launch the gitweb process, and when it's done you want it to exit? But > if there are multiple connections at once you want it to stay alive > until it services them all? I think the best answer is configure > systemd to understand that the exit code for when the path is invalid > will be counted as a success. I want a single process for all connections too keep RAM usage at minimal. I also though it fits my case since number of users is low. > > Thanks, > Jake
Re: Socket activation for GitWeb FastCGI with systemd?
On Tue, Apr 3, 2018 at 11:53 AM, Alex Ivanov wrote: > Hi. > I want to use systemd as fastcgi spawner for gitweb + nginx. > The traffic is low and number of users is limited + traversal bots. For that > reason I've decided to use following mimimal services > > gitweb.socket > [Unit] > Description=GitWeb Socket > > [Socket] > ListenStream=/run/gitweb.sock > Accept=false > > [Install] > WantedBy=sockets.target > > gitweb.service > [Unit] > Description=GitWeb Service > > [Service] > Type=simple > ExecStart=/path/to/gitweb.cgi --fcgi > StandardInput=socket > > However this scheme is not resistant to simple DDOS. > E.g. traversal bots often kill the service by opening non existing path (e.g > http://host/?p=repo;a=blob;f=nonexisting/path;hb=HEAD showing in browser 404 > - Cannot find file) many times consecutively, which leads to > Apr 03 21:32:10 host systemd[1]: gitweb.service: Start request repeated too > quickly. > Apr 03 21:32:10 host systemd[1]: gitweb.service: Failed with result > 'start-limit-hit'. > Apr 03 21:32:10 host systemd[1]: Failed to start GitWeb service. > and 502 Bad Gateway in browser. I believe the reason is that gitweb.service > dies on failure and if it happens too often, systemd declines to restart the > service due to start limit hit. > So my question is how to correct systemd services for GitWeb to be resistant > to such issue? I prefer to use single process to process all clients. > Thanks. This sounds like a systemd specific question that might get a better answer from the systemd mailing list. That being said, I believe if in this case gitweb is dying due to the path not existing? You might be able to configure systemd to understand that the particular exit code for when the path doesn't exist is a "valid" exit, and not a failure case.. I'm not entirely understanding your goal.. you want each request to launch the gitweb process, and when it's done you want it to exit? But if there are multiple connections at once you want it to stay alive until it services them all? I think the best answer is configure systemd to understand that the exit code for when the path is invalid will be counted as a success. Thanks, Jake
Socket activation for GitWeb FastCGI with systemd?
Hi. I want to use systemd as fastcgi spawner for gitweb + nginx. The traffic is low and number of users is limited + traversal bots. For that reason I've decided to use following mimimal services gitweb.socket [Unit] Description=GitWeb Socket [Socket] ListenStream=/run/gitweb.sock Accept=false [Install] WantedBy=sockets.target gitweb.service [Unit] Description=GitWeb Service [Service] Type=simple ExecStart=/path/to/gitweb.cgi --fcgi StandardInput=socket However this scheme is not resistant to simple DDOS. E.g. traversal bots often kill the service by opening non existing path (e.g http://host/?p=repo;a=blob;f=nonexisting/path;hb=HEAD showing in browser 404 - Cannot find file) many times consecutively, which leads to Apr 03 21:32:10 host systemd[1]: gitweb.service: Start request repeated too quickly. Apr 03 21:32:10 host systemd[1]: gitweb.service: Failed with result 'start-limit-hit'. Apr 03 21:32:10 host systemd[1]: Failed to start GitWeb service. and 502 Bad Gateway in browser. I believe the reason is that gitweb.service dies on failure and if it happens too often, systemd declines to restart the service due to start limit hit. So my question is how to correct systemd services for GitWeb to be resistant to such issue? I prefer to use single process to process all clients. Thanks.
[PATCH v3 04/13] gitweb: hard-depend on the Digest::MD5 5.8 module
Since my d48b284183 ("perl: bump the required Perl version to 5.8 from 5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to conditionally require Digest::MD5 anymore. It was released with perl v5.7.3[1] The initial introduction of the dependency in e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much, this also undoes part of the later 2e9c8789b7 ("gitweb: Mention optional Perl modules in INSTALL", 2011-02-04) since gitweb will always be run on at least 5.8, so there's no need to mention Digest::MD5 as a required module in the documentation, let's instead say that we require perl 5.8. 1. $ corelist Digest::MD5 Data for 2015-02-14 Digest::MD5 was first released with perl v5.7.3 Signed-off-by: Ævar Arnfjörð Bjarmason --- gitweb/INSTALL | 3 +-- gitweb/gitweb.perl | 17 - 2 files changed, 5 insertions(+), 15 deletions(-) diff --git a/gitweb/INSTALL b/gitweb/INSTALL index 408f2859d3..a58e6b3c44 100644 --- a/gitweb/INSTALL +++ b/gitweb/INSTALL @@ -29,12 +29,11 @@ Requirements - Core git tools - - Perl + - Perl 5.8 - Perl modules: CGI, Encode, Fcntl, File::Find, File::Basename. - web server The following optional Perl modules are required for extra features - - Digest::MD5 - for gravatar support - CGI::Fast and FCGI - for running gitweb as FastCGI script - HTML::TagCloud - for fancy tag cloud in project list view - HTTP::Date or Time::ParseDate - to support If-Modified-Since for feeds diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2417057f2b..2594a4badb 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -20,6 +20,8 @@ use Fcntl ':mode'; use File::Find qw(); use File::Basename qw(basename); use Time::HiRes qw(gettimeofday tv_interval); +use Digest::MD5 qw(md5_hex); + binmode STDOUT, ':utf8'; if (!defined($CGI::VERSION) || $CGI::VERSION < 4.08) { @@ -490,7 +492,6 @@ our %feature = ( # Currently available providers are gravatar and picon. # If an unknown provider is specified, the feature is disabled. - # Gravatar depends on Digest::MD5. # Picon currently relies on the indiana.edu database. # To enable system wide have in $GITWEB_CONFIG @@ -1166,18 +1167,8 @@ sub configure_gitweb_features { our @snapshot_fmts = gitweb_get_feature('snapshot'); @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); - # check that the avatar feature is set to a known provider name, - # and for each provider check if the dependencies are satisfied. - # if the provider name is invalid or the dependencies are not met, - # reset $git_avatar to the empty string. our ($git_avatar) = gitweb_get_feature('avatar'); - if ($git_avatar eq 'gravatar') { - $git_avatar = '' unless (eval { require Digest::MD5; 1; }); - } elsif ($git_avatar eq 'picon') { - # no dependencies - } else { - $git_avatar = ''; - } + $git_avatar = '' unless $git_avatar =~ /^(?:gravatar|picon)$/s; our @extra_branch_refs = gitweb_get_feature('extra-branch-refs'); @extra_branch_refs = filter_and_validate_refs (@extra_branch_refs); @@ -2167,7 +2158,7 @@ sub gravatar_url { my $size = shift; $avatar_cache{$email} ||= "//www.gravatar.com/avatar/" . - Digest::MD5::md5_hex($email) . "?s="; + md5_hex($email) . "?s="; return $avatar_cache{$email} . $size; } -- 2.15.1.424.g9478a66081
Re: [PATCH v2 04/13] gitweb: hard-depend on the Digest::MD5 5.8 module
On Sun, Feb 25, 2018 at 2:46 PM, Ævar Arnfjörð Bjarmason wrote: > Since my d48b284183 ("perl: bump the required Perl version to 5.8 from > 5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to > conditionally require Digest::MD5 anymore. It was released with perl > v5.7.3[1] > > The initial introduction of the dependency in > e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much, > this also undoes part of the later 2e9c8789b7 ("gitweb: Mention > optional Perl modules in INSTALL", 2011-02-04) since gitweb will > always be run on at least 5.8, so there's no need to mention > Digest::MD5 as a required module in the documentation, let's instead > say that we require perl 5.8. > > As with the preceding Net::{SMTP,Domain} change we now unconditionally > "use" the module instead. This is patch 4/13 but the Net::{SMTP,Domain} change doesn't come until 6/13, so saying "preceding" is confusing. > 1. $ corelist Digest::MD5 >Data for 2015-02-14 >Digest::MD5 was first released with perl v5.7.3 > > Signed-off-by: Ævar Arnfjörð Bjarmason
[PATCH v2 04/13] gitweb: hard-depend on the Digest::MD5 5.8 module
Since my d48b284183 ("perl: bump the required Perl version to 5.8 from 5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to conditionally require Digest::MD5 anymore. It was released with perl v5.7.3[1] The initial introduction of the dependency in e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much, this also undoes part of the later 2e9c8789b7 ("gitweb: Mention optional Perl modules in INSTALL", 2011-02-04) since gitweb will always be run on at least 5.8, so there's no need to mention Digest::MD5 as a required module in the documentation, let's instead say that we require perl 5.8. As with the preceding Net::{SMTP,Domain} change we now unconditionally "use" the module instead. 1. $ corelist Digest::MD5 Data for 2015-02-14 Digest::MD5 was first released with perl v5.7.3 Signed-off-by: Ævar Arnfjörð Bjarmason --- gitweb/INSTALL | 3 +-- gitweb/gitweb.perl | 17 - 2 files changed, 5 insertions(+), 15 deletions(-) diff --git a/gitweb/INSTALL b/gitweb/INSTALL index 408f2859d3..a58e6b3c44 100644 --- a/gitweb/INSTALL +++ b/gitweb/INSTALL @@ -29,12 +29,11 @@ Requirements - Core git tools - - Perl + - Perl 5.8 - Perl modules: CGI, Encode, Fcntl, File::Find, File::Basename. - web server The following optional Perl modules are required for extra features - - Digest::MD5 - for gravatar support - CGI::Fast and FCGI - for running gitweb as FastCGI script - HTML::TagCloud - for fancy tag cloud in project list view - HTTP::Date or Time::ParseDate - to support If-Modified-Since for feeds diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2417057f2b..2594a4badb 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -20,6 +20,8 @@ use Fcntl ':mode'; use File::Find qw(); use File::Basename qw(basename); use Time::HiRes qw(gettimeofday tv_interval); +use Digest::MD5 qw(md5_hex); + binmode STDOUT, ':utf8'; if (!defined($CGI::VERSION) || $CGI::VERSION < 4.08) { @@ -490,7 +492,6 @@ our %feature = ( # Currently available providers are gravatar and picon. # If an unknown provider is specified, the feature is disabled. - # Gravatar depends on Digest::MD5. # Picon currently relies on the indiana.edu database. # To enable system wide have in $GITWEB_CONFIG @@ -1166,18 +1167,8 @@ sub configure_gitweb_features { our @snapshot_fmts = gitweb_get_feature('snapshot'); @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); - # check that the avatar feature is set to a known provider name, - # and for each provider check if the dependencies are satisfied. - # if the provider name is invalid or the dependencies are not met, - # reset $git_avatar to the empty string. our ($git_avatar) = gitweb_get_feature('avatar'); - if ($git_avatar eq 'gravatar') { - $git_avatar = '' unless (eval { require Digest::MD5; 1; }); - } elsif ($git_avatar eq 'picon') { - # no dependencies - } else { - $git_avatar = ''; - } + $git_avatar = '' unless $git_avatar =~ /^(?:gravatar|picon)$/s; our @extra_branch_refs = gitweb_get_feature('extra-branch-refs'); @extra_branch_refs = filter_and_validate_refs (@extra_branch_refs); @@ -2167,7 +2158,7 @@ sub gravatar_url { my $size = shift; $avatar_cache{$email} ||= "//www.gravatar.com/avatar/" . - Digest::MD5::md5_hex($email) . "?s="; + md5_hex($email) . "?s="; return $avatar_cache{$email} . $size; } -- 2.15.1.424.g9478a66081
Re: [PATCH 7/8] gitweb: hard-depend on the Digest::MD5 5.8 module
Ævar Arnfjörð Bjarmason wrote: > Signed-off-by: Ævar Arnfjörð Bjarmason > --- > gitweb/INSTALL | 3 +-- > gitweb/gitweb.perl | 17 + > 2 files changed, 6 insertions(+), 14 deletions(-) Makes sense, and I like the diffstat. [...] > +++ b/gitweb/gitweb.perl [...] > @@ -1166,18 +1165,11 @@ sub configure_gitweb_features { > our @snapshot_fmts = gitweb_get_feature('snapshot'); > @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); > > - # check that the avatar feature is set to a known provider name, > - # and for each provider check if the dependencies are satisfied. > - # if the provider name is invalid or the dependencies are not met, > - # reset $git_avatar to the empty string. > + # check that the avatar feature is set to a known provider > + # name, if the provider name is invalid, reset $git_avatar to > + # the empty string. Comma splice. Formatting as sentences would make this easier to read, anyway: # Check that the avatar feature is set to a known provider name. # If the provider name is invalid, reset $git_avatar to the empty # string. Even better would be to remove the comment altogether. The code is clear enough on its own and the comment should not be needed now that it is a one-liner. [...] > @@ -2165,6 +2157,7 @@ sub picon_url { > sub gravatar_url { > my $email = lc shift; > my $size = shift; > + require Digest::MD5; Same question as the previous patch: how do we decide whether to 'use' or to 'require' in cases like this? Thanks, Jonathan
[PATCH 7/8] gitweb: hard-depend on the Digest::MD5 5.8 module
Since my d48b284183 ("perl: bump the required Perl version to 5.8 from 5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to conditionally require Digest::MD5 anymore. It was released with perl v5.7.3. The initial introduction of the dependency in e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much, this also undoes part of the later 2e9c8789b7 ("gitweb: Mention optional Perl modules in INSTALL", 2011-02-04) since gitweb will always be run on at least 5.8, so there's no need to mention Digest::MD5 as a required module in the documentation, let's instead say that we require perl 5.8. Signed-off-by: Ævar Arnfjörð Bjarmason --- gitweb/INSTALL | 3 +-- gitweb/gitweb.perl | 17 + 2 files changed, 6 insertions(+), 14 deletions(-) diff --git a/gitweb/INSTALL b/gitweb/INSTALL index 408f2859d3..a58e6b3c44 100644 --- a/gitweb/INSTALL +++ b/gitweb/INSTALL @@ -29,12 +29,11 @@ Requirements - Core git tools - - Perl + - Perl 5.8 - Perl modules: CGI, Encode, Fcntl, File::Find, File::Basename. - web server The following optional Perl modules are required for extra features - - Digest::MD5 - for gravatar support - CGI::Fast and FCGI - for running gitweb as FastCGI script - HTML::TagCloud - for fancy tag cloud in project list view - HTTP::Date or Time::ParseDate - to support If-Modified-Since for feeds diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2417057f2b..8f48e3c02e 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -490,7 +490,6 @@ our %feature = ( # Currently available providers are gravatar and picon. # If an unknown provider is specified, the feature is disabled. - # Gravatar depends on Digest::MD5. # Picon currently relies on the indiana.edu database. # To enable system wide have in $GITWEB_CONFIG @@ -1166,18 +1165,11 @@ sub configure_gitweb_features { our @snapshot_fmts = gitweb_get_feature('snapshot'); @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); - # check that the avatar feature is set to a known provider name, - # and for each provider check if the dependencies are satisfied. - # if the provider name is invalid or the dependencies are not met, - # reset $git_avatar to the empty string. + # check that the avatar feature is set to a known provider + # name, if the provider name is invalid, reset $git_avatar to + # the empty string. our ($git_avatar) = gitweb_get_feature('avatar'); - if ($git_avatar eq 'gravatar') { - $git_avatar = '' unless (eval { require Digest::MD5; 1; }); - } elsif ($git_avatar eq 'picon') { - # no dependencies - } else { - $git_avatar = ''; - } + $git_avatar = '' unless $git_avatar =~ /^(?:gravatar|picon)$/s; our @extra_branch_refs = gitweb_get_feature('extra-branch-refs'); @extra_branch_refs = filter_and_validate_refs (@extra_branch_refs); @@ -2165,6 +2157,7 @@ sub picon_url { sub gravatar_url { my $email = lc shift; my $size = shift; + require Digest::MD5; $avatar_cache{$email} ||= "//www.gravatar.com/avatar/" . Digest::MD5::md5_hex($email) . "?s="; -- 2.15.1.424.g9478a66081
Re: [PATCH] gitweb bugfix - check for search permission on sub-directories while scanning project root to prevent program termination
Hielke Christian Braun writes: > Hi, > > gitweb terminates and shows no project list, if it can not access a > sub-directory in the project root directory. It should show a list of > the projects it can access. Patch corrects this by skipping inaccessible > directories. > > > Signed-off-by: Hielke Christian Braun > > > --- > gitweb/gitweb.perl | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 3d4a8ee27c96a..9208f42ed1753 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3071,6 +3071,8 @@ sub git_get_projects_list { > return if (m!^[/.]$!); > # only directories can be git repositories > return unless (-d $_); > + # need search permission > + return unless (-x $_); > # don't traverse too deep (Find is super slow > on os x) > # $project_maxdepth excludes depth of > $projectroot > if (($File::Find::name =~ tr!/!!) - $pfxdepth > > $project_maxdepth) { > > -- > https://github.com/git/git/pull/384 I've tweaked the title and the log message further before queuing the patch. Thanks.
[PATCH] gitweb bugfix - check for search permission on sub-directories while scanning project root to prevent program termination
Hi, gitweb terminates and shows no project list, if it can not access a sub-directory in the project root directory. It should show a list of the projects it can access. Patch corrects this by skipping inaccessible directories. Signed-off-by: Hielke Christian Braun --- gitweb/gitweb.perl | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 3d4a8ee27c96a..9208f42ed1753 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3071,6 +3071,8 @@ sub git_get_projects_list { return if (m!^[/.]$!); # only directories can be git repositories return unless (-d $_); + # need search permission + return unless (-x $_); # don't traverse too deep (Find is super slow on os x) # $project_maxdepth excludes depth of $projectroot if (($File::Find::name =~ tr!/!!) - $pfxdepth > $project_maxdepth) { -- https://github.com/git/git/pull/384 On 2017-07-17 22:18, Junio C Hamano wrote: > Hielke Christian Braun writes: > >> --- >> gitweb/gitweb.perl | 2 ++ >> 1 file changed, 2 insertions(+) > > Thanks for trying to help and welcome to Git development community. > But > > (1) Please double-check the title of your change. Imagine that the > title appears in a list of 600 other commits that goes in a > single release in "git shortlog --no-merges" output. Does it > tell readers of the list what the change is about? We cannot > even guess that it is about the project list that appears in > gitweb output. > > (2) Please explain what problem this is trying to solve; that is > what the blank space before "---" line we see up above is for. > What happens in the current code under what condition, until we > do not apply this patch, and why is it a bad thing to happen? > Once we apply this patch, in what way the situation gets > improved? > > (3) Please sign-off your patch (see SubmittingPatches in > Documentation). > > Thanks. > >> >> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl >> index 3d4a8ee27c96a..9208f42ed1753 100755 >> --- a/gitweb/gitweb.perl >> +++ b/gitweb/gitweb.perl >> @@ -3071,6 +3071,8 @@ sub git_get_projects_list { >> return if (m!^[/.]$!); >> # only directories can be git repositories >> return unless (-d $_); >> +# need search permission >> +return unless (-x $_); >> # don't traverse too deep (Find is super slow >> on os x) >> # $project_maxdepth excludes depth of >> $projectroot >> if (($File::Find::name =~ tr!/!!) - $pfxdepth > >> $project_maxdepth) { >> >> -- >> https://github.com/git/git/pull/384
Re: [PATCH 0/2] gitweb: tags feeds
Ping? On Wed, Apr 19, 2017 at 8:49 AM, Giuseppe Bilotta wrote: > A smallish patchset to implement RSS and Atom feeds to complement the > tags view, accessible as verbs `tags_rss` and `tags_atom`. > > (I actually made this some 5 years ago, and it has been running on > http://git.oblomov.eu/ since, but for some reason I forgot to submit > it for upstreaming.) > > The patchset is also available in the git repository at: > > git://git.oblomov.eu/git gitweb-tags-feed > > ---- > > Giuseppe Bilotta (2): > gitweb: infrastructure for tags feed > gitweb: expose tags feed in appropriate places > > gitweb/gitweb.perl | 126 > ++--- > 1 file changed, 91 insertions(+), 35 deletions(-) > > -- > 2.12.2.822.g5451c77231 > -- Giuseppe "Oblomov" Bilotta
[PATCH 1/2] gitweb: infrastructure for tags feed
Signed-off-by: Giuseppe Bilotta --- gitweb/gitweb.perl | 79 +++--- 1 file changed, 58 insertions(+), 21 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 7cf68f07b7..4adea84006 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3807,6 +3807,7 @@ sub git_get_tags_list { if ($type eq "tag" || $type eq "commit") { $ref_item{'epoch'} = $epoch; + $ref_item{'tz'} = $tz; if ($epoch) { $ref_item{'age'} = age_string(time - $ref_item{'epoch'}); } else { @@ -8132,6 +8133,10 @@ sub git_shortlog { sub git_feed { my $format = shift || 'atom'; + + # feed context: log, tags + my $ctx = shift || 'log'; + my $have_blame = gitweb_check_feature('blame'); # Atom: http://www.atomenabled.org/developers/syndication/ @@ -8140,9 +8145,19 @@ sub git_feed { die_error(400, "Unknown web feed format"); } + if ($ctx ne 'log' && $ctx ne 'tags') { + die_error(400, "Unknown web feed context"); + } + my $tags = $ctx eq 'tags' ? 1 : 0; + # log/feed of current (HEAD) branch, log of given branch, history of file/directory my $head = $hash || 'HEAD'; - my @commitlist = parse_commits($head, 150, 0, $file_name); + my @commitlist; + if ($tags) { + @commitlist = git_get_tags_list(15); + } else { + @commitlist = parse_commits($head, 150, 0, $file_name); + } my %latest_commit; my %latest_date; @@ -8154,9 +8169,12 @@ sub git_feed { } if (defined($commitlist[0])) { %latest_commit = %{$commitlist[0]}; - my $latest_epoch = $latest_commit{'committer_epoch'}; + my $latest_epoch = $tags ? $latest_commit{'epoch'} : + $latest_commit{'committer_epoch'}; exit_if_unmodified_since($latest_epoch); - %latest_date = parse_date($latest_epoch, $latest_commit{'committer_tz'}); + %latest_date = parse_date($latest_epoch, + $tags ? $latest_commit{'tz'} : + $latest_commit{'committer_tz'}); } print $cgi->header( -type => $content_type, @@ -8171,7 +8189,9 @@ sub git_feed { # header variables my $title = "$site_name - $project/$action"; my $feed_type = 'log'; - if (defined $hash) { + if ($tags) { + $feed_type = 'tags'; + } elsif (defined $hash) { $title .= " - '$hash'"; $feed_type = 'branch log'; if (defined $file_name) { @@ -8189,6 +8209,7 @@ sub git_feed { $descr = esc_html($descr); } else { $descr = "$project " . +($tags ? 'tags ' : '') . ($format eq 'rss' ? 'RSS' : 'Atom') . " feed"; } @@ -8197,7 +8218,9 @@ sub git_feed { #header my $alt_url; - if (defined $file_name) { + if ($tags) { + $alt_url = href(-full=>1, action=>"tags"); + } elsif (defined $file_name) { $alt_url = href(-full=>1, action=>"history", hash=>$hash, file_name=>$file_name); } elsif (defined $hash) { $alt_url = href(-full=>1, action=>"log", hash=>$hash); @@ -8261,9 +8284,15 @@ XML } # contents + my $co_action = $tags ? 'tag' : 'commitdiff'; for (my $i = 0; $i <= $#commitlist; $i++) { + my %clco; # commit info from commitlist, only used for tags my %co = %{$commitlist[$i]}; my $commit = $co{'id'}; + if ($tags) { + %clco = %co; + %co = parse_tag($commit); + } # we read 150, we always show 30 and the ones more recent than 48 hours if (($i >= 20) && ((time - $co{'author_epoch'}) > 48*60*60)) { last; @@ -8271,44 +8300,52 @@ XML my %cd = parse_date($co{'author_epoch'}, $co{'author_tz'}); # get list of changed files - open my $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts, - $c
[PATCH 2/2] gitweb: expose tags feed in appropriate places
Signed-off-by: Giuseppe Bilotta --- gitweb/gitweb.perl | 47 +-- 1 file changed, 33 insertions(+), 14 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 4adea84006..8be7444988 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -851,6 +851,8 @@ our %actions = ( "summary" => \&git_summary, "tag" => \&git_tag, "tags" => \&git_tags, + "tags_rss" => \&git_tags_rss, + "tags_atom" => \&git_tags_atom, "tree" => \&git_tree, "snapshot" => \&git_snapshot, "object" => \&git_object, @@ -2599,7 +2601,7 @@ sub get_feed_info { return unless (defined $project); # some views should link to OPML, or to generic project feed, # or don't have specific feed yet (so they should use generic) - return if (!$action || $action =~ /^(?:tags|heads|forks|tag|search)$/x); + return if (!$action || $action =~ /^(?:heads|forks|search)$/x); my $branch = undef; # branches refs uses 'refs/' + $get_branch_refs()[x] + '/' prefix @@ -2614,8 +2616,10 @@ sub get_feed_info { } } # find log type for feed description (title) - my $type = 'log'; - if (defined $file_name) { + my $type = "log"; + if ($action eq 'tag' || $action eq 'tags') { + $type = "tags"; + } elsif (defined $file_name) { $type = "history of $file_name"; $type .= "/" if ($action eq 'tree'); $type .= " on '$branch'" if (defined $branch); @@ -4007,6 +4011,7 @@ sub print_feed_meta { $href_params{'-title'} = 'log'; } + my $tag_view = $href_params{-title} eq 'tags'; foreach my $format (qw(RSS Atom)) { my $type = lc($format); my %link_attr = ( @@ -4016,7 +4021,7 @@ sub print_feed_meta { ); $href_params{'extra_options'} = undef; - $href_params{'action'} = $type; + $href_params{'action'} = ($tag_view ? 'tags_' : '') . $type; $link_attr{'-href'} = href(%href_params); print "\n"; - $href_params{'extra_options'} = '--no-merges'; - $link_attr{'-href'} = href(%href_params); - $link_attr{'-title'} .= ' (no merges)'; - print "\n"; + unless ($tag_view) { + $href_params{'extra_options'} = '--no-merges'; + $link_attr{'-href'} = href(%href_params); + $link_attr{'-title'} .= ' (no merges)'; + print "\n"; + } } } else { @@ -4217,8 +4224,9 @@ sub git_footer_html { } $href_params{'-title'} ||= 'log'; + my $tag_view = $href_params{-title} eq 'tags'; foreach my $format (qw(RSS Atom)) { - $href_params{'action'} = lc($format); + $href_params{'action'} = ($tag_view ? 'tags_' : '') . lc($format); print $cgi->a({-href => href(%href_params), -title => "$href_params{'-title'} $format feed", -class => $feed_class}, $format)."\n"; @@ -8409,10 +8417,18 @@ sub git_rss { git_feed('rss'); } +sub git_tags_rss { + git_feed('rss', 'tags') +} + sub git_atom { git_feed('atom'); } +sub git_tags_atom { + git_feed('atom', 'tags') +} + sub git_opml { my @list = git_get_projects_list($project_filter, $strict_export); if (!@list) { @@ -8457,6 +8473,9 @@ XML my $rss = href('project' => $proj{'path'}, 'action' => 'rss', -full => 1); my $html = href('project' => $proj{'path'}, 'action' => 'summary', -full => 1); print "\n"; + # and now the tags rss feed + $rss = href('project' => $proj{'path'}, 'action' => 'tags_rss', -full => 1); + print "\n"; } print < -- 2.12.2.822.g5451c77231
[PATCH 0/2] gitweb: tags feeds
A smallish patchset to implement RSS and Atom feeds to complement the tags view, accessible as verbs `tags_rss` and `tags_atom`. (I actually made this some 5 years ago, and it has been running on http://git.oblomov.eu/ since, but for some reason I forgot to submit it for upstreaming.) The patchset is also available in the git repository at: git://git.oblomov.eu/git gitweb-tags-feed Giuseppe Bilotta (2): gitweb: infrastructure for tags feed gitweb: expose tags feed in appropriate places gitweb/gitweb.perl | 126 ++--- 1 file changed, 91 insertions(+), 35 deletions(-) -- 2.12.2.822.g5451c77231
[PATCH v2 0/2] Minor changes to skip gitweb tests without Time::HiRes
This was originally just one small patch, but Jakub Narębski pointed out that calling the module "unusable" when we failed to load it was confusing, so now the start of this series is just a rephrasing of an existing error message I copied. Ævar Arnfjörð Bjarmason (2): gitweb tests: Change confusing "skip_all" phrasing gitweb tests: Skip tests when we don't have Time::HiRes t/gitweb-lib.sh | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) -- 2.11.0
Re: [PATCH v2 0/2] Minor changes to skip gitweb tests without Time::HiRes
Thanks, will replace what has been on 'pu'. Note that I'd dropped a double-SP from the latter one while queuing.
[PATCH v2 1/2] gitweb tests: Change confusing "skip_all" phrasing
Change the phrasing so that instead of saying that the CGI module is unusable, we say that it's not available. This came up on the git mailing list in <4b34e3a0-3da7-d821-2a7f-9a420ac1d...@gmail.com> from Jakub Narębski. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/gitweb-lib.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh index d5dab5a94f..59ef15efbd 100644 --- a/t/gitweb-lib.sh +++ b/t/gitweb-lib.sh @@ -110,7 +110,7 @@ perl -MEncode -e '$e="";decode_utf8($e, Encode::FB_CROAK)' >/dev/null 2>&1 || { } perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 || { - skip_all='skipping gitweb tests, CGI module unusable' + skip_all='skipping gitweb tests, CGI & CGI::Util & CGI::Carp modules not available' test_done } -- 2.11.0
[PATCH v2 2/2] gitweb tests: Skip tests when we don't have Time::HiRes
Change the gitweb tests to skip when we can't load the Time::HiRes module. Gitweb needs this module to work. It has been in perl core since v5.8, which is the oldest version we support. However CentOS (and perhaps some other distributions) carve it into its own non-core-perl package that's not installed along with /usr/bin/perl by default. Without this we'll hard fail the gitweb tests when trying to load the module. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/gitweb-lib.sh | 5 + 1 file changed, 5 insertions(+) diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh index 59ef15efbd..b7a73874e7 100644 --- a/t/gitweb-lib.sh +++ b/t/gitweb-lib.sh @@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 || { test_done } +perl -mTime::HiRes -e 0 >/dev/null 2>&1 || { + skip_all='skipping gitweb tests, Time::HiRes module not available' + test_done +} + gitweb_init -- 2.11.0
Re: [PATCH] gitweb tests: Skip tests when we don't have Time::HiRes
On Mon, Feb 27, 2017 at 6:48 PM, Jakub Narębski wrote: > W dniu 27.02.2017 o 13:37, Ævar Arnfjörð Bjarmason pisze: >> Change the gitweb tests to skip when we can't load the Time::HiRes >> module. > > Could you tell us in the commit message why this module is needed? > Is it because gitweb loads it unconditionally, or does that at least > in the default configuration, or is it used in tests, or...? > > [I see it is somewhat addressed below] > >> >> This module has bee in perl core since v5.8, which is the oldest > > s/bee/been/ I'll clarify that in a re-roll & fix the typo, pending any other comments. Thanks! >> version we support, however CentOS (and perhaps some other >> distributions) carve it into its own non-core-perl package that's not >> installed along with /usr/bin/perl by default. Without this we'll hard >> fail the gitweb tests when trying to load the module. > > I see that it because gitweb.perl as the following at line 20: > > use Time::HiRes qw(gettimeofday tv_interval); > >> >> Signed-off-by: Ævar Arnfjörð Bjarmason > > Good catch (if a strange one...). This and the associated cvs tests failing as root patch I submitted were discovered when trying to build git in the standard mock build environment on CentOS. It creates a chroot and rpm installs just the packages you declare, so issues like these crop up. >> --- >> t/gitweb-lib.sh | 5 + >> 1 file changed, 5 insertions(+) >> >> diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh >> index d5dab5a94f..116c3890e6 100644 >> --- a/t/gitweb-lib.sh >> +++ b/t/gitweb-lib.sh >> @@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 >> || { >> test_done >> } >> >> +perl -mTime::HiRes -e 0 >/dev/null 2>&1 || { >> + skip_all='skipping gitweb tests, Time::HiRes module unusable' > > Is "unusable" a good description, instead of "not found"? Yeah it's odd, but I just copied the several lines above that use that phrasing. >> + test_done >> +} >> + >> gitweb_init >> >
Re: [PATCH] gitweb tests: Skip tests when we don't have Time::HiRes
W dniu 27.02.2017 o 13:37, Ævar Arnfjörð Bjarmason pisze: > Change the gitweb tests to skip when we can't load the Time::HiRes > module. Could you tell us in the commit message why this module is needed? Is it because gitweb loads it unconditionally, or does that at least in the default configuration, or is it used in tests, or...? [I see it is somewhat addressed below] > > This module has bee in perl core since v5.8, which is the oldest s/bee/been/ > version we support, however CentOS (and perhaps some other > distributions) carve it into its own non-core-perl package that's not > installed along with /usr/bin/perl by default. Without this we'll hard > fail the gitweb tests when trying to load the module. I see that it because gitweb.perl as the following at line 20: use Time::HiRes qw(gettimeofday tv_interval); > > Signed-off-by: Ævar Arnfjörð Bjarmason Good catch (if a strange one...). > --- > t/gitweb-lib.sh | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh > index d5dab5a94f..116c3890e6 100644 > --- a/t/gitweb-lib.sh > +++ b/t/gitweb-lib.sh > @@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 > || { > test_done > } > > +perl -mTime::HiRes -e 0 >/dev/null 2>&1 || { > + skip_all='skipping gitweb tests, Time::HiRes module unusable' Is "unusable" a good description, instead of "not found"? > + test_done > +} > + > gitweb_init >
[PATCH] gitweb tests: Skip tests when we don't have Time::HiRes
Change the gitweb tests to skip when we can't load the Time::HiRes module. This module has bee in perl core since v5.8, which is the oldest version we support, however CentOS (and perhaps some other distributions) carve it into its own non-core-perl package that's not installed along with /usr/bin/perl by default. Without this we'll hard fail the gitweb tests when trying to load the module. Signed-off-by: Ævar Arnfjörð Bjarmason --- t/gitweb-lib.sh | 5 + 1 file changed, 5 insertions(+) diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh index d5dab5a94f..116c3890e6 100644 --- a/t/gitweb-lib.sh +++ b/t/gitweb-lib.sh @@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 || { test_done } +perl -mTime::HiRes -e 0 >/dev/null 2>&1 || { + skip_all='skipping gitweb tests, Time::HiRes module unusable' + test_done +} + gitweb_init -- 2.11.0
Re: gitweb html validation
2016-11-15 19:26 GMT+01:00 Ralf Thielow : Finally I've found the time to actually try this out and there are some problems with it. > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 7cf68f07b..33d7c154f 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -5531,8 +5531,8 @@ sub git_project_search_form { > $limit = " in '$project_filter/'"; > } > > - print "\n"; > print $cgi->start_form(-method => 'get', -action => $my_uri) . > + "\n" . > $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; > print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" > if (defined $project_filter); > @@ -5544,11 +5544,11 @@ sub git_project_search_form { > -checked => $search_use_regexp) . > "\n" . > $cgi->submit(-name => 'btnS', -value => 'Search') . > - $cgi->end_form() . "\n" . > $cgi->a({-href => href(project => undef, searchtext => undef, > project_filter => $project_filter)}, > esc_html("List all projects$limit")) . "\n"; > - print "\n"; > + print "\n" . > + $cgi->end_form() . "\n"; > } > The anchor is now inside the form-tag, which means there is no visual line-break anymore that comes automatically after as the form-tag is a block level element. Could be solved by adding a "", which is not very nice, but OK. > # entry for given @keys needs filling if at least one of keys in list > diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css > index 321260103..507740b6a 100644 > --- a/gitweb/static/gitweb.css > +++ b/gitweb/static/gitweb.css > @@ -539,7 +539,7 @@ div.projsearch { > margin: 20px 0px; > } > > -div.projsearch form { > +form div.projsearch { > margin-bottom: 2px; > } > This is wrong as it overwrites the setting above, 20px at the bottom went to 2px. The problem is how to apply the 2px now. Before this, we had the , a block element, which we can give the 2px margin at the bottom. Now this element is gone and we have a set of inline elements where we use "" to emulate the line break. There are two css rules which can solve this, but I'm not really happy with both of them. 1) As we know we need the two pixel below an input element, we can say div.projsearch input { margin-bottom: 2px; } 2) Make the a-Tag inside div.projsearch being displayed as a block element to make a margin-top setting working. This has the benefit that we don't care about the element above. div.projsearch a { display: inline-block; margin-top: 2px; } If I have to choose I'd prefer the second one. So I can't think of a way to solve this nicely with this change.
Re: gitweb html validation
Le 16.11.2016 01:05, Junio C Hamano a écrit : Ralf Thielow writes: Only block level elements are allowed to be inside form tags, according to https://www.w3.org/2010/04/xhtml10-strict.html#elem_form ... I think it's better to just move the -Tag outside of the surrounding div? Something like this perhaps, I didn't test it myself yet. That sounds like a sensible update to me (no, I do not run gitweb myself). Is this the only we have in the UI, or is it the only one that is problematic? There is an other form in the cgi line 4110 : print $cgi->start_form(-method => "get", -action => $action) . "\n" . But this one has a inside. The problem with projsearch I want to change is that the div is around the form without a container inside. I agree with moving the inside the form if it's a better option. Best regards
Re: gitweb html validation
Ralf Thielow writes: > Only block level elements are > allowed to be inside form tags, according to > https://www.w3.org/2010/04/xhtml10-strict.html#elem_form > ... > I think it's better to just move the -Tag outside of the > surrounding div? > Something like this perhaps, I didn't test it myself yet. That sounds like a sensible update to me (no, I do not run gitweb myself). Is this the only we have in the UI, or is it the only one that is problematic? > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 7cf68f07b..33d7c154f 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -5531,8 +5531,8 @@ sub git_project_search_form { > $limit = " in '$project_filter/'"; > } > > - print "\n"; > print $cgi->start_form(-method => 'get', -action => $my_uri) . > + "\n" . > $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; > print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" > if (defined $project_filter); > @@ -5544,11 +5544,11 @@ sub git_project_search_form { >-checked => $search_use_regexp) . > "\n" . > $cgi->submit(-name => 'btnS', -value => 'Search') . > - $cgi->end_form() . "\n" . > $cgi->a({-href => href(project => undef, searchtext => undef, > project_filter => $project_filter)}, > esc_html("List all projects$limit")) . "\n"; > - print "\n"; > + print "\n" . > + $cgi->end_form() . "\n"; > } > > # entry for given @keys needs filling if at least one of keys in list > diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css > index 321260103..507740b6a 100644 > --- a/gitweb/static/gitweb.css > +++ b/gitweb/static/gitweb.css > @@ -539,7 +539,7 @@ div.projsearch { > margin: 20px 0px; > } > > -div.projsearch form { > +form div.projsearch { > margin-bottom: 2px; > } >
Re: gitweb html validation
Raphaël Gertz wrote: > Hi, > > There a small bug in gitweb html validation, you need the following patch to > pass w3c check with searchbox enabled. > > The problem lies in the input directly embed inside a form without a wrapper > which is not valid. > I agree this is a small bug. Only block level elements are allowed to be inside form tags, according to https://www.w3.org/2010/04/xhtml10-strict.html#elem_form > Best regards > > The following patch fix the issue for git-2.10.2 : > --- /usr/share/gitweb/gitweb.cgi.orig 2016-11-15 15:37:21.149805026 +0100 > +++ /usr/share/gitweb/gitweb.cgi2016-11-15 15:37:48.579240429 +0100 > @@ -5518,6 +5518,7 @@ sub git_project_search_form { > > print "\n"; > print $cgi->start_form(-method => 'get', -action => $my_uri) . > + ''. > $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; > print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" > if (defined $project_filter); > @@ -5529,6 +5530,7 @@ sub git_project_search_form { > -checked => $search_use_regexp) . > "\n" . > $cgi->submit(-name => 'btnS', -value => 'Search') . > + ''. > $cgi->end_form() . "\n" . > $cgi->a({-href => href(project => undef, searchtext => undef, > project_filter => $project_filter)}, I think it's better to just move the -Tag outside of the surrounding div? Something like this perhaps, I didn't test it myself yet. diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 7cf68f07b..33d7c154f 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -5531,8 +5531,8 @@ sub git_project_search_form { $limit = " in '$project_filter/'"; } - print "\n"; print $cgi->start_form(-method => 'get', -action => $my_uri) . + "\n" . $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" if (defined $project_filter); @@ -5544,11 +5544,11 @@ sub git_project_search_form { -checked => $search_use_regexp) . "\n" . $cgi->submit(-name => 'btnS', -value => 'Search') . - $cgi->end_form() . "\n" . $cgi->a({-href => href(project => undef, searchtext => undef, project_filter => $project_filter)}, esc_html("List all projects$limit")) . "\n"; - print "\n"; + print "\n" . + $cgi->end_form() . "\n"; } # entry for given @keys needs filling if at least one of keys in list diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css index 321260103..507740b6a 100644 --- a/gitweb/static/gitweb.css +++ b/gitweb/static/gitweb.css @@ -539,7 +539,7 @@ div.projsearch { margin: 20px 0px; } -div.projsearch form { +form div.projsearch { margin-bottom: 2px; }
gitweb html validation
Hi, There a small bug in gitweb html validation, you need the following patch to pass w3c check with searchbox enabled. The problem lies in the input directly embed inside a form without a wrapper which is not valid. Best regards The following patch fix the issue for git-2.10.2 : --- /usr/share/gitweb/gitweb.cgi.orig 2016-11-15 15:37:21.149805026 +0100 +++ /usr/share/gitweb/gitweb.cgi2016-11-15 15:37:48.579240429 +0100 @@ -5518,6 +5518,7 @@ sub git_project_search_form { print "\n"; print $cgi->start_form(-method => 'get', -action => $my_uri) . + ''. $cgi->hidden(-name => 'a', -value => 'project_list') . "\n"; print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n" if (defined $project_filter); @@ -5529,6 +5530,7 @@ sub git_project_search_form { -checked => $search_use_regexp) . "\n" . $cgi->submit(-name => 'btnS', -value => 'Search') . + ''. $cgi->end_form() . "\n" . $cgi->a({-href => href(project => undef, searchtext => undef, project_filter => $project_filter)},
Re: Forbid access to /gitweb but authorize the sub projets
On Mon, 2016-11-07 at 14:07 +0100, Alexandre Duplaix wrote: > Hello, > > I have several projects under https://myserver/gitweb and I would like > to forbid the access to the root, so that the users can't list the > differents projects. > > However, I need to let the access to the sub projects (ex: > https://myserver/gitweb/?p=project1;a=summary > > How can I do please ? My favourite way of doing this is abusing the fact that gitweb.conf is perl code that's loaded with do $filename. This makes it easy to override such things. Try this in gitweb.conf for example: sub no_index { die_error(403, "No access to the repository list"); } $actions{project_list} = \&no_index; $actions{project_index} = \&no_index; $actions{opml} = \&no_index; -- Dennis Kaarsemaker http://www.kaarsemaker.net
Forbid access to /gitweb but authorize the sub projets
Hello, I have several projects under https://myserver/gitweb and I would like to forbid the access to the root, so that the users can't list the differents projects. However, I need to let the access to the sub projects (ex: https://myserver/gitweb/?p=project1;a=summary How can I do please ? Thank you in advance :) " Ce courriel et les documents qui lui sont joints peuvent contenir des informations confidentielles ou ayant un caractère privé. S'ils ne vous sont pas destinés nous vous signalons qu'il est strictement interdit de les divulguer, de les reproduire ou d'en utiliser de quelque mani�re que ce soit le contenu. Si ce message vous a �t� transmis par erreur, merci d'en informer l'exp�diteur et de supprimer imm�diatement de votre syst�me informatique ce courriel ainsi que tous les documents qui y sont attach�s" ** " This e-mail and any attached documents may contain confidential or proprietary information. If you are not the intended recipient, you are notified that any dissemination, copying of this e-mail and any attachments thereto or use of their contents by any means whatsoever is strictly prohibited. If you have received this e-mail in error, please advise the sender immediately and delete this e-mail and all attached documents from your computer system." #
Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
On Mon, Oct 17, 2016 at 6:54 PM, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > >> As far as I can tell the only outstanding "change this" is your >> s/SHA1/SHA-1/ in , do >> you want to fix that up or should I submit another series? > > I think I did that already myself while queuing. Could you fetch > what I queued on 'pu' to double check? Thanks, looked at it, looks good to me! > I think the diff between what was posted and what is queued (I just > checked) looks like this: > > -gitweb: Link to 7-char+ SHA1s, not only 8-char+ > +gitweb: link to 7-char+ SHA-1s, not only 8-char+ > > Change the minimum length of an abbreviated object identifier in the > commit message gitweb tries to turn into link from 8 hexchars to 7. > > @@ -5,16 +12,18 @@ > SHA-1 in commit log message links to "object" view", 2006-12-10), but > the default abbreviation length is 7, and has been for a long time. > > -It's still possible to reference SHA1s down to 4 characters in length, > +It's still possible to reference SHA-1s down to 4 characters in length, > see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make > git actually produce that, so I doubt anyone is putting that into log > -messages in practice, but people definitely do put 7 character SHA1s > +messages in practice, but people definitely do put 7 character SHA-1s > into log messages. > > I think it's fairly dubious to link to things matching [0-9a-fA-F] > here as opposed to just [0-9a-f], that dates back to the initial > version of gitweb from 161332a ("first working version", > -2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce > +2005-08-07). Git will accept all-caps SHA-1s, but didn't ever produce > them as far as I can tell. > > Signed-off-by: Ævar Arnfjörð Bjarmason > +Acked-by: Jakub Narębski > +Signed-off-by: Junio C Hamano
Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
Ævar Arnfjörð Bjarmason writes: > As far as I can tell the only outstanding "change this" is your > s/SHA1/SHA-1/ in , do > you want to fix that up or should I submit another series? I think I did that already myself while queuing. Could you fetch what I queued on 'pu' to double check? I think the diff between what was posted and what is queued (I just checked) looks like this: -gitweb: Link to 7-char+ SHA1s, not only 8-char+ +gitweb: link to 7-char+ SHA-1s, not only 8-char+ Change the minimum length of an abbreviated object identifier in the commit message gitweb tries to turn into link from 8 hexchars to 7. @@ -5,16 +12,18 @@ SHA-1 in commit log message links to "object" view", 2006-12-10), but the default abbreviation length is 7, and has been for a long time. -It's still possible to reference SHA1s down to 4 characters in length, +It's still possible to reference SHA-1s down to 4 characters in length, see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make git actually produce that, so I doubt anyone is putting that into log -messages in practice, but people definitely do put 7 character SHA1s +messages in practice, but people definitely do put 7 character SHA-1s into log messages. I think it's fairly dubious to link to things matching [0-9a-fA-F] here as opposed to just [0-9a-f], that dates back to the initial version of gitweb from 161332a ("first working version", -2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce +2005-08-07). Git will accept all-caps SHA-1s, but didn't ever produce them as far as I can tell. Signed-off-by: Ævar Arnfjörð Bjarmason +Acked-by: Jakub Narębski +Signed-off-by: Junio C Hamano
Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
On Fri, Oct 14, 2016 at 8:40 PM, Junio C Hamano wrote: > Jakub Narębski writes: > >> s/SHA1/SHA-1/g in above paragraph (for correctness and consistency). >>> >>> I think it's fairly dubious to link to things matching [0-9a-fA-F] >>> here as opposed to just [0-9a-f], that dates back to the initial >>> version of gitweb from 161332a ("first working version", >>> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce >>> them as far as I can tell. >> >> All right. If we decide to be more strict in what we accept, we can >> do it in a separate commit. >> >>> >>> Signed-off-by: Ævar Arnfjörð Bjarmason >> >> Acked-by: Jakub Narębski > > Thanks for a review. As the topic is not yet in 'next', I'll squish > in your Acked-by: to them. I saw them only for 1 & 2/3; would > another for 3/3 be coming soon? As far as I can tell the only outstanding "change this" is your s/SHA1/SHA-1/ in , do you want to fix that up or should I submit another series? >> >>> --- >>> gitweb/gitweb.perl | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl >>> index cba7405..92b5e91 100755 >>> --- a/gitweb/gitweb.perl >>> +++ b/gitweb/gitweb.perl >>> @@ -2036,7 +2036,7 @@ sub format_log_line_html { >>> my $line = shift; >>> >>> $line = esc_html($line, -nbsp=>1); >>> -$line =~ s{\b([0-9a-fA-F]{8,40})\b}{ >>> +$line =~ s{\b([0-9a-fA-F]{7,40})\b}{ >> >> By the way, it is quite long commit message for one character change. >> Not that it is a bad thing... >> >>> $cgi->a({-href => href(action=>"object", hash=>$1), >>> -class => "text"}, $1); >>> }eg; >>>
Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze: > Change the log formatting function to know about "git describe" output > such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08". > > There are still many valid refnames that we don't link to > e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to > v2.8.0-4-g867ad08, but I'm not supporting that with this commit, > similarly it's trivially possible to create some refnames like > "æ/var-gf6727b0" or which won't be picked up by this regex. It's important to cover most common cases occurring naturally in people's repositories, while trying to avoid false positives (the latter is important more now, where gitweb doesn't check for rev name validity). I guess that most common is to use non-hierarchical tags with ASCII-only names for describe output, so while "æ/var-gf6727b0" won't be picked, it is IMVHO also much less likely to occur. > > There's surely room for improvement here, but I just wanted to address > the very common case of sticking "git describe" output into commit > messages without trying to link to all possible refnames, that's going > to be a rather futile exercise given that this is free text, and it > would be prohibitively expensive to look up whether the references in > question exist in our repository. > > There was on-list discussion about how we could do better than this > patch. Junio suggested to update parse_commits() to call a new > "gitweb--helper" command which would pass each of the revision > candidates through "rev-parse --verify --quiet". That would cut down > on our false positives (e.g. we'll link to "deadbeef"), and also allow > us to be more aggressive in selecting candidate revisions. > > That may be too expensive to work in practice, or it may > not. Investigating that would be a good follow-up to this patch. All right. That's good and enough for one patch. > > Signed-off-by: Ævar Arnfjörð Bjarmason Acked-by: Jakub Narębski BTW. to add to whole "git describe" output or not discussion: having link span whole revision marker makes for easier to use UI: larger are to hit. > --- > gitweb/gitweb.perl | 18 -- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 92b5e91..7cf68f0 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -2036,10 +2036,24 @@ sub format_log_line_html { > my $line = shift; > > $line = esc_html($line, -nbsp=>1); > - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ > + $line =~ s{ > +\b > +( > +# The output of "git describe", e.g. v2.10.0-297-gf6727b0 > +# or hadoop-20160921-113441-20-g094fb7d > +(? +[A-Za-z0-9.-]+ > +(?!\.) # refs can't end with ".", see check_refname_format() > +-g[0-9a-fA-F]{7,40} > +| > +# Just a normal looking Git SHA1 > +[0-9a-fA-F]{7,40} > +) > +\b Nice using of eXplained regexp. It is much easier to understand with comments. > +}{ > $cgi->a({-href => href(action=>"object", hash=>$1), > -class => "text"}, $1); > - }eg; > + }egx; > > return $line; > } >
Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
Ævar Arnfjörð Bjarmason writes: > I just ran into an example of a better reason for doing it like my > patch is doing, which is that if you have some tag like: > > deployment-20160928-171914-16-g42e13d8 > > With my patch the whole thing will be a link to the 42e13d8 commit, > but with this suggestion both 20160928 and 42e13d8 would become commit > links, the former one would be broken. > > Of course we could have some code that would detect that the whole \S+ > is part of one thing that ends in g,... I think that this example shows a flaw not in the "suffix that looks like an object name" approach, but more in the boundary regexp, namely, the \b part; it is probably too loose. And \S+ is not the right cue, either, for that matter. IOW, "we only should take hexstring, optionally prefixed with 'g', that appears before the whitespace" is too strict, as a sentence We broke the system with deployment-g42e13d8. does want to link to 42e13d8, even though full-stop at the end is not whitespace, and the existing regexp uses \b there as a rough equivalent to saying "Here must be a whitespace or punctuation". An attempt to tighten "what a punctuation is" by excluding '-' may make that "timestamp is in the tagname" example work, but is not a good solution, either, because two sentences can be concatenated with an em-dash that is often typed as two hyphen in plain text, resulting in something like We broke the system with deployment-g42e13d8--sigh. and we do want to treat the '-' after 42e13d8 as a punctuation after a described object name. So I agree 3/3 is good thing to do as-is. Thanks.
Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
Jakub Narębski writes: > s/SHA1/SHA-1/g in above paragraph (for correctness and consistency). >> >> I think it's fairly dubious to link to things matching [0-9a-fA-F] >> here as opposed to just [0-9a-f], that dates back to the initial >> version of gitweb from 161332a ("first working version", >> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce >> them as far as I can tell. > > All right. If we decide to be more strict in what we accept, we can > do it in a separate commit. > >> >> Signed-off-by: Ævar Arnfjörð Bjarmason > > Acked-by: Jakub Narębski Thanks for a review. As the topic is not yet in 'next', I'll squish in your Acked-by: to them. I saw them only for 1 & 2/3; would another for 3/3 be coming soon? > >> --- >> gitweb/gitweb.perl | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl >> index cba7405..92b5e91 100755 >> --- a/gitweb/gitweb.perl >> +++ b/gitweb/gitweb.perl >> @@ -2036,7 +2036,7 @@ sub format_log_line_html { >> my $line = shift; >> >> $line = esc_html($line, -nbsp=>1); >> -$line =~ s{\b([0-9a-fA-F]{8,40})\b}{ >> +$line =~ s{\b([0-9a-fA-F]{7,40})\b}{ > > By the way, it is quite long commit message for one character change. > Not that it is a bad thing... > >> $cgi->a({-href => href(action=>"object", hash=>$1), >> -class => "text"}, $1); >> }eg; >>
Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
On Sun, Oct 9, 2016 at 1:20 PM, Ævar Arnfjörð Bjarmason wrote: > On Thu, Oct 6, 2016 at 9:51 PM, Junio C Hamano wrote: >> Ævar Arnfjörð Bjarmason writes: >> >>> Change the log formatting function to know about "git describe" output >>> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08". >>> >>> There are still many valid refnames that we don't link to >>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to >>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit, >>> similarly it's trivially possible to create some refnames like >>> "æ/var-gf6727b0" or which won't be picked up by this regex. >> >> Not a serious counter-proposal or suggestion, and certainly not an >> objection to the patch I am responding to, but I wonder if it is so >> bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08. >> >> IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we >> allowed an optional 'g' in front of the hex, e.g. >> >> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ >> + $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{ >> >> wouldn't that be much simpler, covers more cases and sufficient? > > It would be more simpler, maybe that's the right approach. I have a > preference for making the entire reference a link. I think it makes > more sense for the UI. I just ran into an example of a better reason for doing it like my patch is doing, which is that if you have some tag like: deployment-20160928-171914-16-g42e13d8 With my patch the whole thing will be a link to the 42e13d8 commit, but with this suggestion both 20160928 and 42e13d8 would become commit links, the former one would be broken. Of course we could have some code that would detect that the whole \S+ is part of one thing that ends in g, but the complexity in doing that would be equivalent or more to doing what my patch is doing. >>> There's surely room for improvement here, but I just wanted to address >>> the very common case of sticking "git describe" output into commit >>> messages without trying to link to all possible refnames, that's going >>> to be a rather futile exercise given that this is free text, and it >>> would be prohibitively expensive to look up whether the references in >>> question exist in our repository. >>> >>> There was on-list discussion about how we could do better than this >>> patch. Junio suggested to update parse_commits() to call a new >>> "gitweb--helper" command which would pass each of the revision >>> candidates through "rev-parse --verify --quiet". That would cut down >>> on our false positives (e.g. we'll link to "deadbeef"), and also allow >>> us to be more aggressive in selecting candidate revisions. >>> >>> That may be too expensive to work in practice, or it may >>> not. Investigating that would be a good follow-up to this patch. >>> >>> Signed-off-by: Ævar Arnfjörð Bjarmason >>> --- >>> gitweb/gitweb.perl | 18 -- >>> 1 file changed, 16 insertions(+), 2 deletions(-) >>> >>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl >>> index 92b5e91..7cf68f0 100755 >>> --- a/gitweb/gitweb.perl >>> +++ b/gitweb/gitweb.perl >>> @@ -2036,10 +2036,24 @@ sub format_log_line_html { >>> my $line = shift; >>> >>> $line = esc_html($line, -nbsp=>1); >>> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ >>> + $line =~ s{ >>> +\b >>> +( >>> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0 >>> +# or hadoop-20160921-113441-20-g094fb7d >>> +(?>> +[A-Za-z0-9.-]+ >>> +(?!\.) # refs can't end with ".", see check_refname_format() >>> +-g[0-9a-fA-F]{7,40} >>> +| >>> +# Just a normal looking Git SHA1 >>> +[0-9a-fA-F]{7,40} >>> +) >>> +\b >>> +}{ >>> $cgi->a({-href => href(action=>"object", hash=>$1), >>> -class => "text"}, $1); >>> - }eg; >>> + }egx; >>> >>> return $line; >>> }
Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze: > Change the minimum length of an abbreviated object identifier in the > commit message gitweb tries to turn into link from 8 hexchars to 7. > > This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb: > SHA-1 in commit log message links to "object" view", 2006-12-10), but > the default abbreviation length is 7, and has been for a long time. > > It's still possible to reference SHA1s down to 4 characters in length, > see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make > git actually produce that, so I doubt anyone is putting that into log > messages in practice, but people definitely do put 7 character SHA1s > into log messages. That's nice explanation why we want to support 7 character SHA-1 (it is the default, and was seen in the wild), but don't need to support shorter lengths. Another reason (just for completeness; you don't need to put it in the commit message) to not go below 7 characters is that with decreasing length there is more and more chance for false positives (like 'beef' or 'caffee' for 4-char or 5-char hexstring), as I wrote previously. s/SHA1/SHA-1/g in above paragraph (for correctness and consistency). > > I think it's fairly dubious to link to things matching [0-9a-fA-F] > here as opposed to just [0-9a-f], that dates back to the initial > version of gitweb from 161332a ("first working version", > 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce > them as far as I can tell. All right. If we decide to be more strict in what we accept, we can do it in a separate commit. > > Signed-off-by: Ævar Arnfjörð Bjarmason Acked-by: Jakub Narębski > --- > gitweb/gitweb.perl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index cba7405..92b5e91 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -2036,7 +2036,7 @@ sub format_log_line_html { > my $line = shift; > > $line = esc_html($line, -nbsp=>1); > - $line =~ s{\b([0-9a-fA-F]{8,40})\b}{ > + $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ By the way, it is quite long commit message for one character change. Not that it is a bad thing... > $cgi->a({-href => href(action=>"object", hash=>$1), > -class => "text"}, $1); > }eg; >
Re: [PATCH v2 1/3] gitweb: Fix a typo in a comment
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason napisał: > Change a typo'd MIME type in a comment. The Content-Type is > application/xhtml+xml, not application/xhtm+xml. > > Fixes up code originally added in 53c4031 ("gitweb: Strip > non-printable characters from syntax highlighter output", 2011-09-16). > > Signed-off-by: Ævar Arnfjörð Bjarmason Good. Thanks for taking care of this. For what is worth for such a trivial patch: Acked-by: Jakub Narębski > --- > gitweb/gitweb.perl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 44094f4..cba7405 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -1616,7 +1616,7 @@ sub esc_path { > return $str; > } > > -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0) > +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0) > sub sanitize { > my $str = shift; > >
Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
On Thu, Oct 6, 2016 at 9:51 PM, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > >> Change the log formatting function to know about "git describe" output >> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08". >> >> There are still many valid refnames that we don't link to >> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to >> v2.8.0-4-g867ad08, but I'm not supporting that with this commit, >> similarly it's trivially possible to create some refnames like >> "æ/var-gf6727b0" or which won't be picked up by this regex. > > Not a serious counter-proposal or suggestion, and certainly not an > objection to the patch I am responding to, but I wonder if it is so > bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08. > > IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we > allowed an optional 'g' in front of the hex, e.g. > > - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ > + $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{ > > wouldn't that be much simpler, covers more cases and sufficient? It would be more simpler, maybe that's the right approach. I have a preference for making the entire reference a link. I think it makes more sense for the UI. >> There's surely room for improvement here, but I just wanted to address >> the very common case of sticking "git describe" output into commit >> messages without trying to link to all possible refnames, that's going >> to be a rather futile exercise given that this is free text, and it >> would be prohibitively expensive to look up whether the references in >> question exist in our repository. >> >> There was on-list discussion about how we could do better than this >> patch. Junio suggested to update parse_commits() to call a new >> "gitweb--helper" command which would pass each of the revision >> candidates through "rev-parse --verify --quiet". That would cut down >> on our false positives (e.g. we'll link to "deadbeef"), and also allow >> us to be more aggressive in selecting candidate revisions. >> >> That may be too expensive to work in practice, or it may >> not. Investigating that would be a good follow-up to this patch. >> >> Signed-off-by: Ævar Arnfjörð Bjarmason >> --- >> gitweb/gitweb.perl | 18 -- >> 1 file changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl >> index 92b5e91..7cf68f0 100755 >> --- a/gitweb/gitweb.perl >> +++ b/gitweb/gitweb.perl >> @@ -2036,10 +2036,24 @@ sub format_log_line_html { >> my $line = shift; >> >> $line = esc_html($line, -nbsp=>1); >> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ >> + $line =~ s{ >> +\b >> +( >> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0 >> +# or hadoop-20160921-113441-20-g094fb7d >> +(?> +[A-Za-z0-9.-]+ >> +(?!\.) # refs can't end with ".", see check_refname_format() >> +-g[0-9a-fA-F]{7,40} >> +| >> +# Just a normal looking Git SHA1 >> +[0-9a-fA-F]{7,40} >> +) >> +\b >> +}{ >> $cgi->a({-href => href(action=>"object", hash=>$1), >> -class => "text"}, $1); >> - }eg; >> + }egx; >> >> return $line; >> }
Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
Ævar Arnfjörð Bjarmason writes: > Change the log formatting function to know about "git describe" output > such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08". > > There are still many valid refnames that we don't link to > e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to > v2.8.0-4-g867ad08, but I'm not supporting that with this commit, > similarly it's trivially possible to create some refnames like > "æ/var-gf6727b0" or which won't be picked up by this regex. Not a serious counter-proposal or suggestion, and certainly not an objection to the patch I am responding to, but I wonder if it is so bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08. IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we allowed an optional 'g' in front of the hex, e.g. - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ + $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{ wouldn't that be much simpler, covers more cases and sufficient? > There's surely room for improvement here, but I just wanted to address > the very common case of sticking "git describe" output into commit > messages without trying to link to all possible refnames, that's going > to be a rather futile exercise given that this is free text, and it > would be prohibitively expensive to look up whether the references in > question exist in our repository. > > There was on-list discussion about how we could do better than this > patch. Junio suggested to update parse_commits() to call a new > "gitweb--helper" command which would pass each of the revision > candidates through "rev-parse --verify --quiet". That would cut down > on our false positives (e.g. we'll link to "deadbeef"), and also allow > us to be more aggressive in selecting candidate revisions. > > That may be too expensive to work in practice, or it may > not. Investigating that would be a good follow-up to this patch. > > Signed-off-by: Ævar Arnfjörð Bjarmason > --- > gitweb/gitweb.perl | 18 -- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 92b5e91..7cf68f0 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -2036,10 +2036,24 @@ sub format_log_line_html { > my $line = shift; > > $line = esc_html($line, -nbsp=>1); > - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ > + $line =~ s{ > +\b > +( > +# The output of "git describe", e.g. v2.10.0-297-gf6727b0 > +# or hadoop-20160921-113441-20-g094fb7d > +(? +[A-Za-z0-9.-]+ > +(?!\.) # refs can't end with ".", see check_refname_format() > +-g[0-9a-fA-F]{7,40} > +| > +# Just a normal looking Git SHA1 > +[0-9a-fA-F]{7,40} > +) > +\b > +}{ > $cgi->a({-href => href(action=>"object", hash=>$1), > -class => "text"}, $1); > - }eg; > + }egx; > > return $line; > }
Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
Ævar Arnfjörð Bjarmason writes: > Change the minimum length of an abbreviated object identifier in the > commit message gitweb tries to turn into link from 8 hexchars to 7. > > This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb: > SHA-1 in commit log message links to "object" view", 2006-12-10), but > the default abbreviation length is 7, and has been for a long time. > > It's still possible to reference SHA1s down to 4 characters in length, > see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make > git actually produce that, so I doubt anyone is putting that into log > messages in practice, but people definitely do put 7 character SHA1s > into log messages. > > I think it's fairly dubious to link to things matching [0-9a-fA-F] > here as opposed to just [0-9a-f], that dates back to the initial > version of gitweb from 161332a ("first working version", > 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce > them as far as I can tell. > > Signed-off-by: Ævar Arnfjörð Bjarmason > --- s/SHA1/SHA-1/g. I agree that A-F are dubious. > gitweb/gitweb.perl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index cba7405..92b5e91 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -2036,7 +2036,7 @@ sub format_log_line_html { > my $line = shift; > > $line = esc_html($line, -nbsp=>1); > - $line =~ s{\b([0-9a-fA-F]{8,40})\b}{ > + $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ > $cgi->a({-href => href(action=>"object", hash=>$1), > -class => "text"}, $1); > }eg;
[PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+
Change the minimum length of an abbreviated object identifier in the commit message gitweb tries to turn into link from 8 hexchars to 7. This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb: SHA-1 in commit log message links to "object" view", 2006-12-10), but the default abbreviation length is 7, and has been for a long time. It's still possible to reference SHA1s down to 4 characters in length, see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make git actually produce that, so I doubt anyone is putting that into log messages in practice, but people definitely do put 7 character SHA1s into log messages. I think it's fairly dubious to link to things matching [0-9a-fA-F] here as opposed to just [0-9a-f], that dates back to the initial version of gitweb from 161332a ("first working version", 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce them as far as I can tell. Signed-off-by: Ævar Arnfjörð Bjarmason --- gitweb/gitweb.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index cba7405..92b5e91 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -2036,7 +2036,7 @@ sub format_log_line_html { my $line = shift; $line = esc_html($line, -nbsp=>1); - $line =~ s{\b([0-9a-fA-F]{8,40})\b}{ + $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ $cgi->a({-href => href(action=>"object", hash=>$1), -class => "text"}, $1); }eg; -- 2.9.3
[PATCH v2 0/3] gitweb: Be smarter about linking to SHA1s in log messages
This is v2 of patches I sent on September 21st starting at <20160921114428.28664-1-ava...@gmail.com>. Jakub Narębski had a lot of feedback for that series (thanks!). Which as far as I can tell I've incorporated entirely in this re-roll. Ævar Arnfjörð Bjarmason (3): gitweb: Fix a typo in a comment gitweb: Link to 7-char+ SHA1s, not only 8-char+ gitweb: Link to "git describe"'d commits in log messages gitweb/gitweb.perl | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) -- 2.9.3
[PATCH v2 1/3] gitweb: Fix a typo in a comment
Change a typo'd MIME type in a comment. The Content-Type is application/xhtml+xml, not application/xhtm+xml. Fixes up code originally added in 53c4031 ("gitweb: Strip non-printable characters from syntax highlighter output", 2011-09-16). Signed-off-by: Ævar Arnfjörð Bjarmason --- gitweb/gitweb.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 44094f4..cba7405 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -1616,7 +1616,7 @@ sub esc_path { return $str; } -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0) +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0) sub sanitize { my $str = shift; -- 2.9.3
[PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages
Change the log formatting function to know about "git describe" output such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08". There are still many valid refnames that we don't link to e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to v2.8.0-4-g867ad08, but I'm not supporting that with this commit, similarly it's trivially possible to create some refnames like "æ/var-gf6727b0" or which won't be picked up by this regex. There's surely room for improvement here, but I just wanted to address the very common case of sticking "git describe" output into commit messages without trying to link to all possible refnames, that's going to be a rather futile exercise given that this is free text, and it would be prohibitively expensive to look up whether the references in question exist in our repository. There was on-list discussion about how we could do better than this patch. Junio suggested to update parse_commits() to call a new "gitweb--helper" command which would pass each of the revision candidates through "rev-parse --verify --quiet". That would cut down on our false positives (e.g. we'll link to "deadbeef"), and also allow us to be more aggressive in selecting candidate revisions. That may be too expensive to work in practice, or it may not. Investigating that would be a good follow-up to this patch. Signed-off-by: Ævar Arnfjörð Bjarmason --- gitweb/gitweb.perl | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 92b5e91..7cf68f0 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -2036,10 +2036,24 @@ sub format_log_line_html { my $line = shift; $line = esc_html($line, -nbsp=>1); - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ + $line =~ s{ +\b +( +# The output of "git describe", e.g. v2.10.0-297-gf6727b0 +# or hadoop-20160921-113441-20-g094fb7d +(?a({-href => href(action=>"object", hash=>$1), -class => "text"}, $1); - }eg; + }egx; return $line; } -- 2.9.3
Re: [PATCH v4 2/2] gitweb: use highlight's shebang detection
On Sun, Sep 25, 2016, at 11:04 AM, Jakub Narębski wrote: > > For what it is worth it: > > Acked-by: Jakub Narębski > > (but unfortunately *not* tested by). Thank you for all your help. -- Ian Kelling
Re: [PATCH v4 2/2] gitweb: use highlight's shebang detection
W dniu 25.09.2016 o 00:32, Ian Kelling pisze: > The "highlight" binary can, in some cases, determine the language type > by the means of file contents, for example the shebang in the first line > for some scripting languages. Make use of this autodetection for files > which syntax is not known by gitweb. In that case, pass the blob > contents to "highlight --force"; the parameter is needed to make it > always generate HTML output (which includes HTML-escaping). > > Although we now run highlight on files which do not end up highlighted, > performance is virtually unaffected because when we call highlight, it > is used for escaping HTML. In the case that highlight is used, gitweb > calls sanitize() instead of esc_html(), and the latter is significantly > slower (it does more, being roughly a superset of sanitize()). Simple > benchmark comparing performance of 'blob' view of files without syntax > highlighting in gitweb before and after this change indicates ±1% > difference in request time for all file types. Benchmark was performed > on local instance on Debian, using Apache/2.4.23 web server and CGI. > > Document the feature and improve syntax highlight documentation, add > test to ensure gitweb doesn't crash when language detection is used. > > Signed-off-by: Ian Kelling For what it is worth it: Acked-by: Jakub Narębski (but unfortunately *not* tested by). > --- > > Notes: > The only change from v3 is the commit message as suggested by Jakub > Narębski > > Documentation/gitweb.conf.txt | 21 ++--- > gitweb/gitweb.perl | 10 +- > t/t9500-gitweb-standalone-no-errors.sh | 8 > 3 files changed, 27 insertions(+), 12 deletions(-)
Re: [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter
W dniu 25.09.2016 o 00:32, Ian Kelling pisze: > Subject: gitweb: remove unused guess_file_syntax() parameter > > Signed-off-by: Ian Kelling Acked-by: Jakub Narębski > --- > > Notes: > The only change from v3 is a more descriptive commit message > > gitweb/gitweb.perl | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 33d701d..6cb4280 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3913,7 +3913,7 @@ sub blob_contenttype { > # guess file syntax for syntax highlighting; return undef if no highlighting > # the name of syntax can (in the future) depend on syntax highlighter used > sub guess_file_syntax { > - my ($highlight, $mimetype, $file_name) = @_; > + my ($highlight, $file_name) = @_; > return undef unless ($highlight && defined $file_name); > my $basename = basename($file_name, '.in'); > return $highlight_basename{$basename} > @@ -7062,7 +7062,7 @@ sub git_blob { > $have_blame &&= ($mimetype =~ m!^text/!); > > my $highlight = gitweb_check_feature('highlight'); > - my $syntax = guess_file_syntax($highlight, $mimetype, $file_name); > + my $syntax = guess_file_syntax($highlight, $file_name); > $fd = run_highlighter($fd, $highlight, $syntax) > if $syntax; > >
Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
On Sat, Sep 24, 2016, at 09:21 AM, Jakub Narębski wrote: > W dniu 24.09.2016 o 00:15, Jakub Narębski pisze: > > Sidenote: this way of benchmarking of gitweb falls between two ways of > doing a benchmark. > > The first method is to simply run gitweb as a standalone script, passing > its parameters in CGI environment variables; just like the test suite > does it. You would 'time' / 'times' it a few times, drop outliers, and > take average or a median. With this method you don't even need to set > up a web server. > > The second is to use a specialized program to benchmark the server-side > of a web page, for example 'ab' (ApacheBench), httperf, curl-loader > or JMeter. The first one is usually distributed together with Apache > web server, so you probably have it installed already. Those tools > provide timing statistics. Good to know. Thanks.
Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
On Fri, Sep 23, 2016, at 03:15 PM, Jakub Narębski wrote: > W dniu 23.09.2016 o 11:08, Ian Kelling napisał: > > > The "highlight" binary can, in some cases, determine the language type > > by the means of file contents, for example the shebang in the first line > > for some scripting languages. Make use of this autodetection for files > > which syntax is not known by gitweb. In that case, pass the blob > > contents to "highlight --force"; the parameter is needed to make it > > always generate HTML output (which includes HTML-escaping). > > Right. > > > > > Although we now run highlight on files which do not end up highlighted, > > performance is virtually unaffected because when we call highlight, we > > also call sanitize() instead of esc_html(), which is significantly > > slower. > > This paragraph is a bit unclear, for example it is not obvious what > "..., which is significantly slower" refers to: sanitize() or esc_html(). > > I think it would be better to write: > > Although we now run highlight on files which do not end up highlighted, > performance is virtually unaffected because when we call highlight, it > is used for escaping HTML. In the case that highlight is used, gitweb > calls sanitize() instead of esc_html(), and the latter is significantly > slower (it does more, being roughly a superset of sanitize()). Agree. Done in v4. > > >After curling blob view of unhighlighted large and small text > > files of perl code and license text 100 times each on a local > > Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in > > request time for all file types. > > Also, "curling" is not the word I would like to see. I would say: > > Simple benchmark comparing performance of 'blob' view of files without > syntax highlighting in gitweb before and after this change indicates > ±1% difference in request time for all file types. Benchmark was > performed on local instance on Debian, using Apache/2.4.23 web server > and CGI/PSGI/FCGI/mod_perl. > > ^^--- select one > > Or something like that; I'm not sure how detailed this should be. > But it is nice to have such benchmark in the commit message. Sounds good. Used it in v4. > > Anyway I think that adding yet another configuration toggle for selecting > whether to use "highlight" syntax autodetection or not would be just an > unnecessary complication. > > Note that the performance loss might be quite higher on MS Windows, with > its higher cost of fork. But then they probably do not configure > server-side highligher anyway. > > > > > Document the feature and improve syntax highlight documentation, add > > test to ensure gitweb doesn't crash when language detection is used. > > Good. > > > > > Signed-off-by: Ian Kelling > > --- > > Documentation/gitweb.conf.txt | 21 ++--- > > gitweb/gitweb.perl | 10 +- > > t/t9500-gitweb-standalone-no-errors.sh | 8 > > 3 files changed, 27 insertions(+), 12 deletions(-) > > > > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt > > index a79e350..e632089 100644 > > --- a/Documentation/gitweb.conf.txt > > +++ b/Documentation/gitweb.conf.txt > > @@ -246,13 +246,20 @@ $highlight_bin:: > > We should probably say what does it mean to be "highlight"[1] compatible, > but it is outside of scope for this patch, and I think also out of scope > of this series. > > > Note that 'highlight' feature must be set for gitweb to actually > > use syntax highlighting. > > + > > -*NOTE*: if you want to add support for new file type (supported by > > -"highlight" but not used by gitweb), you need to modify `%highlight_ext` > > -or `%highlight_basename`, depending on whether you detect type of file > > -based on extension (for example "sh") or on its basename (for example > > -"Makefile"). The keys of these hashes are extension and basename, > > -respectively, and value for given key is name of syntax to be passed via > > -`--syntax ` to highlighter. > > +*NOTE*: for a file to be highlighted, its syntax type must be detected > > +and that syntax must be supported by "highlight". The default syntax > > +detection is minimal, and there are many supported syntax types with no > > +detection by default. There are three options for adding syntax > > +detection. The first and second priority are `%highlight_basename` and > > +`%
[PATCH v4 2/2] gitweb: use highlight's shebang detection
The "highlight" binary can, in some cases, determine the language type by the means of file contents, for example the shebang in the first line for some scripting languages. Make use of this autodetection for files which syntax is not known by gitweb. In that case, pass the blob contents to "highlight --force"; the parameter is needed to make it always generate HTML output (which includes HTML-escaping). Although we now run highlight on files which do not end up highlighted, performance is virtually unaffected because when we call highlight, it is used for escaping HTML. In the case that highlight is used, gitweb calls sanitize() instead of esc_html(), and the latter is significantly slower (it does more, being roughly a superset of sanitize()). Simple benchmark comparing performance of 'blob' view of files without syntax highlighting in gitweb before and after this change indicates ±1% difference in request time for all file types. Benchmark was performed on local instance on Debian, using Apache/2.4.23 web server and CGI. Document the feature and improve syntax highlight documentation, add test to ensure gitweb doesn't crash when language detection is used. Signed-off-by: Ian Kelling --- Notes: The only change from v3 is the commit message as suggested by Jakub Narębski Documentation/gitweb.conf.txt | 21 ++--- gitweb/gitweb.perl | 10 +- t/t9500-gitweb-standalone-no-errors.sh | 8 3 files changed, 27 insertions(+), 12 deletions(-) diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt index a79e350..e632089 100644 --- a/Documentation/gitweb.conf.txt +++ b/Documentation/gitweb.conf.txt @@ -246,13 +246,20 @@ $highlight_bin:: Note that 'highlight' feature must be set for gitweb to actually use syntax highlighting. + -*NOTE*: if you want to add support for new file type (supported by -"highlight" but not used by gitweb), you need to modify `%highlight_ext` -or `%highlight_basename`, depending on whether you detect type of file -based on extension (for example "sh") or on its basename (for example -"Makefile"). The keys of these hashes are extension and basename, -respectively, and value for given key is name of syntax to be passed via -`--syntax ` to highlighter. +*NOTE*: for a file to be highlighted, its syntax type must be detected +and that syntax must be supported by "highlight". The default syntax +detection is minimal, and there are many supported syntax types with no +detection by default. There are three options for adding syntax +detection. The first and second priority are `%highlight_basename` and +`%highlight_ext`, which detect based on basename (the full filename, for +example "Makefile") and extension (for example "sh"). The keys of these +hashes are the basename and extension, respectively, and the value for a +given key is the name of the syntax to be passed via `--syntax ` +to "highlight". The last priority is the "highlight" configuration of +`Shebang` regular expressions to detect the language based on the first +line in the file, (for example, matching the line "#!/bin/bash"). See +the highlight documentation and the default config at +/etc/highlight/filetypes.conf for more details. + For example if repositories you are hosting use "phtml" extension for PHP files, and you want to have correct syntax-highlighting for those diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 6cb4280..44094f4 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3931,15 +3931,16 @@ sub guess_file_syntax { # or return original FD if no highlighting sub run_highlighter { my ($fd, $highlight, $syntax) = @_; - return $fd unless ($highlight && defined $syntax); + return $fd unless ($highlight); close $fd; + my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force"; open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ". quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse', '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);', '--', "-fe=$fallback_encoding")." | ". quote_command($highlight_bin). - " --replace-tabs=8 --fragment --syntax $syntax |" + " --replace-tabs=8 --fragment $syntax_arg |" or die_error(500, "Couldn't open file or run syntax highlighter"); return $fd; } @@ -7063,8 +7064,7 @@ sub git_blob { my $highlight = gitweb_check_feature('highlight'); my $syntax = guess_file_syntax($highlight, $file_name); - $fd =
[PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter
Signed-off-by: Ian Kelling --- Notes: The only change from v3 is a more descriptive commit message gitweb/gitweb.perl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 33d701d..6cb4280 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3913,7 +3913,7 @@ sub blob_contenttype { # guess file syntax for syntax highlighting; return undef if no highlighting # the name of syntax can (in the future) depend on syntax highlighter used sub guess_file_syntax { - my ($highlight, $mimetype, $file_name) = @_; + my ($highlight, $file_name) = @_; return undef unless ($highlight && defined $file_name); my $basename = basename($file_name, '.in'); return $highlight_basename{$basename} @@ -7062,7 +7062,7 @@ sub git_blob { $have_blame &&= ($mimetype =~ m!^text/!); my $highlight = gitweb_check_feature('highlight'); - my $syntax = guess_file_syntax($highlight, $mimetype, $file_name); + my $syntax = guess_file_syntax($highlight, $file_name); $fd = run_highlighter($fd, $highlight, $syntax) if $syntax; -- 2.9.3
Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
Jakub Narębski writes: >> Also, "curling" is not the word I would like to see. I would say: >> >> Simple benchmark comparing performance of 'blob' view of files without >> syntax highlighting in gitweb before and after this change indicates >> ±1% difference in request time for all file types. Benchmark was >> performed on local instance on Debian, using Apache/2.4.23 web server >> and CGI/PSGI/FCGI/mod_perl. >> >> ^^--- select one or state that all of them produced similar results ;-) >> Or something like that; I'm not sure how detailed this should be. >> But it is nice to have such benchmark in the commit message. > > Sidenote: this way of benchmarking of gitweb falls between two ways of > doing a benchmark. All good comments. Thanks.
Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
W dniu 24.09.2016 o 00:15, Jakub Narębski pisze: > W dniu 23.09.2016 o 11:08, Ian Kelling napisał: >>After curling blob view of unhighlighted large and small text >> files of perl code and license text 100 times each on a local >> Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in >> request time for all file types. > > Also, "curling" is not the word I would like to see. I would say: > > Simple benchmark comparing performance of 'blob' view of files without > syntax highlighting in gitweb before and after this change indicates > ±1% difference in request time for all file types. Benchmark was > performed on local instance on Debian, using Apache/2.4.23 web server > and CGI/PSGI/FCGI/mod_perl. > > ^^--- select one > > Or something like that; I'm not sure how detailed this should be. > But it is nice to have such benchmark in the commit message. Sidenote: this way of benchmarking of gitweb falls between two ways of doing a benchmark. The first method is to simply run gitweb as a standalone script, passing its parameters in CGI environment variables; just like the test suite does it. You would 'time' / 'times' it a few times, drop outliers, and take average or a median. With this method you don't even need to set up a web server. The second is to use a specialized program to benchmark the server-side of a web page, for example 'ab' (ApacheBench), httperf, curl-loader or JMeter. The first one is usually distributed together with Apache web server, so you probably have it installed already. Those tools provide timing statistics. [...] > Note that the performance loss might be quite higher on MS Windows, with > its higher cost of fork. But then they probably do not configure > server-side highligher anyway.
Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection
W dniu 23.09.2016 o 11:08, Ian Kelling napisał: > The "highlight" binary can, in some cases, determine the language type > by the means of file contents, for example the shebang in the first line > for some scripting languages. Make use of this autodetection for files > which syntax is not known by gitweb. In that case, pass the blob > contents to "highlight --force"; the parameter is needed to make it > always generate HTML output (which includes HTML-escaping). Right. > > Although we now run highlight on files which do not end up highlighted, > performance is virtually unaffected because when we call highlight, we > also call sanitize() instead of esc_html(), which is significantly > slower. This paragraph is a bit unclear, for example it is not obvious what "..., which is significantly slower" refers to: sanitize() or esc_html(). I think it would be better to write: Although we now run highlight on files which do not end up highlighted, performance is virtually unaffected because when we call highlight, it is used for escaping HTML. In the case that highlight is used, gitweb calls sanitize() instead of esc_html(), and the latter is significantly slower (it does more, being roughly a superset of sanitize()). >After curling blob view of unhighlighted large and small text > files of perl code and license text 100 times each on a local > Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in > request time for all file types. Also, "curling" is not the word I would like to see. I would say: Simple benchmark comparing performance of 'blob' view of files without syntax highlighting in gitweb before and after this change indicates ±1% difference in request time for all file types. Benchmark was performed on local instance on Debian, using Apache/2.4.23 web server and CGI/PSGI/FCGI/mod_perl. ^^--- select one Or something like that; I'm not sure how detailed this should be. But it is nice to have such benchmark in the commit message. Anyway I think that adding yet another configuration toggle for selecting whether to use "highlight" syntax autodetection or not would be just an unnecessary complication. Note that the performance loss might be quite higher on MS Windows, with its higher cost of fork. But then they probably do not configure server-side highligher anyway. > > Document the feature and improve syntax highlight documentation, add > test to ensure gitweb doesn't crash when language detection is used. Good. > > Signed-off-by: Ian Kelling > --- > Documentation/gitweb.conf.txt | 21 ++--- > gitweb/gitweb.perl | 10 +- > t/t9500-gitweb-standalone-no-errors.sh | 8 > 3 files changed, 27 insertions(+), 12 deletions(-) > > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt > index a79e350..e632089 100644 > --- a/Documentation/gitweb.conf.txt > +++ b/Documentation/gitweb.conf.txt > @@ -246,13 +246,20 @@ $highlight_bin:: We should probably say what does it mean to be "highlight"[1] compatible, but it is outside of scope for this patch, and I think also out of scope of this series. > Note that 'highlight' feature must be set for gitweb to actually > use syntax highlighting. > + > -*NOTE*: if you want to add support for new file type (supported by > -"highlight" but not used by gitweb), you need to modify `%highlight_ext` > -or `%highlight_basename`, depending on whether you detect type of file > -based on extension (for example "sh") or on its basename (for example > -"Makefile"). The keys of these hashes are extension and basename, > -respectively, and value for given key is name of syntax to be passed via > -`--syntax ` to highlighter. > +*NOTE*: for a file to be highlighted, its syntax type must be detected > +and that syntax must be supported by "highlight". The default syntax > +detection is minimal, and there are many supported syntax types with no > +detection by default. There are three options for adding syntax > +detection. The first and second priority are `%highlight_basename` and > +`%highlight_ext`, which detect based on basename (the full filename, for > +example "Makefile") and extension (for example "sh"). The keys of these > +hashes are the basename and extension, respectively, and the value for a > +given key is the name of the syntax to be passed via `--syntax ` > +to "highlight". The last priority is the "highlight" configuration of > +`Shebang` regular expressions to detect the language based on the first > +line in the file, (for example, matching the line
Re: [PATCH v3 1/2] gitweb: remove unused function parameter
Jakub Narębski writes: > I think it would be better to be more descriptive, and say: > > Subject: [PATCH v3 1/2] gitweb: remove unused parameter from > guess_file_syntax() > Acked-by: Jakub Narębski Thanks.
Re: [PATCH v3 1/2] gitweb: remove unused function parameter
W dniu 23.09.2016 o 11:08, Ian Kelling napisał: > > Subject: [PATCH v3 1/2] gitweb: remove unused function parameter I think it would be better to be more descriptive, and say: Subject: [PATCH v3 1/2] gitweb: remove unused parameter from guess_file_syntax() But that might be too long... > > Signed-off-by: Ian Kelling With, or without this change, it's nice. Acked-by: Jakub Narębski > --- > gitweb/gitweb.perl | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 33d701d..6cb4280 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3913,7 +3913,7 @@ sub blob_contenttype { > # guess file syntax for syntax highlighting; return undef if no highlighting > # the name of syntax can (in the future) depend on syntax highlighter used > sub guess_file_syntax { > - my ($highlight, $mimetype, $file_name) = @_; > + my ($highlight, $file_name) = @_; > return undef unless ($highlight && defined $file_name); > my $basename = basename($file_name, '.in'); > return $highlight_basename{$basename} > @@ -7062,7 +7062,7 @@ sub git_blob { > $have_blame &&= ($mimetype =~ m!^text/!); > > my $highlight = gitweb_check_feature('highlight'); > - my $syntax = guess_file_syntax($highlight, $mimetype, $file_name); > + my $syntax = guess_file_syntax($highlight, $file_name); > $fd = run_highlighter($fd, $highlight, $syntax) > if $syntax; > >
Re: [PATCH v2] gitweb: use highlight's shebang detection
On Thu, Sep 22, 2016, at 03:50 PM, Jakub Narębski wrote: > W dniu 22.09.2016 o 00:18, Ian Kelling napisał: > > > The highlight binary can detect language by shebang when we can't tell > > the syntax type by the name of the file. In that case, pass the blob > > to "highlight --force" and the resulting html will have markup for > > highlighting if the language was detected. > > This description feels a bit convoluted. Perhaps something like this: > > The "highlight" binary can, in some cases, determine the language type > by the means of file contents, for example the shebang in the first > line > for some scripting languages. Make use of this autodetection for files > which syntax is not known by gitweb. In that case, pass the blob > contents to "highlight --force"; the parameter is needed to make it > always generate HTML output (which includes HTML-escaping). Nice. Using it in v3. > > Also, we might want to have the information about performance of this > solution either in the commit message, or in commit comments. I tested it more rigorously and added to v3 commit message. > > > > > Document the feature and improve syntax highlight documentation, add > > test to ensure gitweb doesn't crash when language detection is used, > > All right. > > > and remove an unused parameter from gitweb_check_feature(). > > First, that is guess_file_syntax(), not gitweb_check_feature(). > Second, this change could be made into independent patch, for example > preparatory one. Oops. I split it out in v3. > > > > > Signed-off-by: Ian Kelling > > --- > > Documentation/gitweb.conf.txt | 21 ++--- > > gitweb/gitweb.perl | 14 +++--- > > t/t9500-gitweb-standalone-no-errors.sh | 8 > > 3 files changed, 29 insertions(+), 14 deletions(-) > > > > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt > > index a79e350..e632089 100644 > > --- a/Documentation/gitweb.conf.txt > > +++ b/Documentation/gitweb.conf.txt > > @@ -246,13 +246,20 @@ $highlight_bin:: > > Note that 'highlight' feature must be set for gitweb to actually > > use syntax highlighting. > > + > > -*NOTE*: if you want to add support for new file type (supported by > > -"highlight" but not used by gitweb), you need to modify `%highlight_ext` > > -or `%highlight_basename`, depending on whether you detect type of file > > -based on extension (for example "sh") or on its basename (for example > > -"Makefile"). The keys of these hashes are extension and basename, > > -respectively, and value for given key is name of syntax to be passed via > > -`--syntax ` to highlighter. > > +*NOTE*: for a file to be highlighted, its syntax type must be detected > > +and that syntax must be supported by "highlight". The default syntax > > +detection is minimal, and there are many supported syntax types with no > > +detection by default. There are three options for adding syntax > > +detection. The first and second priority are `%highlight_basename` and > > +`%highlight_ext`, which detect based on basename (the full filename, for > > +example "Makefile") and extension (for example "sh"). The keys of these > > +hashes are the basename and extension, respectively, and the value for a > > +given key is the name of the syntax to be passed via `--syntax ` > > +to "highlight". The last priority is the "highlight" configuration of > > +`Shebang` regular expressions to detect the language based on the first > > +line in the file, (for example, matching the line "#!/bin/bash"). See > > +the highlight documentation and the default config at > > +/etc/highlight/filetypes.conf for more details. > > + > > I think the rewrite is a bit more readable. > > > For example if repositories you are hosting use "phtml" extension for > > PHP files, and you want to have correct syntax-highlighting for those > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > > index 33d701d..44094f4 100755 > > --- a/gitweb/gitweb.perl > > +++ b/gitweb/gitweb.perl > > @@ -3913,7 +3913,7 @@ sub blob_contenttype { > > # guess file syntax for syntax highlighting; return undef if no > > highlighting > > # the name of syntax can (in the future) depend on syntax highlighter used > > sub guess_file_syntax { > > - my ($highlight, $mimetype, $file_name) = @_; > > + my ($highlight, $file_name) = @_; > > Right. >
[PATCH v3 2/2] gitweb: use highlight's shebang detection
The "highlight" binary can, in some cases, determine the language type by the means of file contents, for example the shebang in the first line for some scripting languages. Make use of this autodetection for files which syntax is not known by gitweb. In that case, pass the blob contents to "highlight --force"; the parameter is needed to make it always generate HTML output (which includes HTML-escaping). Although we now run highlight on files which do not end up highlighted, performance is virtually unaffected because when we call highlight, we also call sanitize() instead of esc_html(), which is significantly slower. After curling blob view of unhighlighted large and small text files of perl code and license text 100 times each on a local Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in request time for all file types. Document the feature and improve syntax highlight documentation, add test to ensure gitweb doesn't crash when language detection is used. Signed-off-by: Ian Kelling --- Documentation/gitweb.conf.txt | 21 ++--- gitweb/gitweb.perl | 10 +- t/t9500-gitweb-standalone-no-errors.sh | 8 3 files changed, 27 insertions(+), 12 deletions(-) diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt index a79e350..e632089 100644 --- a/Documentation/gitweb.conf.txt +++ b/Documentation/gitweb.conf.txt @@ -246,13 +246,20 @@ $highlight_bin:: Note that 'highlight' feature must be set for gitweb to actually use syntax highlighting. + -*NOTE*: if you want to add support for new file type (supported by -"highlight" but not used by gitweb), you need to modify `%highlight_ext` -or `%highlight_basename`, depending on whether you detect type of file -based on extension (for example "sh") or on its basename (for example -"Makefile"). The keys of these hashes are extension and basename, -respectively, and value for given key is name of syntax to be passed via -`--syntax ` to highlighter. +*NOTE*: for a file to be highlighted, its syntax type must be detected +and that syntax must be supported by "highlight". The default syntax +detection is minimal, and there are many supported syntax types with no +detection by default. There are three options for adding syntax +detection. The first and second priority are `%highlight_basename` and +`%highlight_ext`, which detect based on basename (the full filename, for +example "Makefile") and extension (for example "sh"). The keys of these +hashes are the basename and extension, respectively, and the value for a +given key is the name of the syntax to be passed via `--syntax ` +to "highlight". The last priority is the "highlight" configuration of +`Shebang` regular expressions to detect the language based on the first +line in the file, (for example, matching the line "#!/bin/bash"). See +the highlight documentation and the default config at +/etc/highlight/filetypes.conf for more details. + For example if repositories you are hosting use "phtml" extension for PHP files, and you want to have correct syntax-highlighting for those diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 6cb4280..44094f4 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3931,15 +3931,16 @@ sub guess_file_syntax { # or return original FD if no highlighting sub run_highlighter { my ($fd, $highlight, $syntax) = @_; - return $fd unless ($highlight && defined $syntax); + return $fd unless ($highlight); close $fd; + my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force"; open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ". quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse', '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);', '--', "-fe=$fallback_encoding")." | ". quote_command($highlight_bin). - " --replace-tabs=8 --fragment --syntax $syntax |" + " --replace-tabs=8 --fragment $syntax_arg |" or die_error(500, "Couldn't open file or run syntax highlighter"); return $fd; } @@ -7063,8 +7064,7 @@ sub git_blob { my $highlight = gitweb_check_feature('highlight'); my $syntax = guess_file_syntax($highlight, $file_name); - $fd = run_highlighter($fd, $highlight, $syntax) - if $syntax; + $fd = run_highlighter($fd, $highlight, $syntax); git_header_html(undef, $expires); my $formats_nav = ''; @@ -7117,7 +7117,7 @@ sub git_blob {
[PATCH v3 1/2] gitweb: remove unused function parameter
Signed-off-by: Ian Kelling --- gitweb/gitweb.perl | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 33d701d..6cb4280 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3913,7 +3913,7 @@ sub blob_contenttype { # guess file syntax for syntax highlighting; return undef if no highlighting # the name of syntax can (in the future) depend on syntax highlighter used sub guess_file_syntax { - my ($highlight, $mimetype, $file_name) = @_; + my ($highlight, $file_name) = @_; return undef unless ($highlight && defined $file_name); my $basename = basename($file_name, '.in'); return $highlight_basename{$basename} @@ -7062,7 +7062,7 @@ sub git_blob { $have_blame &&= ($mimetype =~ m!^text/!); my $highlight = gitweb_check_feature('highlight'); - my $syntax = guess_file_syntax($highlight, $mimetype, $file_name); + my $syntax = guess_file_syntax($highlight, $file_name); $fd = run_highlighter($fd, $highlight, $syntax) if $syntax; -- 2.9.3
Re: [PATCH v2] gitweb: use highlight's shebang detection
W dniu 22.09.2016 o 00:18, Ian Kelling napisał: > The highlight binary can detect language by shebang when we can't tell > the syntax type by the name of the file. In that case, pass the blob > to "highlight --force" and the resulting html will have markup for > highlighting if the language was detected. This description feels a bit convoluted. Perhaps something like this: The "highlight" binary can, in some cases, determine the language type by the means of file contents, for example the shebang in the first line for some scripting languages. Make use of this autodetection for files which syntax is not known by gitweb. In that case, pass the blob contents to "highlight --force"; the parameter is needed to make it always generate HTML output (which includes HTML-escaping). Also, we might want to have the information about performance of this solution either in the commit message, or in commit comments. > > Document the feature and improve syntax highlight documentation, add > test to ensure gitweb doesn't crash when language detection is used, All right. > and remove an unused parameter from gitweb_check_feature(). First, that is guess_file_syntax(), not gitweb_check_feature(). Second, this change could be made into independent patch, for example preparatory one. > > Signed-off-by: Ian Kelling > --- > Documentation/gitweb.conf.txt | 21 ++--- > gitweb/gitweb.perl | 14 +++--- > t/t9500-gitweb-standalone-no-errors.sh | 8 > 3 files changed, 29 insertions(+), 14 deletions(-) > > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt > index a79e350..e632089 100644 > --- a/Documentation/gitweb.conf.txt > +++ b/Documentation/gitweb.conf.txt > @@ -246,13 +246,20 @@ $highlight_bin:: > Note that 'highlight' feature must be set for gitweb to actually > use syntax highlighting. > + > -*NOTE*: if you want to add support for new file type (supported by > -"highlight" but not used by gitweb), you need to modify `%highlight_ext` > -or `%highlight_basename`, depending on whether you detect type of file > -based on extension (for example "sh") or on its basename (for example > -"Makefile"). The keys of these hashes are extension and basename, > -respectively, and value for given key is name of syntax to be passed via > -`--syntax ` to highlighter. > +*NOTE*: for a file to be highlighted, its syntax type must be detected > +and that syntax must be supported by "highlight". The default syntax > +detection is minimal, and there are many supported syntax types with no > +detection by default. There are three options for adding syntax > +detection. The first and second priority are `%highlight_basename` and > +`%highlight_ext`, which detect based on basename (the full filename, for > +example "Makefile") and extension (for example "sh"). The keys of these > +hashes are the basename and extension, respectively, and the value for a > +given key is the name of the syntax to be passed via `--syntax ` > +to "highlight". The last priority is the "highlight" configuration of > +`Shebang` regular expressions to detect the language based on the first > +line in the file, (for example, matching the line "#!/bin/bash"). See > +the highlight documentation and the default config at > +/etc/highlight/filetypes.conf for more details. > + I think the rewrite is a bit more readable. > For example if repositories you are hosting use "phtml" extension for > PHP files, and you want to have correct syntax-highlighting for those > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 33d701d..44094f4 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3913,7 +3913,7 @@ sub blob_contenttype { > # guess file syntax for syntax highlighting; return undef if no highlighting > # the name of syntax can (in the future) depend on syntax highlighter used > sub guess_file_syntax { > - my ($highlight, $mimetype, $file_name) = @_; > + my ($highlight, $file_name) = @_; Right. > return undef unless ($highlight && defined $file_name); > my $basename = basename($file_name, '.in'); > return $highlight_basename{$basename} > @@ -3931,15 +3931,16 @@ sub guess_file_syntax { > # or return original FD if no highlighting > sub run_highlighter { > my ($fd, $highlight, $syntax) = @_; > - return $fd unless ($highlight && defined $syntax); > + return $fd unless ($highlight); Run highlighter if it is defined, even if gitweb doesn't know syntax, right. >
Re: [PATCH] gitweb: use highlight's shebang detection
fyi: I mistakenly did not include v2 in the subject of the last message.
[PATCH] gitweb: use highlight's shebang detection
The highlight binary can detect language by shebang when we can't tell the syntax type by the name of the file. In that case, pass the blob to "highlight --force" and the resulting html will have markup for highlighting if the language was detected. Document the feature and improve syntax highlight documentation, add test to ensure gitweb doesn't crash when language detection is used, and remove an unused parameter from gitweb_check_feature(). Signed-off-by: Ian Kelling --- Documentation/gitweb.conf.txt | 21 ++++++--- gitweb/gitweb.perl | 14 +++--- t/t9500-gitweb-standalone-no-errors.sh | 8 3 files changed, 29 insertions(+), 14 deletions(-) diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt index a79e350..e632089 100644 --- a/Documentation/gitweb.conf.txt +++ b/Documentation/gitweb.conf.txt @@ -246,13 +246,20 @@ $highlight_bin:: Note that 'highlight' feature must be set for gitweb to actually use syntax highlighting. + -*NOTE*: if you want to add support for new file type (supported by -"highlight" but not used by gitweb), you need to modify `%highlight_ext` -or `%highlight_basename`, depending on whether you detect type of file -based on extension (for example "sh") or on its basename (for example -"Makefile"). The keys of these hashes are extension and basename, -respectively, and value for given key is name of syntax to be passed via -`--syntax ` to highlighter. +*NOTE*: for a file to be highlighted, its syntax type must be detected +and that syntax must be supported by "highlight". The default syntax +detection is minimal, and there are many supported syntax types with no +detection by default. There are three options for adding syntax +detection. The first and second priority are `%highlight_basename` and +`%highlight_ext`, which detect based on basename (the full filename, for +example "Makefile") and extension (for example "sh"). The keys of these +hashes are the basename and extension, respectively, and the value for a +given key is the name of the syntax to be passed via `--syntax ` +to "highlight". The last priority is the "highlight" configuration of +`Shebang` regular expressions to detect the language based on the first +line in the file, (for example, matching the line "#!/bin/bash"). See +the highlight documentation and the default config at +/etc/highlight/filetypes.conf for more details. + For example if repositories you are hosting use "phtml" extension for PHP files, and you want to have correct syntax-highlighting for those diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 33d701d..44094f4 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3913,7 +3913,7 @@ sub blob_contenttype { # guess file syntax for syntax highlighting; return undef if no highlighting # the name of syntax can (in the future) depend on syntax highlighter used sub guess_file_syntax { - my ($highlight, $mimetype, $file_name) = @_; + my ($highlight, $file_name) = @_; return undef unless ($highlight && defined $file_name); my $basename = basename($file_name, '.in'); return $highlight_basename{$basename} @@ -3931,15 +3931,16 @@ sub guess_file_syntax { # or return original FD if no highlighting sub run_highlighter { my ($fd, $highlight, $syntax) = @_; - return $fd unless ($highlight && defined $syntax); + return $fd unless ($highlight); close $fd; + my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force"; open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ". quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse', '$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);', '--', "-fe=$fallback_encoding")." | ". quote_command($highlight_bin). - " --replace-tabs=8 --fragment --syntax $syntax |" + " --replace-tabs=8 --fragment $syntax_arg |" or die_error(500, "Couldn't open file or run syntax highlighter"); return $fd; } @@ -7062,9 +7063,8 @@ sub git_blob { $have_blame &&= ($mimetype =~ m!^text/!); my $highlight = gitweb_check_feature('highlight'); - my $syntax = guess_file_syntax($highlight, $mimetype, $file_name); - $fd = run_highlighter($fd, $highlight, $syntax) - if $syntax; + my $syntax = guess_file_syntax($highlight, $file_name); + $fd = run_highlighter($fd, $highlight, $syntax); git_header_html(undef, $expires); my $formats_nav = '
Re: [PATCH] gitweb: use highlight's shebang detection
On Tue, Sep 20, 2016, at 01:22 PM, Jakub Narębski wrote: > W dniu 06.09.2016 o 21:00, Ian Kelling pisze: > > > The highlight binary can detect language by shebang when we can't tell > > the syntax type by the name of the file. > > Was it something always present among highlight[1] binary capabilities, > or is it something present only in new enough highlight app? Or only > in some specific fork / specific binary? I couldn't find language > detection in highlight[1] documentation... > > [1]: http://www.andre-simon.de/doku/highlight/en/highlight.php Search for the word shebang, it's mentioned twice. > > If this feature is available only for some version, or for some > highlighters, gitweb would have to provide an option to configure > it. It might be an additional configuration variable, it might > be a special value in the %highlight_basename or %highlight_ext. Good question. It was added upstream in 2007, and I tested that it's functioning in the earliest distros I have easy access to: ubuntu 14.04 and debian wheezy. > > > To use highlight's shebang > > detection, add highlight to the pipeline whenever highlight is enabled. > > This describes what this patch does, but the sentence feels > a bit convoluted, as it is stated. > Agreed. I've changed it in v2 of the patch, and perhaps this will make the rest of the patch clearer too. The new paragraph is: The highlight binary can detect language by shebang when we can't tell the syntax type by the name of the file. In that case, pass the blob to "highlight --force" and the resulting html will have markup for highlighting if the language was detected. > > > > Document the shebang detection and add a test which exercises it in > > t/t9500-gitweb-standalone-no-errors.sh. > > Nice! > > > > > Signed-off-by: Ian Kelling > > --- > > > > Notes: > > I wondered if adding highlight to the pipeline would make viewing a blob > > with no highlighting take longer but it did not on my computer. I found > > no noticeable impact on small files and strangely, on a 159k file, it > > took 7% less time averaged over several requests. > > Strange. I would guess that invoking separate binary and perl would > always > add to the time (especially on operation systems where forking / running > command is expensive... though those are not often used with web servers, > isn't it). I dug into this a little more, and I think it's because when we call highlight, we later call sanitize() instead of esc_html(). sanitize() is faster and makes up for the extra time highlight takes. I ran a test on my machine calling sanitize and esc_html on each line of gitweb.perl 100 times: 7.4s for sanitize, 12.4s for esc_html. > > > > > Documentation/gitweb.conf.txt | 21 ++--- > > gitweb/gitweb.perl | 10 +- > > t/t9500-gitweb-standalone-no-errors.sh | 18 +- > > 3 files changed, 32 insertions(+), 17 deletions(-) > > > > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt > > index a79e350..e632089 100644 > > --- a/Documentation/gitweb.conf.txt > > +++ b/Documentation/gitweb.conf.txt > > @@ -246,13 +246,20 @@ $highlight_bin:: > > Note that 'highlight' feature must be set for gitweb to actually > > use syntax highlighting. > > + > > -*NOTE*: if you want to add support for new file type (supported by > > -"highlight" but not used by gitweb), you need to modify `%highlight_ext` > > -or `%highlight_basename`, depending on whether you detect type of file > > -based on extension (for example "sh") or on its basename (for example > > -"Makefile"). The keys of these hashes are extension and basename, > > -respectively, and value for given key is name of syntax to be passed via > > -`--syntax ` to highlighter. > > +*NOTE*: for a file to be highlighted, its syntax type must be detected > > +and that syntax must be supported by "highlight". The default syntax > > +detection is minimal, and there are many supported syntax types with no > > +detection by default. There are three options for adding syntax > > +detection. The first and second priority are `%highlight_basename` and > > +`%highlight_ext`, which detect based on basename (the full filename, for > > +example "Makefile") and extension (for example "sh"). The keys of these > > +hashes are the basename and extension, respectively, and the value for a > > +given key is t
Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages
On Wed, Sep 21, 2016 at 8:28 PM, Jakub Narębski wrote: > W dniu 21.09.2016 o 20:04, Ævar Arnfjörð Bjarmason pisze: >> It would make some code like git_print_log() a bit more complex / >> fragile, since it would have to work on multi-line strings, but >> anything that needed to do a regex match / replacement would be much >> faster. > > Would it? Did you perform any synthetic micro-benchmark? No, just experience. With the caveat that this may not matter at all in this context, C-like code in Perl is slow, if you can offload things to one big regex operation it's usually faster. >> >> But OTOH I think perhaps we're worrying about nothing when it comes to >> the performance. I haven't been able to make gitweb display more than >> a 100 or so commits at a time (haven't found where exactly in the code >> these limits are), any munging we do on the log messages would have to >> be pretty damn slow to matter. > > sub git_log_generic { > > # [...] > > my @commitlist = > parse_commits($commit_hash, 101, (100 * $page), > defined $file_name ? ($file_name, > "--full-history") : ()); > > Here you have it (it probably should be a constant; this number can be > found in a few other places). Thanks!
Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages
W dniu 21.09.2016 o 19:58, Ævar Arnfjörð Bjarmason pisze: > On Wed, Sep 21, 2016 at 7:09 PM, Jakub Narębski wrote: >> W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał: >>> +(?>> +[A-Za-z0-9.-]+ >>> +(?!\.) # refs can't end with ".", see check_refname_format() >> >> If we can assume that tag name is at least two characters (instead of >> at least one character), we could get rid of those extended regexp >> lookaround assertions: >> >> (?> (?!pattern) - zero-width negative lookahead assertion >> >> That is: >> >> +[A-Za-z0-9.] # see strbuf_check_tag_ref(). Tags can't start >> with - >> +[A-Za-z0-9.-]* >> +[A-Za-z0-9-] # refs can't end with ".", see >> check_refname_format() > > Why get rid of them? I'm all for improving the regex, there's bound to > be lots of bugs in it, but since it's perl we can freely use its > extended features. Ah, all right. I was wondering how zero width assertions / patterns interact with each other, but zero-width negative lookaround assertions are really quite simple. > >> Also, the canonical documentation for what is allowed in refnames >> is git-check-ref-format(1)... though it does not look like it includes >> "tags cannot start with '-'". > > Yeah, looks like that manpage needs to be patched. Right. > >> Anyway, perhaps 'is it valid refname' could be passed to a subroutine, >> or a named regexp (which might be more involved, like disallowing two >> consecutive dots, e.g. "(?!.*\.{2})" at beginning). I wonder if rules for valid tag name can be described in extended regexp, and if it is, how readable would it be. Regards, -- Jakub Narębski
Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages
W dniu 21.09.2016 o 20:04, Ævar Arnfjörð Bjarmason pisze: > On Wed, Sep 21, 2016 at 6:26 PM, Jakub Narębski wrote: > >> P.S. I have reworking of commit message parsing and enhancement in my >> long, long and dated gitweb TODO list :-( > > Anything specific you could share? Some of TODO I would have to bring from backups, as the computer on which I did majority of gitweb development has since died (from old age). The list includes: - implement caching of gitweb output - revamp handling of encoding (UTF-8 with fallback encoding) - split gitweb into modules, while maintaining ease of install - refactor handling of diffs - better handling of config files - document URI structure, perhaps revamp URI parsing and generation - make commit message transformation generic (see below) > > One thing that would be a lot faster in Perl is if we didn't have to > pass the log around as split-up lines and could just operate on it as > one big string. Well, there are a few transformations that commit message undergoes in gitweb, including linking SHA1, optional linking of bug numbers to bug tracker, and syntax highlighting of signoff lines (trailer lines). I would like to have this cleaned up, and refactored. With all those transformations we would need to keep account which parts are HTML, and which not and need escaping (note: URI escape != HTML escape). > > It would make some code like git_print_log() a bit more complex / > fragile, since it would have to work on multi-line strings, but > anything that needed to do a regex match / replacement would be much > faster. Would it? Did you perform any synthetic micro-benchmark? > > But OTOH I think perhaps we're worrying about nothing when it comes to > the performance. I haven't been able to make gitweb display more than > a 100 or so commits at a time (haven't found where exactly in the code > these limits are), any munging we do on the log messages would have to > be pretty damn slow to matter. sub git_log_generic { # [...] my @commitlist = parse_commits($commit_hash, 101, (100 * $page), defined $file_name ? ($file_name, "--full-history") : ()); Here you have it (it probably should be a constant; this number can be found in a few other places). Best, -- Jakub Narębski
Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages
On Wed, Sep 21, 2016 at 6:26 PM, Jakub Narębski wrote: > P.S. I have reworking of commit message parsing and enhancement in my > long, long and dated gitweb TODO list :-( Anything specific you could share? One thing that would be a lot faster in Perl is if we didn't have to pass the log around as split-up lines and could just operate on it as one big string. It would make some code like git_print_log() a bit more complex / fragile, since it would have to work on multi-line strings, but anything that needed to do a regex match / replacement would be much faster. But OTOH I think perhaps we're worrying about nothing when it comes to the performance. I haven't been able to make gitweb display more than a 100 or so commits at a time (haven't found where exactly in the code these limits are), any munging we do on the log messages would have to be pretty damn slow to matter. > P.P.S. Kay Sievers no longer works on gitweb, and I think no longer > works at SuSE but at RedHat. Yup, been getting bounces from his address.
Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages
Jakub Narębski writes: >> When I saw 2/3 I wondered about one thing and 3/3 shares the same, >> which is that we only use regex match and do not validate for a >> false match. Would it be too expensive... > > It's a matter of balance between false positives (and unresolving > links) and performance... Yes, and that is why I asked a simple yes-or-no question. Would it be too expensive? Your answer seems to be yes. Have we measured? Is that really a bottleneck? Would it help to update parse_commits to call a new command "gitweb--helper" that produces the result of what git_print_log would have done to its $log argument, for example?
Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages
On Wed, Sep 21, 2016 at 7:09 PM, Jakub Narębski wrote: > W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał: > >> Change the log formatting function to know about "git describe" output >> like v2.8.0-4-g867ad08 in addition to just plain 867ad08. > > All right, that is a good plan. > >> >> This also fixes a micro-regression in my change of the minimum SHA1 >> length from 8 to 7, which is that dated tags like >> hadoop-20160921-113441-20-g094fb7d would start thinking the "20160921" >> part was a commit. > > Actually 20160921 is 8 characters, so assuming that '-' is treated > as word boundary by Perl, it is not a regression; this false positive > was there. The new feature would help, instead of linking false match > it links whole git-describe output. > > So this paragraph needs to be changed wrt. the above. Yeah I just miscounted there, will change that. > Note that there are quite a bit of shortened SHA-1 that are composed > entirely from digits, without a-f characters. *nod* >> >> There are still many valid refnames that we don't link to >> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to >> v2.8.0-4-g867ad08, but I'm not supporting that with this commit, >> similarly it's trivially possible to create some refnames like >> "æ/var-gf6727b0" or whatever which won't be picked up by this regex. > > Hopefully hierarchical tags are rare. We need to reduce false > positives. > >> >> There's surely room for improvement here, but I just wanted to address >> the very common case of sticking "git describe" output into commit >> messages without trying to link to all possible refnames, that's going >> to be a rather futile exercise given that this is free text, and it >> would be prohibitively expensive to look up whether the references in >> question exist in our repository. > > Note that we do not ask Git at the time of displaying commit message > if the link is valid for performance reasons; we link it, and the link > may be invalid if it was a false positive. > > Note that recommended way to refer to other commit in commit mesages > is (see Documentation/SubmittingPatches): > > If you want to reference a previous commit in the history of a stable > branch, use the format "abbreviated sha1 (subject, date)", > with the subject enclosed in a pair of double-quotes, like this: > > Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30) > noticed that ... > > Hmmm... this makes previous commit even more important. > >> --- >> gitweb/gitweb.perl | 18 -- >> 1 file changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl >> index 101dbc0..3a52bc7 100755 >> --- a/gitweb/gitweb.perl >> +++ b/gitweb/gitweb.perl >> @@ -2036,10 +2036,24 @@ sub format_log_line_html { >> my $line = shift; >> >> $line = esc_html($line, -nbsp=>1); >> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{ >> + $line =~ s{ >> +\b >> +( >> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0 >> +# or hadoop-20160921-113441-20-g094fb7d > > All right, for more complex regular expressions using in-line comments > (extended regexp in Perl) is a good idea. > >> +(?> +[A-Za-z0-9.-]+ >> +(?!\.) # refs can't end with ".", see check_refname_format() > > If we can assume that tag name is at least two characters (instead of > at least one character), we could get rid of those extended regexp > lookaround assertions: > > (? (?!pattern) - zero-width negative lookahead assertion > > That is: > > +[A-Za-z0-9.] # see strbuf_check_tag_ref(). Tags can't start > with - > +[A-Za-z0-9.-]* > +[A-Za-z0-9-] # refs can't end with ".", see > check_refname_format() Why get rid of them? I'm all for improving the regex, there's bound to be lots of bugs in it, but since it's perl we can freely use its extended features. > Also, the canonical documentation for what is allowed in refnames > is git-check-ref-format(1)... though it does not look like it includes > "tags cannot start with '-'". Yeah, looks like that manpage needs to be patched. > Anyway, perhaps 'is it valid refname' could be passed to a subroutine, > or a named regexp (which might be more involved, like disallowing two > consecutive dots, e.g. "(?!.*\.{2})" at
Re: [PATCH] gitweb: use highlight's shebang detection
W dniu 21.09.2016 o 18:38, Junio C Hamano pisze: > Jakub Narębski writes: >> W dniu 06.09.2016 o 21:00, Ian Kelling pisze: >> >>> The highlight binary can detect language by shebang when we can't tell >>> the syntax type by the name of the file. >> >> Was it something always present among highlight[1] binary capabilities, >> or is it something present only in new enough highlight app? Or only >> in some specific fork / specific binary? I couldn't find language >> detection in highlight[1] documentation... >> ... >> Thank you for your work on this patch, > > Thanks for reviewing. It seems that there will be further exchange > needed before I can pick it up? Yes, I think so. -- Jakub Narębski