Re: Upgrade to gitweb-1.8.3.1-20 on centos7 breaks git access

2019-08-17 Thread Gaiseric Vandal
If I remove /etc/httpd/conf.d/git.conf then I just get  blank page in 
the browser.


the ssl_error log shows

[Sat Aug 17 16:17:08.926093 2019] [cgi:error] [pid 18299] [client 
x.x.x.x:25050] AH01215: Request not supported: '/export/gitrepos/ABC/'



On 8/17/2019 4:03 PM, Gaiseric Vandal wrote:

I recently applied the latest patches on centos7, including
gitweb-1.8.3.1-20.  When I try to access git repos (either via web
browser or git command line.) Repos are configured to require LDAP
authentication.


I should be able to access repo ABC via

https://myserver.domain.com/git/ABC

I get the correct user login prompt but then , after logging, get
following error

  The requested URL /git/ABC/ was not found on this server.

I don't think any of the config files got changed.

My partial config is

  # cat /etc/httpd/conf.d/git.conf
  Alias /git /var/www/git

  
   Options +ExecCGI
   AddHandler cgi-script .cgi
   DirectoryIndex gitweb.cgi
  
  #



  # cat /etc/httpd/conf.d/gitrepos.conf
  SetEnv GIT_PROJECT_ROOT /export/gitrepos
  SetEnv GIT_HTTP_EXPORT_ALL
  ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

  RewriteEngine on
  RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
  RewriteCond %{REQUEST_URI} /git-receive-pack$
  RewriteRule ^/git/ - [E=AUTHREQUIRED]

  
  AuthType Basic
  AuthBasicProvider ldap
  AuthLDAPURL "ldaps://xxx)"
  AuthLDAPBindDN  "uid="
  AuthLDAPBindPassword xxx
  AuthName "Git Access"
  Require user x x x
  Order deny,allow
  Deny from env=AUTHREQUIRED
  Satisfy any
  


  
  AuthType Basic
  AuthBasicProvider ldap
  AuthLDAPURL "ldaps://xxx)"
  AuthLDAPBindDN  "uid="
  AuthLDAPBindPassword xxx
  AuthName "Git Access ABC"
  Require user x x x
  Order deny,allow




The /var/www/git/gitweb.cgi file has the default settings.

...
our $GIT = "/usr/bin/git";

# absolute fs-path which will be prepended to the project path
our $projectroot = "/var/lib/git";
#our $projectroot = "/export/gitrepos";

...


If I set $projectroot to point to /export/gitrepos, I can see a list
of projects when I browse to https://myserver.domain.com/git-
however I wanted to minimize how much information was exposed with no
authentication.  And it didn't fix the access problem.
The "git" command is in the default OS path.   changing
   our $GIT = "/usr/bin/git";
to
  our $GIT = "git";

did not help.


The only thing in the httpd error logs that looks relevant is

[Sat Aug 17 15:39:39.826265 2019] [core:info] [pid 15870] [client
x] AH00128: File does not exist: /var/www/git/ABC/
[Sat Aug 17 15:39:44.831598 2019] [ssl:info] [pid 15870] (70007)The
timeout specified has expired: [client x] AH01991: SSL input
filter read failed.


selinux is disabled.

Since I am getting an authentication prompt with "Git Access ABC" in
the dialog box it seems clear that the project root is being picked up
from  /etc/httpd/conf.d/gitrepos.conf and that the
/usr/libexec/git-core/git-http-backend script is being run.


Appreciate any advice.

Thanks


Upgrade to gitweb-1.8.3.1-20 on centos7 breaks git access

2019-08-17 Thread Gaiseric Vandal
I recently applied the latest patches on centos7, including
gitweb-1.8.3.1-20.  When I try to access git repos (either via web
browser or git command line.) Repos are configured to require LDAP
authentication.


I should be able to access repo ABC via

https://myserver.domain.com/git/ABC

I get the correct user login prompt but then , after logging, get
following error

 The requested URL /git/ABC/ was not found on this server.

I don't think any of the config files got changed.

My partial config is

 # cat /etc/httpd/conf.d/git.conf
 Alias /git /var/www/git

 
  Options +ExecCGI
  AddHandler cgi-script .cgi
  DirectoryIndex gitweb.cgi
 
 #



 # cat /etc/httpd/conf.d/gitrepos.conf
 SetEnv GIT_PROJECT_ROOT /export/gitrepos
 SetEnv GIT_HTTP_EXPORT_ALL
 ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/

 RewriteEngine on
 RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
 RewriteCond %{REQUEST_URI} /git-receive-pack$
 RewriteRule ^/git/ - [E=AUTHREQUIRED]

 
 AuthType Basic
 AuthBasicProvider ldap
 AuthLDAPURL "ldaps://xxx)"
 AuthLDAPBindDN  "uid="
 AuthLDAPBindPassword xxx
 AuthName "Git Access"
 Require user x x x
 Order deny,allow
 Deny from env=AUTHREQUIRED
 Satisfy any
 


 
 AuthType Basic
 AuthBasicProvider ldap
 AuthLDAPURL "ldaps://xxx)"
 AuthLDAPBindDN  "uid="
 AuthLDAPBindPassword xxx
 AuthName "Git Access ABC"
 Require user x x x
 Order deny,allow




The /var/www/git/gitweb.cgi file has the default settings.

...
our $GIT = "/usr/bin/git";

# absolute fs-path which will be prepended to the project path
our $projectroot = "/var/lib/git";
#our $projectroot = "/export/gitrepos";

...


If I set $projectroot to point to /export/gitrepos, I can see a list
of projects when I browse to https://myserver.domain.com/git-
however I wanted to minimize how much information was exposed with no
authentication.  And it didn't fix the access problem.
The "git" command is in the default OS path.   changing
  our $GIT = "/usr/bin/git";
to
 our $GIT = "git";

did not help.


The only thing in the httpd error logs that looks relevant is

[Sat Aug 17 15:39:39.826265 2019] [core:info] [pid 15870] [client
x] AH00128: File does not exist: /var/www/git/ABC/
[Sat Aug 17 15:39:44.831598 2019] [ssl:info] [pid 15870] (70007)The
timeout specified has expired: [client x] AH01991: SSL input
filter read failed.


selinux is disabled.

Since I am getting an authentication prompt with "Git Access ABC" in
the dialog box it seems clear that the project root is being picked up
from  /etc/httpd/conf.d/gitrepos.conf and that the
/usr/libexec/git-core/git-http-backend script is being run.


Appreciate any advice.

Thanks


Re: How to get gitweb setup with nginx + uwsgi?

2019-08-12 Thread koocr
Hi,

On Mon, Aug 12, 2019, at 12:16 PM, Konstantin Ryabitsev wrote:
> We have it working in a similar configuration, but with CGit instead of 
> GitWeb. Unless you have specific requirements to run GitWeb, I recommend 
> you use CGit for your web frontend, as it offers many features GitWeb 
> doesn't, such as caching -- plus it works really well with uwsgi 
> (git.kernel.org runs in that configuration).


Didn't realize that!  

I'd picked gitweb in the first place as I figured, being part of git, that 
kernel project would be using it.

Clearly I'm wrong.

Switching to CGit now.  Thanks for the suggestion.

I don't use Arch (Opensuse User), but am trying to follow their CGit + UWSGI + 
Gitolite docs:

https://wiki.archlinux.org/index.php/Cgit#Using_uwsgi

Since my distro pkgs have different paths, systemd setup, etc. having some 
challenges getting a everything-working-together config setup.  That'll just 
take some reading and staring ! ;-)

If you know of any other good examples , please share :-)

TA!


Re: How to get gitweb setup with nginx + uwsgi?

2019-08-12 Thread Konstantin Ryabitsev

On Mon, Aug 12, 2019 at 08:09:22AM -0700, ko...@mailc.net wrote:

Hi all.

I'm setting up a local Git server, with Gitweb + Gitolite.


We have it working in a similar configuration, but with CGit instead of 
GitWeb. Unless you have specific requirements to run GitWeb, I recommend 
you use CGit for your web frontend, as it offers many features GitWeb 
doesn't, such as caching -- plus it works really well with uwsgi 
(git.kernel.org runs in that configuration).


Best,
-K



How to get gitweb setup with nginx + uwsgi?

2019-08-12 Thread koocr
Hi all.

I'm setting up a local Git server, with Gitweb + Gitolite.

The gitolite wrapper is installed & working.  Now I'm working on the Gitweb 
frontend.

I run Nginx as my webserver.  Usually with PHP, using fpm.

Gitweb's gitweb.cgi looks like it needs perl CGI.

For perl cgi I'm trying to get it working with UWSGI,

https://uwsgi-docs.readthedocs.io/en/latest/Nginx.html
https://nginx.org/en/docs/http/ngx_http_uwsgi_module.html#example

I installed

git --version
git version 2.22.0
    ls -al /usr/share/gitweb/gitweb.cgi
-rwxr-xr-x 1 root root 247K Jul 24 05:27 
/usr/share/gitweb/gitweb.cgi
grep "\$version =" /usr/share/gitweb/gitweb.cgi
our $version = "2.22.0";
nginx -v
nginx version: nginx/1.17.1 
uwsgi --version
2.0.18

I set up the nginx vhost

server { 
listen 127.0.0.1:60080 http2;
root /usr/share/gitweb;
index  gitweb.cgi;

location / {
try_files $uri $uri/ @gitweb;
}
location @gitweb {
root /usr/share/gitweb;
include uwsgi_params;
gzip off;
    uwsgi_param UWSGI_SCRIPT gitweb;
        uwsgi_param GITWEB_CONFIG /etc/gitweb/gitweb.conf;
uwsgi_pass unix:/run/uwsgi/uwsgi.sock;
uwsgi_modifier1 5;
}
}

and the uwsgi server

/etc/uwsgi/uwsgi.ini
[uwsgi]
strict = 1

master = true
processes = 2

binary-path = /usr/sbin/uwsgi
plugin-dir = /usr/lib64/uwsgi
logto = /var/log/uwsgi/uwsgi.log

uid = wwwrun
gid = www
umask = 022

uwsgi-socket = /run/uwsgi/uwsgi.sock
chmod-socket = 660
chown-socket = wwwrun:www

plugins = http,psgi
chdir = /usr/share/gitweb
psgi = gitweb.cgi


nginx & uwsgi services are both running

ps aux | egrep "nginx|uwsgi"
wwwrun   17463  0.0  0.1  89468 23704 ?Ss   07:03   
0:00 /usr/sbin/uwsgi --autoload --ini /etc/uwsgi/uwsgi.ini
wwwrun   17465  0.0  0.1  97664 17184 ?Sl   07:03   
0:00 /usr/sbin/uwsgi --autoload --ini /etc/uwsgi/uwsgi.ini
wwwrun   17468  0.0  0.1  97664 17184 ?Sl   07:03   
0:00 /usr/sbin/uwsgi --autoload --ini /etc/uwsgi/uwsgi.ini
root 18006  0.0  0.0 211264  4276 ?Ss   07:10   
0:00 nginx: master process /opt/nginx/sbin/nginx -c /etc/nginx/nginx.conf -g 
pid /run/nginx.pid;
wwwrun   18007  0.0  0.0 211416  5492 ?S07:10   
0:00 nginx: worker process
wwwrun   18008  0.0  0.0 212068 10300 ?S07:10   
0:00 nginx: worker process
wwwrun   18009  0.0  0.0 211416  5492 ?S07:10   
0:00 nginx: worker process
wwwrun   18011  0.0  0.0 211416  5492 ?S07:10   
0:00 nginx: worker process
wwwrun   18012  0.0  0.0 211452  5052 ?S07:10   
0:00 nginx: cache manager process

ls -al /run/uwsgi/uwsgi.sock
srw-rw 1 wwwrun www  0 Aug 12 07:03 /run/uwsgi/uwsgi.sock=

when I go to the site

http://127.0.0.1:60080/

I just get the script listing in the browser

#!/usr/bin/perl
 
# gitweb - simple web interface to track changes in git repositories
#
# (C) 2005-2006, Kay Sievers 
# (C) 2005, Christian Gierke
#
# This program is licensed under the GPLv2
 
use 5.008;
use strict;
use warnings;
...

no errors anywhere, just the script display.

I'm missing something basic since it's not running the script. :-/

Anyone have any experience with gitweb + uwsgi on nginx?  Or know a good 
working example?

Thanks!




[PATCH] doc: don't use git.kernel.org as example gitweb URL

2019-06-22 Thread Jakub Wilk
git.kernel.org uses cgit, not gitweb, these days:

$ w3m -dump 'http://git.kernel.org/?p=git/git.git;a=tree;f=gitweb' | grep 
-w generated
generated by cgit 1.2-0.3.lf.el7 (git 2.18.0) at 2019-06-22 16:14:38 +

Signed-off-by: Jakub Wilk 
---
 Documentation/gitweb.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/gitweb.txt b/Documentation/gitweb.txt
index c7436098c9..3cc9b034c4 100644
--- a/Documentation/gitweb.txt
+++ b/Documentation/gitweb.txt
@@ -28,8 +28,7 @@ Gitweb provides a web interface to Git repositories.  Its 
features include:
   revisions one at a time, viewing the history of the repository.
 * Finding commits which commit messages matches given search term.
 
-See http://git.kernel.org/?p=git/git.git;a=tree;f=gitweb[] or
-http://repo.or.cz/w/git.git/tree/HEAD:/gitweb/[] for gitweb source code,
+See http://repo.or.cz/w/git.git/tree/HEAD:/gitweb/[] for gitweb source code,
 browsed using gitweb itself.
 
 
-- 
2.20.1



Re: on fedora, "man gitweb" exists but actual gitweb command is missing

2019-02-23 Thread Robert P. J. Day
On Sat, 23 Feb 2019, Todd Zullinger wrote:

> Hi,
>
> Robert P. J. Day wrote:
> >
> >   not so much a git issue as what looks like a fedora packaging
> >   issue.
>
> Yeah, it's just a minor packaging issue.  The gitweb manpages are
> included in the main git package rather than in the gitweb package
> with the rest of the gitweb files.  I'll fix that for future
> releases and when f29 is updated to 2.21 it will pick that up¹.

  ... snip ...

  ah, eggcellent, so i can remove this from my TODO list, thanks.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday


Re: on fedora, "man gitweb" exists but actual gitweb command is missing

2019-02-23 Thread Todd Zullinger
Hi,

Robert P. J. Day wrote:
> 
>   not so much a git issue as what looks like a fedora packaging issue.

Yeah, it's just a minor packaging issue.  The gitweb
manpages are included in the main git package rather than in
the gitweb package with the rest of the gitweb files.  I'll
fix that for future releases and when f29 is updated to 2.21
it will pick that up¹.

Since gitweb requires git, you'd be sure to have the
documentation after installing gitweb.  If we made it
possible to install gitweb without getting the
documentation, that would be more annoying. :)

¹ https://src.fedoraproject.org/fork/tmz/rpms/git/c/0d9ad786

> it took only a few seconds to determine that fedora bundles that
> functionality in two separate packages which are not dependencies of
> "git": "gitweb" and "git-instaweb" (output abbreviated):
> 
>   $ sudo dnf install git-instaweb
>   ...
>   Installing:
>git-instaweb
>   Installing dependencies:
>gitweb
>perl-CGI
> 
> and now "git-instaweb" works fine.
> 
>   the question is, is it not inconsistent for fedora's basic "git"
> package to include the man page for gitweb, without including the
> corresponding functionality? is this something i should submit a
> fedora bugzilla report for? or am i misunderstanding something?

If I hadn't seen this thread, then a report in the fedora
bugzilla would be the way to go to ensure one of the fedora
git maintainers noticed it.

Thanks,

-- 
Todd


on fedora, "man gitweb" exists but actual gitweb command is missing

2019-02-23 Thread Robert P. J. Day


  not so much a git issue as what looks like a fedora packaging issue.
on my updated fedora 29 system, "man gitweb" works:

  GITWEB(1)Git Manual    GITWEB(1)

  NAME
   gitweb - Git web interface (web frontend to Git repositories)

  SYNOPSIS
   To get started with gitweb, run git-instaweb(1) from a Git
   repository. This would configure and start your web server, and
   run web browser pointing to gitweb.
   ... snip ...

and the corresponding man page source file clearly belongs to the
"git" package:

  $ rpm -qf /usr/share/man/man1/gitweb.1.gz
  git-2.20.1-1.fc29.x86_64
  $

but there is no "git instaweb" command:

  $ git instaweb
  git: 'instaweb' is not a git command. See 'git --help'.
  $

it took only a few seconds to determine that fedora bundles that
functionality in two separate packages which are not dependencies of
"git": "gitweb" and "git-instaweb" (output abbreviated):

  $ sudo dnf install git-instaweb
  ...
  Installing:
   git-instaweb
  Installing dependencies:
   gitweb
   perl-CGI

and now "git-instaweb" works fine.

  the question is, is it not inconsistent for fedora's basic "git"
package to include the man page for gitweb, without including the
corresponding functionality? is this something i should submit a
fedora bugzilla report for? or am i misunderstanding something?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



[PATCH v2 35/35] gitweb: make hash size independent

2019-02-18 Thread brian m. carlson
Gitweb has several hard-coded 40 values throughout it to check for
values that are passed in or acquired from Git.  To simplify the code,
introduce a regex variable that matches either exactly 40 or exactly 64
hex characters, and use this variable anywhere we would have previously
hard-coded a 40 in a regex.

Add some helper functions which allow us to write tighter regexes that
match exactly the number of hex characters we're expecting.

Similarly, switch the code that looks for deleted diffinfo information
to look for either 40 or 64 zeros, and update one piece of code to use
this function.  Finally, when formatting a log line, allow an
abbreviated describe output to contain up to 64 characters.

Helped-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: brian m. carlson 
---
 gitweb/gitweb.perl | 97 --
 1 file changed, 67 insertions(+), 30 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2594a4badb..7fef19fe59 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -788,6 +788,38 @@ sub check_loadavg {
 # ==
 # input validation and dispatch
 
+# Various hash size-related values.
+my $sha1_len = 40;
+my $sha256_extra_len = 24;
+my $sha256_len = $sha1_len + $sha256_extra_len;
+
+# A regex matching $len hex characters. $len may be a range (e.g. 7,64).
+sub oid_nlen_regex {
+   my $len = shift;
+   my $hchr = qr/[0-9a-fA-F]/;
+   return qr/(?:(?:$hchr){$len})/;
+}
+
+# A regex matching two sets of $nlen hex characters, prefixed by the literal
+# string $prefix and with the literal string $infix between them.
+sub oid_nlen_prefix_infix_regex {
+   my $nlen = shift;
+   my $prefix = shift;
+   my $infix = shift;
+
+   my $rx = oid_nlen_regex($nlen);
+
+   return qr/^\Q$prefix\E$rx\Q$infix\E$rx$/;
+}
+
+# A regex matching a valid object ID.
+our $oid_regex;
+{
+   my $x = oid_nlen_regex($sha1_len);
+   my $y = oid_nlen_regex($sha256_extra_len);
+   $oid_regex = qr/(?:$x(?:$y)?)/;
+}
+
 # input parameters can be collected from a variety of sources (presently, CGI
 # and PATH_INFO), so we define an %input_params hash that collects them all
 # together during validation: this allows subsequent uses (e.g. href()) to be
@@ -1516,7 +1548,7 @@ sub is_valid_refname {
 
return undef unless defined $input;
# textual hashes are O.K.
-   if ($input =~ m/^[0-9a-fA-F]{40}$/) {
+   if ($input =~ m/^$oid_regex$/) {
return 1;
}
# it must be correct pathname
@@ -2028,6 +2060,9 @@ sub file_type_long {
 sub format_log_line_html {
my $line = shift;
 
+   # Potentially abbreviated OID.
+   my $regex = oid_nlen_regex("7,64");
+
$line = esc_html($line, -nbsp=>1);
$line =~ s{
 \b
@@ -2037,10 +2072,10 @@ sub format_log_line_html {
 (?';
}
# match 
-   if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) {
+   if ($line =~ oid_nlen_prefix_infix_regex($sha1_len, "index ", ",") |
+   $line =~ oid_nlen_prefix_infix_regex($sha256_len, "index ", ",")) {
# can match only for combined diff
$line = 'index ';
for (my $i = 0; $i < $diffinfo->{'nparents'}; $i++) {
@@ -2308,7 +2344,8 @@ sub format_extended_diff_header_line {
$line .= '0' x 7;
}
 
-   } elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) {
+   } elsif ($line =~ oid_nlen_prefix_infix_regex($sha1_len, "index ", 
"..") |
+$line =~ oid_nlen_prefix_infix_regex($sha256_len, "index ", 
"..")) {
# can match only for ordinary diff
my ($from_link, $to_link);
if ($from->{'href'}) {
@@ -2834,7 +2871,7 @@ sub git_get_hash_by_path {
}
 
#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa  panic.c'
-   $line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/;
+   $line =~ m/^([0-9]+) (.+) ($oid_regex)\t/;
if (defined $type && $type ne $2) {
# type doesn't match
return undef;
@@ -,7 +3370,7 @@ sub git_get_references {
 
while (my $line = <$fd>) {
chomp $line;
-   if ($line =~ m!^([0-9a-fA-F]{40})\srefs/($type.*)$!) {
+   if ($line =~ m!^($oid_regex)\srefs/($type.*)$!) {
if (defined $refs{$1}) {
push @{$refs{$1}}, $2;
} else {
@@ -3407,7 +3444,7 @@ sub parse_tag {
$tag{'id'} = $tag_id;
while (my $line = <$fd>) {
chomp $line;
-   if ($line =~ m/^object ([0-9a-fA-F]{40})$/) {
+   if (

Re: [PATCH 31/31] gitweb: make hash size independent

2019-02-12 Thread Ævar Arnfjörð Bjarmason


On Tue, Feb 12 2019, brian m. carlson wrote:

> Gitweb has several hard-coded 40 values throughout it to check for
> values that are passed in or acquired from Git.  To simplify the code,
> introduce a regex variable that matches either exactly 40 or exactly 64
> hex characters, and use this variable anywhere we would have previously
> hard-coded a 40 in a regex.
>
> Similarly, switch the code that looks for deleted diffinfo information
> to look for either 40 or 64 zeros, and update one piece of code to use
> this function.  Finally, when formatting a log line, allow an
> abbreviated describe output to contain up to 64 characters.

This might be going a bit overboard but I tried this with a variant
where...

> +# A regex matching a valid object ID.
> +our $oid_regex = qr/(?:[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})?)/;
> +

Instead of this dense regex I did:

my $sha1_len = 40;
my $sha256_extra_len = 24;
my $sha256_len = $sha1_len + $sha256_extra_len;

sub oid_nlen_regex {
my $len = shift;
my $hchr = qr/[0-9a-fA-F]/;
return qr/(?:(?:$hchr){$len})/
}

our $oid_regex;
{
my $x = oid_nlen_regex($sha1_len);
my $y = oid_nlen_regex($sha256_extra_len);
$oid_regex = qr/(?:$x(?:$y)?)/
}

Then most of the rest of this is the same, e.g.:
> - if ($input =~ m/^[0-9a-fA-F]{40}$/) {

But...

> @@ -2037,10 +2040,10 @@ sub format_log_line_html {
>  (?  [A-Za-z0-9.-]+
>  (?!\.) # refs can't end with ".", see check_refname_format()
> --g[0-9a-fA-F]{7,40}
> +-g[0-9a-fA-F]{7,64}
>  |
>  # Just a normal looking Git SHA1
> -[0-9a-fA-F]{7,40}
> +[0-9a-fA-F]{7,64}
>  )
>  \b
>  }{

E.g. here we can do call oid_nlen_regex("7,64") to produce this blurb.

> - if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) {
> + if ($line =~ m/^index $oid_regex,$oid_regex/) {
> - } elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) {
> + } elsif ($line =~ m/^index $oid_regex..$oid_regex/) {

And here, maybe nobody cares, but we now implicitly accept mixed SHA-1 &
SHA-256 input. Whereas we could have a helper on top of the above code
like:

sub oid_nlen_prefix_infix_regex {
my $nlen = shift;
my $prefix = shift;
my $infix = shift;

my $rx = oid_nlen_regex($nlen);

return qr/^\Q$prefix\E$rx\Q$infix\E$rx$/;
}

And then e.g.:

} elsif ($line =~ oid_nlen_prefix_infix_regex($sha1_len, "index ", "..") ||
 $line =~ oid_nlen_prefix_infix_regex($sha256_len, "index ", "..")) 
{

So only accept SHA1..SHA1 or SHA256..SHA256, not SHA1..SHA256 or
SHA256..SHA1.


[PATCH 31/31] gitweb: make hash size independent

2019-02-11 Thread brian m. carlson
Gitweb has several hard-coded 40 values throughout it to check for
values that are passed in or acquired from Git.  To simplify the code,
introduce a regex variable that matches either exactly 40 or exactly 64
hex characters, and use this variable anywhere we would have previously
hard-coded a 40 in a regex.

Similarly, switch the code that looks for deleted diffinfo information
to look for either 40 or 64 zeros, and update one piece of code to use
this function.  Finally, when formatting a log line, allow an
abbreviated describe output to contain up to 64 characters.

Signed-off-by: brian m. carlson 
---
 gitweb/gitweb.perl | 63 --
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2594a4badb..aec8ca3256 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -788,6 +788,9 @@ sub check_loadavg {
 # ==
 # input validation and dispatch
 
+# A regex matching a valid object ID.
+our $oid_regex = qr/(?:[0-9a-fA-F]{40}(?:[0-9a-fA-F]{24})?)/;
+
 # input parameters can be collected from a variety of sources (presently, CGI
 # and PATH_INFO), so we define an %input_params hash that collects them all
 # together during validation: this allows subsequent uses (e.g. href()) to be
@@ -1516,7 +1519,7 @@ sub is_valid_refname {
 
return undef unless defined $input;
# textual hashes are O.K.
-   if ($input =~ m/^[0-9a-fA-F]{40}$/) {
+   if ($input =~ m/^$oid_regex$/) {
return 1;
}
# it must be correct pathname
@@ -2037,10 +2040,10 @@ sub format_log_line_html {
 (?';
}
# match 
-   if ($line =~ m/^index [0-9a-fA-F]{40},[0-9a-fA-F]{40}/) {
+   if ($line =~ m/^index $oid_regex,$oid_regex/) {
# can match only for combined diff
$line = 'index ';
for (my $i = 0; $i < $diffinfo->{'nparents'}; $i++) {
@@ -2308,7 +2311,7 @@ sub format_extended_diff_header_line {
$line .= '0' x 7;
}
 
-   } elsif ($line =~ m/^index [0-9a-fA-F]{40}..[0-9a-fA-F]{40}/) {
+   } elsif ($line =~ m/^index $oid_regex..$oid_regex/) {
# can match only for ordinary diff
my ($from_link, $to_link);
if ($from->{'href'}) {
@@ -2834,7 +2837,7 @@ sub git_get_hash_by_path {
}
 
#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa  panic.c'
-   $line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/;
+   $line =~ m/^([0-9]+) (.+) ($oid_regex)\t/;
if (defined $type && $type ne $2) {
# type doesn't match
return undef;
@@ -,7 +3336,7 @@ sub git_get_references {
 
while (my $line = <$fd>) {
chomp $line;
-   if ($line =~ m!^([0-9a-fA-F]{40})\srefs/($type.*)$!) {
+   if ($line =~ m!^($oid_regex)\srefs/($type.*)$!) {
if (defined $refs{$1}) {
push @{$refs{$1}}, $2;
} else {
@@ -3407,7 +3410,7 @@ sub parse_tag {
$tag{'id'} = $tag_id;
while (my $line = <$fd>) {
chomp $line;
-   if ($line =~ m/^object ([0-9a-fA-F]{40})$/) {
+   if ($line =~ m/^object ($oid_regex)$/) {
$tag{'object'} = $1;
} elsif ($line =~ m/^type (.+)$/) {
$tag{'type'} = $1;
@@ -3451,15 +3454,15 @@ sub parse_commit_text {
}
 
my $header = shift @commit_lines;
-   if ($header !~ m/^[0-9a-fA-F]{40}/) {
+   if ($header !~ m/^$oid_regex/) {
return;
}
($co{'id'}, my @parents) = split ' ', $header;
while (my $line = shift @commit_lines) {
last if $line eq "\n";
-   if ($line =~ m/^tree ([0-9a-fA-F]{40})$/) {
+   if ($line =~ m/^tree ($oid_regex)$/) {
$co{'tree'} = $1;
-   } elsif ((!defined $withparents) && ($line =~ m/^parent 
([0-9a-fA-F]{40})$/)) {
+   } elsif ((!defined $withparents) && ($line =~ m/^parent 
($oid_regex)$/)) {
push @parents, $1;
} elsif ($line =~ m/^author (.*) ([0-9]+) (.*)$/) {
$co{'author'} = to_utf8($1);
@@ -3591,7 +3594,7 @@ sub parse_difftree_raw_line {
 
# ':100644 100644 03b218260e99b78c6df0ed378e59ed9205ccc96d 
3b93d5e7cc7f7dd4ebed13a5cc1a4ad976fc94d8 M   ls-files.c'
# ':100644 100644 7f9281985086971d3877aca27704f2aaf9c448ce 
bc190ebc71bbd923f2b728e505408f5e54bd073a M   rev-tree.c'
-   if ($line =~ m/^:([0-7]{6}) ([0-7]{6}) ([0-9a-fA-F]{40}) 
(

[PATCH] gitweb: correctly store previous rev in javascript-actions mode

2018-12-16 Thread Jonathan Nieder
From: Robert Luberda 
Date: Sun, 16 Mar 2014 22:57:19 +0100

Without this change, the setting

 $feature{'javascript-actions'}{'default'} = [1];

in gitweb.conf breaks gitweb's blame page: clicking on line numbers
displayed in the second column on the page has no effect.

For comparison, with javascript-actions disabled, clicking on line
numbers loads the previous version of the line.

Addresses https://bugs.debian.org/741883.

Signed-off-by: Jonathan Nieder 
---
Hi Robert,

Years ago, you sent this obviously correct patch to the link mentioned
above, but it got lost in the noise.  Sorry about that.  Hopefully
late is better than never.

May we forge your sign-off?  See
https://www.kernel.org/pub/software/scm/git/docs/SubmittingPatches.html#sign-off
for more details about what this means.

Jakub et al, any thoughts?  I don't see any unit tests in gitweb/static
that could avoid this regressing --- am I missing some, or if not any
hints for someone who would want to add a test framework?

Thanks,
Jonathan

 gitweb/static/js/blame_incremental.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/static/js/blame_incremental.js 
b/gitweb/static/js/blame_incremental.js
index db6eb50584..e100d8206b 100644
--- a/gitweb/static/js/blame_incremental.js
+++ b/gitweb/static/js/blame_incremental.js
@@ -484,7 +484,7 @@ function processBlameLines(lines) {
case 'previous':
curCommit.nprevious++;
// store only first 'previous' header
-   if (!'previous' in curCommit) {
+   if (!('previous' in curCommit)) {
var parts = data.split(' ', 2);
curCommit.previous= parts[0];
curCommit.file_parent = 
unquote(parts[1]);
-- 
2.20.0.405.gbc1bbc6f85



Re: gitweb: local configuration not found

2018-12-06 Thread Martin Mareš
Hello!

> Yeah, it does look indirect.  Despite what you said, it also would
> support users giving an absolute path via GITWEB_CONFIG.
> 
> With "use File::Spec", perhaps something like this?

Yes, this looks right.

Martin


Re: gitweb: local configuration not found

2018-12-05 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>> Documentation says "If you are absolutely certain that you want your
>> script to load and execute a file from the current directory, then use
>> a ./ prefix".  We can do that, like so:
>>
>> diff --git i/gitweb/Makefile w/gitweb/Makefile
>> index cd194d057f..3160b6cc5d 100644
>> --- i/gitweb/Makefile
>> +++ w/gitweb/Makefile
>> @@ -18,7 +18,7 @@ RM ?= rm -f
>>  INSTALL ?= install
>>
>>  # default configuration for gitweb
>> -GITWEB_CONFIG = gitweb_config.perl
>> +GITWEB_CONFIG = ./gitweb_config.perl
>>  GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
>>  GITWEB_CONFIG_COMMON = /etc/gitweb-common.conf
>>  GITWEB_HOME_LINK_STR = projects
>>
>> but that does not help if someone overrides GITWEB_CONFIG, and besides,
>> it would be nicer to avoid the possibility of an @INC search altogether.
>> ...
> Just:
>
> local @INC = '.';
> do 'FILE.pl';
>
> Would do the same thing, but seems like a more indirect way to do it if
> all we want is ./ anyway.

Yeah, it does look indirect.  Despite what you said, it also would
support users giving an absolute path via GITWEB_CONFIG.

With "use File::Spec", perhaps something like this?

 gitweb/gitweb.perl | 4 
 1 file changed, 4 insertions(+)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2594a4badb..239e7cbc25 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -719,6 +719,10 @@ sub filter_and_validate_refs {
 sub read_config_file {
my $filename = shift;
return unless defined $filename;
+
+   $filename = File::Spec->catfile(".", $filename)
+   unless File::Spec->file_name_is_absolute($filename);
+
# die if there are errors parsing config file
if (-e $filename) {
do $filename;


Re: gitweb: local configuration not found

2018-12-05 Thread Ævar Arnfjörð Bjarmason


On Wed, Dec 05 2018, Jonathan Nieder wrote:

> Martin Mares wrote[1]:
>
>> After upgrade to Stretch, gitweb no longer finds the configuration file
>> "gitweb_config.perl" in the current directory. However, "man gitweb" still
>> mentions this as one of the possible locations of the config file (and
>> indeed a useful one when using multiple instances of gitweb).
>>
>> It was probably broken by Perl dropping "." from the default search path
>> for security reasons.
>
> Indeed, perldelta(1) tells me that in 5.24.1 (and 5.26, etc),
>
>   Core modules and tools no longer search "." for optional modules
>
> gitweb.perl contains
>
>  sub read_config_file {
>  my $filename = shift;
>  return unless defined $filename;
>  # die if there are errors parsing config file
>  if (-e $filename) {
>  do $filename;
>
> which implies an @INC search but it is silly --- as the "-e" test
> illustrates, this never intended to search @INC.
>
> Documentation says "If you are absolutely certain that you want your
> script to load and execute a file from the current directory, then use
> a ./ prefix".  We can do that, like so:
>
> diff --git i/gitweb/Makefile w/gitweb/Makefile
> index cd194d057f..3160b6cc5d 100644
> --- i/gitweb/Makefile
> +++ w/gitweb/Makefile
> @@ -18,7 +18,7 @@ RM ?= rm -f
>  INSTALL ?= install
>
>  # default configuration for gitweb
> -GITWEB_CONFIG = gitweb_config.perl
> +GITWEB_CONFIG = ./gitweb_config.perl
>  GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
>  GITWEB_CONFIG_COMMON = /etc/gitweb-common.conf
>  GITWEB_HOME_LINK_STR = projects
>
> but that does not help if someone overrides GITWEB_CONFIG, and besides,
> it would be nicer to avoid the possibility of an @INC search altogether.
> Another alternative would be to use
>
>   local @INC = ('.');
>
> Would that be better?
>
> Advice from someone more versed than I am in perl would be very welcome
> (hence the cc to Ævar).

It seems most sensible to follow the ./FILE.pl advice per
https://metacpan.org/pod/distribution/perl/pod/perl5260delta.pod#Removal-of-the-current-directory-(%22.%22)-from-@INC

Just:

local @INC = '.';
do 'FILE.pl';

Would do the same thing, but seems like a more indirect way to do it if
all we want is ./ anyway. FWIW to be pedantically bug-compatible with
the old version (we should not do this) it's:

local @INC = (@INC, ".");
do 'FILE.pl';

I.e. before our behavior was implicitly to check whether we had a local
FILE.pl, then loop through all of @INC to see if we found it there, and
finally come back to the file we did the -e check for.


gitweb: local configuration not found

2018-12-05 Thread Jonathan Nieder
Martin Mares wrote[1]:

> After upgrade to Stretch, gitweb no longer finds the configuration file
> "gitweb_config.perl" in the current directory. However, "man gitweb" still
> mentions this as one of the possible locations of the config file (and
> indeed a useful one when using multiple instances of gitweb).
>
> It was probably broken by Perl dropping "." from the default search path
> for security reasons.

Indeed, perldelta(1) tells me that in 5.24.1 (and 5.26, etc),

  Core modules and tools no longer search "." for optional modules

gitweb.perl contains

 sub read_config_file {
 my $filename = shift;
 return unless defined $filename;
 # die if there are errors parsing config file
 if (-e $filename) {
 do $filename;

which implies an @INC search but it is silly --- as the "-e" test
illustrates, this never intended to search @INC.

Documentation says "If you are absolutely certain that you want your
script to load and execute a file from the current directory, then use
a ./ prefix".  We can do that, like so:

diff --git i/gitweb/Makefile w/gitweb/Makefile
index cd194d057f..3160b6cc5d 100644
--- i/gitweb/Makefile
+++ w/gitweb/Makefile
@@ -18,7 +18,7 @@ RM ?= rm -f
 INSTALL ?= install
 
 # default configuration for gitweb
-GITWEB_CONFIG = gitweb_config.perl
+GITWEB_CONFIG = ./gitweb_config.perl
 GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
 GITWEB_CONFIG_COMMON = /etc/gitweb-common.conf
 GITWEB_HOME_LINK_STR = projects

but that does not help if someone overrides GITWEB_CONFIG, and besides,
it would be nicer to avoid the possibility of an @INC search altogether.
Another alternative would be to use

local @INC = ('.');

Would that be better?

Advice from someone more versed than I am in perl would be very welcome
(hence the cc to Ævar).

Thanks for reporting and hope that helps,
Jonathan

[1] https://bugs.debian.org/915632


[PATCH 28/78] config.txt: move gitweb.* to a separate file

2018-10-26 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt| 17 +
 Documentation/config/gitweb.txt | 16 
 2 files changed, 17 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/config/gitweb.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6f4a66b3f2..99ac8fc8ec 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -337,22 +337,7 @@ include::config/gc.txt[]
 
 include::config/gitcvs.txt[]
 
-gitweb.category::
-gitweb.description::
-gitweb.owner::
-gitweb.url::
-   See linkgit:gitweb[1] for description.
-
-gitweb.avatar::
-gitweb.blame::
-gitweb.grep::
-gitweb.highlight::
-gitweb.patches::
-gitweb.pickaxe::
-gitweb.remote_heads::
-gitweb.showSizes::
-gitweb.snapshot::
-   See linkgit:gitweb.conf[5] for description.
+include::config/gitweb.txt[]
 
 grep.lineNumber::
If set to true, enable `-n` option by default.
diff --git a/Documentation/config/gitweb.txt b/Documentation/config/gitweb.txt
new file mode 100644
index 00..1b51475108
--- /dev/null
+++ b/Documentation/config/gitweb.txt
@@ -0,0 +1,16 @@
+gitweb.category::
+gitweb.description::
+gitweb.owner::
+gitweb.url::
+   See linkgit:gitweb[1] for description.
+
+gitweb.avatar::
+gitweb.blame::
+gitweb.grep::
+gitweb.highlight::
+gitweb.patches::
+gitweb.pickaxe::
+gitweb.remote_heads::
+gitweb.showSizes::
+gitweb.snapshot::
+   See linkgit:gitweb.conf[5] for description.
-- 
2.19.1.647.g708186aaf9



[PATCH 22/59] config.txt: move gitweb.* to a separate file

2018-10-20 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt| 17 +
 Documentation/gitweb-config.txt | 16 
 2 files changed, 17 insertions(+), 16 deletions(-)
 create mode 100644 Documentation/gitweb-config.txt

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ebef3c867a..6898c9f567 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -337,22 +337,7 @@ include::gc-config.txt[]
 
 include::gitcvs-config.txt[]
 
-gitweb.category::
-gitweb.description::
-gitweb.owner::
-gitweb.url::
-   See linkgit:gitweb[1] for description.
-
-gitweb.avatar::
-gitweb.blame::
-gitweb.grep::
-gitweb.highlight::
-gitweb.patches::
-gitweb.pickaxe::
-gitweb.remote_heads::
-gitweb.showSizes::
-gitweb.snapshot::
-   See linkgit:gitweb.conf[5] for description.
+include::gitweb-config.txt[]
 
 grep.lineNumber::
If set to true, enable `-n` option by default.
diff --git a/Documentation/gitweb-config.txt b/Documentation/gitweb-config.txt
new file mode 100644
index 00..1b51475108
--- /dev/null
+++ b/Documentation/gitweb-config.txt
@@ -0,0 +1,16 @@
+gitweb.category::
+gitweb.description::
+gitweb.owner::
+gitweb.url::
+   See linkgit:gitweb[1] for description.
+
+gitweb.avatar::
+gitweb.blame::
+gitweb.grep::
+gitweb.highlight::
+gitweb.patches::
+gitweb.pickaxe::
+gitweb.remote_heads::
+gitweb.showSizes::
+gitweb.snapshot::
+   See linkgit:gitweb.conf[5] for description.
-- 
2.19.1.647.g708186aaf9



Re: gitweb: HTML output is not always encoded in UTF-8 when using --fastcgi.

2018-09-08 Thread Julien Moutinho
Le sam. 08 sept. 2018 à 19:15:32 +0200, Julien Moutinho a écrit :
> As a quick workaround this hotpatch can even be put in $GITWEB_CONFIG
> by removing the `local` before `*FCGI::Stream::PRINT`.

Turns out to require more care than that,
due to $per_request_config reloading $GITWEB_CONFIG at each request,
overwriting FCGI::Stream::PRINT multiple times, messing the encoding.
This seems to work(around):
| if ($first_request) {
| my $enc = Encode::find_encoding('UTF-8');
| my $org = \&FCGI::Stream::PRINT;
| no warnings 'redefine';
| *FCGI::Stream::PRINT = sub {
| my @OUTPUT = @_;
| for (my $i = 1; $i < @_; $i++) {
| $OUTPUT[$i] = $enc->encode($_[$i], 
Encode::FB_CROAK|Encode::LEAVE_SRC);
| }
| @_ = @OUTPUT;
| goto $org;
| };
| }


gitweb: HTML output is not always encoded in UTF-8 when using --fastcgi.

2018-09-08 Thread Julien Moutinho
Description
===
An old (2011) description of the problem is here:
https://stackoverflow.com/questions/7285215/nginx-fastcgi-utf-8-encoding-output-iso-8859-1-instead-of-utf8#answer-18149487

Basically, gitweb's HTML output is not always encoded in UTF-8
when using --fastcgi.


Reproduction

gitweb v2.18.0
perl   v5.28.0

| echo Système >test.git/description

According to the 2011 problem report,
the problem only appears when using gitweb.cgi --fastcgi
not when gitweb.cgi is spawned by fcgiwrap.

And apparently, the text must not contain one character
which cannot be correctly converted to ISO-8859-1,
or an UTF-8 encoding is done (not sure by what);
which made this bug harder to spot.


Explanation
===
According to Christian Hansen (chansen), the problem is that:
> FCGI streams are implemented using the older stream API,
> TIEHANDLE. Applying PerlIO layers using binmode() has no effect.
https://stackoverflow.com/questions/5005104/how-to-force-fastcgi-to-encode-form-data-as-utf-8-as-cgi-pm-has-option#answer-7097698
https://perldoc.perl.org/perltie.html

Indeed:
> FCGI.pm isn't Unicode aware,
> only characters within the range 0x00-0xFF are supported.
https://metacpan.org/pod/FCGI#LIMITATIONS

But, as stated in gitweb's to_utf8():
> gitweb writes out in utf-8
> thanks to "binmode STDOUT, ':utf8'" at beginning"


Correction
==
Christian Hansen suggested that:
"The proper solution would be to encode your data before outputting it,
but if thats not an option I can offer this hotpatch:"

| my $enc = Encode::find_encoding('UTF-8');
| my $org = \&FCGI::Stream::PRINT;
| no warnings 'redefine';
| local *FCGI::Stream::PRINT = sub {
| my @OUTPUT = @_;
| for (my $i = 1; $i < @_; $i++) {
| $OUTPUT[$i] = $enc->encode($_[$i], 
Encode::FB_CROAK|Encode::LEAVE_SRC);
| }
| @_ = @OUTPUT;
| goto $org;
| };

As a quick workaround this hotpatch can even be put in $GITWEB_CONFIG
by removing the `local` before `*FCGI::Stream::PRINT`.


Re: gitweb and Levenshtein

2018-07-14 Thread Johannes Schindelin
Hi David,

On Thu, 12 Jul 2018, David Brown wrote:

> Howdy, I want to hack the getweb_make_index.perl script to create a string
> search using: https://github.com/git/git/blob/master/levenshtein.c.
> 
> How do i reference the compiled code?
> 
> I would like to call this routine using Java and maybe Perl.

The code to compute the Levenshtein distance is not exposed in Git's
command-line interface. Therefore, there is currently no way to call it
directly from Java or Perl.

The best you could do for now is to convert the code to Java or Perl.

Ciao,
Johannes


gitweb and Levenshtein

2018-07-12 Thread David Brown
Howdy, I want to hack the getweb_make_index.perl script to create a 
string search using: 
https://github.com/git/git/blob/master/levenshtein.c.


How do i reference the compiled code?

I would like to call this routine using Java and maybe Perl.

Please advise.

Thanks.

Regards,


Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-03 Thread Junio C Hamano
Jakub Narebski  writes:

> I think the problem is not with aligning, otherwise we would simply get
> bad aling, and not visible corruption.  The ACTUAL PROBLEM is most
> probably because of concatenating strings marked as UTF-8 and strings
> not marked as UTF-8.  Strange things happen then in Perl, unfortunetaly.

Sounds quite right---the patch needs to be retitled, if the bug is
not about "measuring offset", which I think is what fooled me when I
sent my response.



Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-03 Thread Shin Kojima
> One solution would be to force conversion to UTF-8 on input via "open"
> pragma (e.g. "use open ':encoding(UTF-8)';").  But there is no
> UTF-8-with_fallback encoding available - we would have to write one, and
> install it as module (or fake it via Perl trickery).  This mechanism is
> almost the same to what we currently use in gitwbe.

Yes, I tried using `Encode::Guess` with "open" pragma, but no luck.
https://perldoc.perl.org/Encode/Guess.html

I'm also afraid of "open" pragma does not work properly while using
git_blame_common().  Let's say someone using non-ASCII characters in
his/her name, committing non-UTF8 encoded characters.  git-blame will
combine them in the same line.  Following is an example:

$ git blame dummy | xxd
: 3461 6464 3565 6331 2028 e585 90e5 b3b6  4add5ec1 (..
0010: 20e6 96b0 2032 3031 382d 3035 2d30 3320   ... 2018-05-03
0020: 3232 3a34 383a 3432 202b 3039 3030 2031  22:48:42 +0900 1
0030: 2920 8367 8389 8343 0a   ) .g...C.

* e585 90e5 b3b6 20e6 96b0 : my name, encoded with UTF-8
* 8367 8389 8343   : "トライ" encoded with Shift_JIS

It means I need to split each lines of git-blame output at the very
beginning, then convert the first-half as UTF-8 and the second-half as
Shift_JIS.

Sincerely,

-- 
Shin Kojima


Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-03 Thread Jakub Narebski
Junio C Hamano  writes:
> Shin Kojima  writes:
>
>> Offset positions should not be counted by byte length, but by actual
>> character length.
>> ...
>>  # escape tabs (convert tabs to spaces)
>>  sub untabify {
>> -my $line = shift;
>> +my $line = to_utf8(shift);
>>  
>>  while ((my $pos = index($line, "\t")) != -1) {
>>  if (my $count = (8 - ($pos % 8))) {
>
> Some codepaths in the resulting codeflow look even hackier than they
> already are.  For example, format_rem_add_lines_pair() calls
> untabify() and then feeds its result to esc_html().  The first thing
> done in esc_html() is to call to_utf8().  I know that to_utf8()
> cheats and leaves the incoming string as-is if it is already UTF-8,
> so this may be a safe conversion, but ideally we should be able to
> say "function X takes non-UTF8 and works on it", "function Y takes
> UTF8 and works on it", and "function Z takes non-UTF8 and gives UTF8
> data back" for each functions clearly, not "function W can take
> either UTF8 or any other garbage and tries to return UTF8".

The problem with handling encoding in sane way, that is encode it on
input (to UTF-8), and decode on output (to plain text or HTML) is the
$fallback_encoding.

Gitweb assumes that everything uses UTF-8 encoding.  If the source is
not in UTF-8, but for example uses latin-1 encoding, then there we could
stumble upon byte sequences which are not valid UTF-8.  If that happens,
then gitweb tries to convert it to UTF-8 using $fallback_encoding.  If
$fallback_encoding is single-byte encoding, like latin-1, where any byte
sequence is valid, then that's all.  If there is an error during
conversion to UTF-8, then Unicode REPLACEMENT CHARACTER, code point
U+FFFD, is used.

But there are places where gitweb outputs plain text; the intention is
to use source data as is - to have it as one would have in the console.
Some input paths are common for plain text and HTML output; because of
that problem the data is not converted to UTF-8 on input.


The to_utf8() function tries to be clever, and do not convert alredy
converted data.

> Also, does it even "fix" the problem to use to_utf8() here in the
> first place?  Untabify is about aligning the character after a HT to
> multiple of 8 display position, so we'd want measure display width,
> which is not the same as either byte count or char count.

I think the problem is not with aligning, otherwise we would simply get
bad aling, and not visible corruption.  The ACTUAL PROBLEM is most
probably because of concatenating strings marked as UTF-8 and strings
not marked as UTF-8.  Strange things happen then in Perl, unfortunetaly.


One solution would be to force conversion to UTF-8 on input via "open"
pragma (e.g. "use open ':encoding(UTF-8)';").  But there is no
UTF-8-with_fallback encoding available - we would have to write one, and
install it as module (or fake it via Perl trickery).  This mechanism is
almost the same to what we currently use in gitwbe.

Another would be to use the trick that Perl 6 uses when encountering
byte sequence that is invalid UTF-8 - encode it using private plane in
Unicode using UTF-8, thus achieving lossless conversion / decoding.  But
this also as far as I know is not available from CPAN, so we would have
to implement it ourself.

Best,
-- 
Jakub Narębski


Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-02 Thread Shin Kojima
> ideally we should be able to say "function X takes non-UTF8 and
> works on it", "function Y takes UTF8 and works on it", and "function
> Z takes non-UTF8 and gives UTF8 data back" for each functions
> clearly, not "function W can take either UTF8 or any other garbage
> and tries to return UTF8".

Yes, I totally agree with that.


> Some codepaths in the resulting codeflow look even harder than they
> already are.  For example, format_rem_add_lines_pair() calls
> untabify() and then feeds its result to esc_html().

Honestly, I discovered this problem exactly from
format_rem_add_lines_pair().  In my environment($fallback_encoding
= 'cp932'), some commitdiff shows garbled text only inside color-words
portions.

I added a reproduce process at the end of this message.

After my investigation, I thought format_rem_add_lines_pair() tries to
use split()/index()/substr()/etc against raw blob contents and that
produces funny characters.  These builtin functions should be used to
decoded string.

untabify() looks proper place for me to decode blob contents
beforehand, as it definitely is not to be used for binary contests
like images and compressed snapshot files.

I'm sure using to_utf8() in untabify() fixes this problem.  In fact,
there is also another similar problem in blame function that assumes
blob contents as if utf8 encoded:

binmode $fd, ':utf8';

I personally consider text blob contents should be decoded as soon as
possible and esc_html() should not contain to_utf8(), but the
codeflow is slightly vast and I couldn't eliminate the possibility of
calls from somewhere else that does not care character encodings.

So yes, I would appreciate hearing your thoughts.


> Also, does it even "fix" the problem to use to_utf8() here in the
> first place?  Untabify is about aligning the character after a HT to
> multiple of 8 display position, so we'd want measure display width,
> which is not the same as either byte count or char count.

Following is a reproduce process:

$ git --version
git version 2.17.0

$ mkdir test
$ cd test
$ git init
$ echo 'モバイル' | iconv -f UTF-8 -t Shift_JIS > dummy
$ git add .
$ git commit -m 'init'
$ echo 'インスタント' | iconv -f UTF-8 -t Shift_JIS > dummy
$ git commit -am 'change'
$ git instaweb
$ echo 'our $fallback_encoding = "cp932";' >> .git/gitweb/gitweb_config.perl
$ w3m -dump 'http://127.0.0.1:1234/?p=.git;a=commitdiff'

What I got:

gitprojects / .git / commitdiff
[commit   ] ? search: [] [ ]re
summary | shortlog | log | commit | commitdiff | tree
raw | patch | inline | side by side (parent: 79e26fe)
change master

authorShin Kojima 
  Wed, 2 May 2018 10:55:01 + (19:55 +0900)
committer Shin Kojima 
  Wed, 2 May 2018 10:55:01 + (19:55 +0900)

dummy  patch | blob | history


diff --git a/dummy b/dummy
index ac37f38..31fb96a 100644 (file)
--- a/dummy
+++ b/dummy
@@ -1 +1 @@
-coイル
+Cンスタント
Unnamed repository; edit this file 'description' to name the repository.
RSS Atom

What I expected:

gitprojects / .git / commitdiff
[commit   ] ? search: [] [ ]re
summary | shortlog | log | commit | commitdiff | tree
raw | patch | inline | side by side (parent: 79e26fe)
change master

authorShin Kojima 
  Wed, 2 May 2018 10:55:01 + (19:55 +0900)
committer Shin Kojima 
  Wed, 2 May 2018 10:55:01 + (19:55 +0900)

dummy  patch | blob | history


diff --git a/dummy b/dummy
index ac37f38..31fb96a 100644 (file)
--- a/dummy
+++ b/dummy
@@ -1 +1 @@
-モバイル
+インスタント
Unnamed repository; edit this file 'description' to name the repository.
RSS Atom


-- 
Shin Kojima


Re: [PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-05-02 Thread Junio C Hamano
Shin Kojima  writes:

> Offset positions should not be counted by byte length, but by actual
> character length.
> ...
>  # escape tabs (convert tabs to spaces)
>  sub untabify {
> - my $line = shift;
> + my $line = to_utf8(shift);
>  
>   while ((my $pos = index($line, "\t")) != -1) {
>   if (my $count = (8 - ($pos % 8))) {

Some codepaths in the resulting codeflow look even hackier than they
already are.  For example, format_rem_add_lines_pair() calls
untabify() and then feeds its result to esc_html().  The first thing
done in esc_html() is to call to_utf8().  I know that to_utf8()
cheats and leaves the incoming string as-is if it is already UTF-8,
so this may be a safe conversion, but ideally we should be able to
say "function X takes non-UTF8 and works on it", "function Y takes
UTF8 and works on it", and "function Z takes non-UTF8 and gives UTF8
data back" for each functions clearly, not "function W can take
either UTF8 or any other garbage and tries to return UTF8".

Also, does it even "fix" the problem to use to_utf8() here in the
first place?  Untabify is about aligning the character after a HT to
multiple of 8 display position, so we'd want measure display width,
which is not the same as either byte count or char count.


[PATCH] gitweb: Measure offsets against UTF-8 flagged string

2018-04-30 Thread Shin Kojima
Offset positions should not be counted by byte length, but by actual
character length.

>5183   # We need to untabify lines before split()'ing them;
>5184   # otherwise offsets would be invalid.

Horizontal tab is not the only case we need to consider.  Please excuse
me for using your name here, but the following URL can not find "match"
occurances while using `git-instaweb` on the git repository.

http://127.0.0.1:1234/?p=.git&a=search&h=HEAD&st=grep&s=Nar%C4%99bski

Signed-off-by: Shin Kojima 
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2594a4bad..a5a9093a1 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1697,7 +1697,7 @@ sub unquote {
 
 # escape tabs (convert tabs to spaces)
 sub untabify {
-   my $line = shift;
+   my $line = to_utf8(shift);
 
while ((my $pos = index($line, "\t")) != -1) {
if (my $count = (8 - ($pos % 8))) {
-- 
2.17.0



[ANNOUNCE] Gitwin: Git Server for Windows with SSH/HTTP(S) transport and Gitweb

2018-04-27 Thread tk
Hi all,

This is a ONE-TIME announcement of Gitwin - a Git Server for Windows:

Gitwin is a packaging of Git, OpenSSH, Nginx and many other related tools to 
make it a ready-to-use
solution as a secure git repository on Windows. It supports SSH and HTTP(S) 
transports as well as
Gitweb with Highlighter and Gravatar support. A free edition is available.

Please visit https://itefix.net/gitwin for more information

Kind regards
Tevfik Karagulle
Itefix.net


Re: Socket activation for GitWeb FastCGI with systemd?

2018-04-04 Thread Alex Ivanov


03.04.2018, 23:04, "Jacob Keller" :
> On Tue, Apr 3, 2018 at 11:53 AM, Alex Ivanov  wrote:
>>  Hi.
>>  I want to use systemd as fastcgi spawner for gitweb + nginx.
>>  The traffic is low and number of users is limited + traversal bots. For 
>> that reason I've decided to use following mimimal services
>>
>>  gitweb.socket
>>  [Unit]
>>  Description=GitWeb Socket
>>
>>  [Socket]
>>  ListenStream=/run/gitweb.sock
>>  Accept=false
>>
>>  [Install]
>>  WantedBy=sockets.target
>>
>>  gitweb.service
>>  [Unit]
>>  Description=GitWeb Service
>>
>>  [Service]
>>  Type=simple
>>  ExecStart=/path/to/gitweb.cgi --fcgi
>>  StandardInput=socket
>>
>>  However this scheme is not resistant to simple DDOS.
>>  E.g. traversal bots often kill the service by opening non existing path 
>> (e.g http://host/?p=repo;a=blob;f=nonexisting/path;hb=HEAD showing in 
>> browser 404 - Cannot find file) many times consecutively, which leads to
>>  Apr 03 21:32:10 host systemd[1]: gitweb.service: Start request repeated too 
>> quickly.
>>  Apr 03 21:32:10 host systemd[1]: gitweb.service: Failed with result 
>> 'start-limit-hit'.
>>  Apr 03 21:32:10 host systemd[1]: Failed to start GitWeb service.
>>  and 502 Bad Gateway in browser. I believe the reason is that gitweb.service 
>> dies on failure and if it happens too often, systemd declines to restart the 
>> service due to start limit hit.
>>  So my question is how to correct systemd services for GitWeb to be 
>> resistant to such issue? I prefer to use single process to process all 
>> clients.
>>  Thanks.
>
> This sounds like a systemd specific question that might get a better
> answer from the systemd mailing list.

Thanks I will try that too.

>
> That being said, I believe if in this case gitweb is dying due to the
> path not existing? You might be able to configure systemd to
> understand that the particular exit code for when the path doesn't
> exist is a "valid" exit, and not a failure case..

I will try to do that, but I'm afraid that there may be other ways to remotely 
abuse the service.

>
> I'm not entirely understanding your goal.. you want each request to
> launch the gitweb process, and when it's done you want it to exit? But
> if there are multiple connections at once you want it to stay alive
> until it services them all? I think the best answer is configure
> systemd to understand that the exit code for when the path is invalid
> will be counted as a success.

I want a single process for all connections too keep RAM usage at minimal. I 
also though it fits my case since number of users is low.

>
> Thanks,
> Jake


Re: Socket activation for GitWeb FastCGI with systemd?

2018-04-03 Thread Jacob Keller
On Tue, Apr 3, 2018 at 11:53 AM, Alex Ivanov  wrote:
> Hi.
> I want to use systemd as fastcgi spawner for gitweb + nginx.
> The traffic is low and number of users is limited + traversal bots. For that 
> reason I've decided to use following mimimal services
>
> gitweb.socket
> [Unit]
> Description=GitWeb Socket
>
> [Socket]
> ListenStream=/run/gitweb.sock
> Accept=false
>
> [Install]
> WantedBy=sockets.target
>
> gitweb.service
> [Unit]
> Description=GitWeb Service
>
> [Service]
> Type=simple
> ExecStart=/path/to/gitweb.cgi --fcgi
> StandardInput=socket
>
> However this scheme is not resistant to simple DDOS.
> E.g. traversal bots often kill the service by opening non existing path (e.g 
> http://host/?p=repo;a=blob;f=nonexisting/path;hb=HEAD showing in browser 404 
> - Cannot find file) many times consecutively, which leads to
> Apr 03 21:32:10 host systemd[1]: gitweb.service: Start request repeated too 
> quickly.
> Apr 03 21:32:10 host systemd[1]: gitweb.service: Failed with result 
> 'start-limit-hit'.
> Apr 03 21:32:10 host systemd[1]: Failed to start GitWeb service.
> and 502 Bad Gateway in browser. I believe the reason is that gitweb.service 
> dies on failure and if it happens too often, systemd declines to restart the 
> service due to start limit hit.
> So my question is how to correct systemd services for GitWeb to be resistant 
> to such issue? I prefer to use single process to process all clients.
> Thanks.

This sounds like a systemd specific question that might get a better
answer from the systemd mailing list.

That being said, I believe if in this case gitweb is dying due to the
path not existing? You might be able to configure systemd to
understand that the particular exit code for when the path doesn't
exist is a "valid" exit, and not a failure case..

I'm not entirely understanding your goal.. you want each request to
launch the gitweb process, and when it's done you want it to exit? But
if there are multiple connections at once you want it to stay alive
until it services them all? I think the best answer is configure
systemd to understand that the exit code for when the path is invalid
will be counted as a success.

Thanks,
Jake


Socket activation for GitWeb FastCGI with systemd?

2018-04-03 Thread Alex Ivanov
Hi.
I want to use systemd as fastcgi spawner for gitweb + nginx. 
The traffic is low and number of users is limited + traversal bots. For that 
reason I've decided to use following mimimal services

gitweb.socket
[Unit]
Description=GitWeb Socket

[Socket]
ListenStream=/run/gitweb.sock
Accept=false

[Install]
WantedBy=sockets.target

gitweb.service
[Unit]
Description=GitWeb Service

[Service]
Type=simple
ExecStart=/path/to/gitweb.cgi --fcgi
StandardInput=socket

However this scheme is not resistant to simple DDOS.
E.g. traversal bots often kill the service by opening non existing path (e.g 
http://host/?p=repo;a=blob;f=nonexisting/path;hb=HEAD showing in browser 404 - 
Cannot find file) many times consecutively, which leads to
Apr 03 21:32:10 host systemd[1]: gitweb.service: Start request repeated too 
quickly.
Apr 03 21:32:10 host systemd[1]: gitweb.service: Failed with result 
'start-limit-hit'.
Apr 03 21:32:10 host systemd[1]: Failed to start GitWeb service.
and 502 Bad Gateway in browser. I believe the reason is that gitweb.service 
dies on failure and if it happens too often, systemd declines to restart the 
service due to start limit hit.
So my question is how to correct systemd services for GitWeb to be resistant to 
such issue? I prefer to use single process to process all clients.
Thanks.


[PATCH v3 04/13] gitweb: hard-depend on the Digest::MD5 5.8 module

2018-03-03 Thread Ævar Arnfjörð Bjarmason
Since my d48b284183 ("perl: bump the required Perl version to 5.8 from
5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to
conditionally require Digest::MD5 anymore. It was released with perl
v5.7.3[1]

The initial introduction of the dependency in
e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much,
this also undoes part of the later 2e9c8789b7 ("gitweb: Mention
optional Perl modules in INSTALL", 2011-02-04) since gitweb will
always be run on at least 5.8, so there's no need to mention
Digest::MD5 as a required module in the documentation, let's instead
say that we require perl 5.8.

1. $ corelist Digest::MD5
   Data for 2015-02-14
   Digest::MD5 was first released with perl v5.7.3

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 gitweb/INSTALL |  3 +--
 gitweb/gitweb.perl | 17 -
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index 408f2859d3..a58e6b3c44 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -29,12 +29,11 @@ Requirements
 
 
  - Core git tools
- - Perl
+ - Perl 5.8
  - Perl modules: CGI, Encode, Fcntl, File::Find, File::Basename.
  - web server
 
 The following optional Perl modules are required for extra features
- - Digest::MD5 - for gravatar support
  - CGI::Fast and FCGI - for running gitweb as FastCGI script
  - HTML::TagCloud - for fancy tag cloud in project list view
  - HTTP::Date or Time::ParseDate - to support If-Modified-Since for feeds
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2417057f2b..2594a4badb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -20,6 +20,8 @@ use Fcntl ':mode';
 use File::Find qw();
 use File::Basename qw(basename);
 use Time::HiRes qw(gettimeofday tv_interval);
+use Digest::MD5 qw(md5_hex);
+
 binmode STDOUT, ':utf8';
 
 if (!defined($CGI::VERSION) || $CGI::VERSION < 4.08) {
@@ -490,7 +492,6 @@ our %feature = (
# Currently available providers are gravatar and picon.
# If an unknown provider is specified, the feature is disabled.
 
-   # Gravatar depends on Digest::MD5.
# Picon currently relies on the indiana.edu database.
 
# To enable system wide have in $GITWEB_CONFIG
@@ -1166,18 +1167,8 @@ sub configure_gitweb_features {
our @snapshot_fmts = gitweb_get_feature('snapshot');
@snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
 
-   # check that the avatar feature is set to a known provider name,
-   # and for each provider check if the dependencies are satisfied.
-   # if the provider name is invalid or the dependencies are not met,
-   # reset $git_avatar to the empty string.
our ($git_avatar) = gitweb_get_feature('avatar');
-   if ($git_avatar eq 'gravatar') {
-   $git_avatar = '' unless (eval { require Digest::MD5; 1; });
-   } elsif ($git_avatar eq 'picon') {
-   # no dependencies
-   } else {
-   $git_avatar = '';
-   }
+   $git_avatar = '' unless $git_avatar =~ /^(?:gravatar|picon)$/s;
 
our @extra_branch_refs = gitweb_get_feature('extra-branch-refs');
@extra_branch_refs = filter_and_validate_refs (@extra_branch_refs);
@@ -2167,7 +2158,7 @@ sub gravatar_url {
my $size = shift;
$avatar_cache{$email} ||=
"//www.gravatar.com/avatar/" .
-   Digest::MD5::md5_hex($email) . "?s=";
+   md5_hex($email) . "?s=";
return $avatar_cache{$email} . $size;
 }
 
-- 
2.15.1.424.g9478a66081



Re: [PATCH v2 04/13] gitweb: hard-depend on the Digest::MD5 5.8 module

2018-02-25 Thread Eric Sunshine
On Sun, Feb 25, 2018 at 2:46 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Since my d48b284183 ("perl: bump the required Perl version to 5.8 from
> 5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to
> conditionally require Digest::MD5 anymore. It was released with perl
> v5.7.3[1]
>
> The initial introduction of the dependency in
> e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much,
> this also undoes part of the later 2e9c8789b7 ("gitweb: Mention
> optional Perl modules in INSTALL", 2011-02-04) since gitweb will
> always be run on at least 5.8, so there's no need to mention
> Digest::MD5 as a required module in the documentation, let's instead
> say that we require perl 5.8.
>
> As with the preceding Net::{SMTP,Domain} change we now unconditionally
> "use" the module instead.

This is patch 4/13 but the Net::{SMTP,Domain} change doesn't come
until 6/13, so saying "preceding" is confusing.

> 1. $ corelist Digest::MD5
>Data for 2015-02-14
>Digest::MD5 was first released with perl v5.7.3
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 


[PATCH v2 04/13] gitweb: hard-depend on the Digest::MD5 5.8 module

2018-02-25 Thread Ævar Arnfjörð Bjarmason
Since my d48b284183 ("perl: bump the required Perl version to 5.8 from
5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to
conditionally require Digest::MD5 anymore. It was released with perl
v5.7.3[1]

The initial introduction of the dependency in
e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much,
this also undoes part of the later 2e9c8789b7 ("gitweb: Mention
optional Perl modules in INSTALL", 2011-02-04) since gitweb will
always be run on at least 5.8, so there's no need to mention
Digest::MD5 as a required module in the documentation, let's instead
say that we require perl 5.8.

As with the preceding Net::{SMTP,Domain} change we now unconditionally
"use" the module instead.

1. $ corelist Digest::MD5
   Data for 2015-02-14
   Digest::MD5 was first released with perl v5.7.3

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 gitweb/INSTALL |  3 +--
 gitweb/gitweb.perl | 17 -
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index 408f2859d3..a58e6b3c44 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -29,12 +29,11 @@ Requirements
 
 
  - Core git tools
- - Perl
+ - Perl 5.8
  - Perl modules: CGI, Encode, Fcntl, File::Find, File::Basename.
  - web server
 
 The following optional Perl modules are required for extra features
- - Digest::MD5 - for gravatar support
  - CGI::Fast and FCGI - for running gitweb as FastCGI script
  - HTML::TagCloud - for fancy tag cloud in project list view
  - HTTP::Date or Time::ParseDate - to support If-Modified-Since for feeds
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2417057f2b..2594a4badb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -20,6 +20,8 @@ use Fcntl ':mode';
 use File::Find qw();
 use File::Basename qw(basename);
 use Time::HiRes qw(gettimeofday tv_interval);
+use Digest::MD5 qw(md5_hex);
+
 binmode STDOUT, ':utf8';
 
 if (!defined($CGI::VERSION) || $CGI::VERSION < 4.08) {
@@ -490,7 +492,6 @@ our %feature = (
# Currently available providers are gravatar and picon.
# If an unknown provider is specified, the feature is disabled.
 
-   # Gravatar depends on Digest::MD5.
# Picon currently relies on the indiana.edu database.
 
# To enable system wide have in $GITWEB_CONFIG
@@ -1166,18 +1167,8 @@ sub configure_gitweb_features {
our @snapshot_fmts = gitweb_get_feature('snapshot');
@snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
 
-   # check that the avatar feature is set to a known provider name,
-   # and for each provider check if the dependencies are satisfied.
-   # if the provider name is invalid or the dependencies are not met,
-   # reset $git_avatar to the empty string.
our ($git_avatar) = gitweb_get_feature('avatar');
-   if ($git_avatar eq 'gravatar') {
-   $git_avatar = '' unless (eval { require Digest::MD5; 1; });
-   } elsif ($git_avatar eq 'picon') {
-   # no dependencies
-   } else {
-   $git_avatar = '';
-   }
+   $git_avatar = '' unless $git_avatar =~ /^(?:gravatar|picon)$/s;
 
our @extra_branch_refs = gitweb_get_feature('extra-branch-refs');
@extra_branch_refs = filter_and_validate_refs (@extra_branch_refs);
@@ -2167,7 +2158,7 @@ sub gravatar_url {
my $size = shift;
$avatar_cache{$email} ||=
"//www.gravatar.com/avatar/" .
-   Digest::MD5::md5_hex($email) . "?s=";
+   md5_hex($email) . "?s=";
return $avatar_cache{$email} . $size;
 }
 
-- 
2.15.1.424.g9478a66081



Re: [PATCH 7/8] gitweb: hard-depend on the Digest::MD5 5.8 module

2018-02-14 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  gitweb/INSTALL |  3 +--
>  gitweb/gitweb.perl | 17 +
>  2 files changed, 6 insertions(+), 14 deletions(-)

Makes sense, and I like the diffstat.

[...]
> +++ b/gitweb/gitweb.perl
[...]
> @@ -1166,18 +1165,11 @@ sub configure_gitweb_features {
>   our @snapshot_fmts = gitweb_get_feature('snapshot');
>   @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
>  
> - # check that the avatar feature is set to a known provider name,
> - # and for each provider check if the dependencies are satisfied.
> - # if the provider name is invalid or the dependencies are not met,
> - # reset $git_avatar to the empty string.
> + # check that the avatar feature is set to a known provider
> + # name, if the provider name is invalid, reset $git_avatar to
> + # the empty string.

Comma splice.  Formatting as sentences would make this easier to read,
anyway:

# Check that the avatar feature is set to a known provider name.
# If the provider name is invalid, reset $git_avatar to the empty
# string.

Even better would be to remove the comment altogether.  The code is
clear enough on its own and the comment should not be needed now that
it is a one-liner.

[...]
> @@ -2165,6 +2157,7 @@ sub picon_url {
>  sub gravatar_url {
>   my $email = lc shift;
>   my $size = shift;
> + require Digest::MD5;

Same question as the previous patch: how do we decide whether to 'use'
or to 'require' in cases like this?

Thanks,
Jonathan


[PATCH 7/8] gitweb: hard-depend on the Digest::MD5 5.8 module

2018-02-14 Thread Ævar Arnfjörð Bjarmason
Since my d48b284183 ("perl: bump the required Perl version to 5.8 from
5.6.[21]", 2010-09-24), we've depended on 5.8, so there's no reason to
conditionally require Digest::MD5 anymore. It was released with perl
v5.7.3.

The initial introduction of the dependency in
e9fdd74e53 ("gitweb: (gr)avatar support", 2009-06-30) says as much,
this also undoes part of the later 2e9c8789b7 ("gitweb: Mention
optional Perl modules in INSTALL", 2011-02-04) since gitweb will
always be run on at least 5.8, so there's no need to mention
Digest::MD5 as a required module in the documentation, let's instead
say that we require perl 5.8.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 gitweb/INSTALL |  3 +--
 gitweb/gitweb.perl | 17 +
 2 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index 408f2859d3..a58e6b3c44 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -29,12 +29,11 @@ Requirements
 
 
  - Core git tools
- - Perl
+ - Perl 5.8
  - Perl modules: CGI, Encode, Fcntl, File::Find, File::Basename.
  - web server
 
 The following optional Perl modules are required for extra features
- - Digest::MD5 - for gravatar support
  - CGI::Fast and FCGI - for running gitweb as FastCGI script
  - HTML::TagCloud - for fancy tag cloud in project list view
  - HTTP::Date or Time::ParseDate - to support If-Modified-Since for feeds
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2417057f2b..8f48e3c02e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -490,7 +490,6 @@ our %feature = (
# Currently available providers are gravatar and picon.
# If an unknown provider is specified, the feature is disabled.
 
-   # Gravatar depends on Digest::MD5.
# Picon currently relies on the indiana.edu database.
 
# To enable system wide have in $GITWEB_CONFIG
@@ -1166,18 +1165,11 @@ sub configure_gitweb_features {
our @snapshot_fmts = gitweb_get_feature('snapshot');
@snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
 
-   # check that the avatar feature is set to a known provider name,
-   # and for each provider check if the dependencies are satisfied.
-   # if the provider name is invalid or the dependencies are not met,
-   # reset $git_avatar to the empty string.
+   # check that the avatar feature is set to a known provider
+   # name, if the provider name is invalid, reset $git_avatar to
+   # the empty string.
our ($git_avatar) = gitweb_get_feature('avatar');
-   if ($git_avatar eq 'gravatar') {
-   $git_avatar = '' unless (eval { require Digest::MD5; 1; });
-   } elsif ($git_avatar eq 'picon') {
-   # no dependencies
-   } else {
-   $git_avatar = '';
-   }
+   $git_avatar = '' unless $git_avatar =~ /^(?:gravatar|picon)$/s;
 
our @extra_branch_refs = gitweb_get_feature('extra-branch-refs');
@extra_branch_refs = filter_and_validate_refs (@extra_branch_refs);
@@ -2165,6 +2157,7 @@ sub picon_url {
 sub gravatar_url {
my $email = lc shift;
my $size = shift;
+   require Digest::MD5;
$avatar_cache{$email} ||=
"//www.gravatar.com/avatar/" .
Digest::MD5::md5_hex($email) . "?s=";
-- 
2.15.1.424.g9478a66081



Re: [PATCH] gitweb bugfix - check for search permission on sub-directories while scanning project root to prevent program termination

2017-07-18 Thread Junio C Hamano
Hielke Christian Braun  writes:

> Hi,
>
> gitweb terminates and shows no project list, if it can not access a
> sub-directory in the project root directory. It should show a list of
> the projects it can access. Patch corrects this by skipping inaccessible
> directories.
>
>
> Signed-off-by: Hielke Christian Braun 
>
>
> ---
>  gitweb/gitweb.perl | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 3d4a8ee27c96a..9208f42ed1753 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3071,6 +3071,8 @@ sub git_get_projects_list {
>   return if (m!^[/.]$!);
>   # only directories can be git repositories
>   return unless (-d $_);
> + # need search permission
> + return unless (-x $_);
>   # don't traverse too deep (Find is super slow 
> on os x)
>   # $project_maxdepth excludes depth of 
> $projectroot
>   if (($File::Find::name =~ tr!/!!) - $pfxdepth > 
> $project_maxdepth) {
>
> --
> https://github.com/git/git/pull/384

I've tweaked the title and the log message further before queuing
the patch.

Thanks.  



[PATCH] gitweb bugfix - check for search permission on sub-directories while scanning project root to prevent program termination

2017-07-18 Thread Hielke Christian Braun
Hi,

gitweb terminates and shows no project list, if it can not access a
sub-directory in the project root directory. It should show a list of
the projects it can access. Patch corrects this by skipping inaccessible
directories.


Signed-off-by: Hielke Christian Braun 


---
 gitweb/gitweb.perl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 3d4a8ee27c96a..9208f42ed1753 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3071,6 +3071,8 @@ sub git_get_projects_list {
return if (m!^[/.]$!);
# only directories can be git repositories
return unless (-d $_);
+   # need search permission
+   return unless (-x $_);
# don't traverse too deep (Find is super slow 
on os x)
# $project_maxdepth excludes depth of 
$projectroot
if (($File::Find::name =~ tr!/!!) - $pfxdepth > 
$project_maxdepth) {

--
https://github.com/git/git/pull/384






On 2017-07-17 22:18, Junio C Hamano wrote:
> Hielke Christian Braun  writes:
> 
>> ---
>>  gitweb/gitweb.perl | 2 ++
>>  1 file changed, 2 insertions(+)
> 
> Thanks for trying to help and welcome to Git development community.
> But
> 
>  (1) Please double-check the title of your change.  Imagine that the
>  title appears in a list of 600 other commits that goes in a
>  single release in "git shortlog --no-merges" output.  Does it
>  tell readers of the list what the change is about?  We cannot
>  even guess that it is about the project list that appears in
>  gitweb output.
> 
>  (2) Please explain what problem this is trying to solve; that is
>  what the blank space before "---" line we see up above is for.
>  What happens in the current code under what condition, until we
>  do not apply this patch, and why is it a bad thing to happen?
>  Once we apply this patch, in what way the situation gets
>  improved?
> 
>  (3) Please sign-off your patch (see SubmittingPatches in
>  Documentation).
> 
> Thanks.
> 
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 3d4a8ee27c96a..9208f42ed1753 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -3071,6 +3071,8 @@ sub git_get_projects_list {
>>  return if (m!^[/.]$!);
>>  # only directories can be git repositories
>>  return unless (-d $_);
>> +# need search permission
>> +return unless (-x $_);
>>  # don't traverse too deep (Find is super slow 
>> on os x)
>>  # $project_maxdepth excludes depth of 
>> $projectroot
>>  if (($File::Find::name =~ tr!/!!) - $pfxdepth > 
>> $project_maxdepth) {
>>
>> --
>> https://github.com/git/git/pull/384


Re: [PATCH 0/2] gitweb: tags feeds

2017-05-07 Thread Giuseppe Bilotta
Ping?

On Wed, Apr 19, 2017 at 8:49 AM, Giuseppe Bilotta
 wrote:
> A smallish patchset to implement RSS and Atom feeds to complement the
> tags view, accessible as verbs `tags_rss` and `tags_atom`.
>
> (I actually made this some 5 years ago, and it has been running on
> http://git.oblomov.eu/ since, but for some reason I forgot to submit
> it for upstreaming.)
>
> The patchset is also available in the git repository at:
>
>   git://git.oblomov.eu/git gitweb-tags-feed
>
> ----
>
> Giuseppe Bilotta (2):
>   gitweb: infrastructure for tags feed
>   gitweb: expose tags feed in appropriate places
>
>  gitweb/gitweb.perl | 126 
> ++---
>  1 file changed, 91 insertions(+), 35 deletions(-)
>
> --
> 2.12.2.822.g5451c77231
>



-- 
Giuseppe "Oblomov" Bilotta


[PATCH 1/2] gitweb: infrastructure for tags feed

2017-04-18 Thread Giuseppe Bilotta
Signed-off-by: Giuseppe Bilotta 
---
 gitweb/gitweb.perl | 79 +++---
 1 file changed, 58 insertions(+), 21 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7cf68f07b7..4adea84006 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3807,6 +3807,7 @@ sub git_get_tags_list {
 
if ($type eq "tag" || $type eq "commit") {
$ref_item{'epoch'} = $epoch;
+   $ref_item{'tz'} = $tz;
if ($epoch) {
$ref_item{'age'} = age_string(time - 
$ref_item{'epoch'});
} else {
@@ -8132,6 +8133,10 @@ sub git_shortlog {
 
 sub git_feed {
my $format = shift || 'atom';
+
+   # feed context: log, tags
+   my $ctx = shift || 'log';
+
my $have_blame = gitweb_check_feature('blame');
 
# Atom: http://www.atomenabled.org/developers/syndication/
@@ -8140,9 +8145,19 @@ sub git_feed {
die_error(400, "Unknown web feed format");
}
 
+   if ($ctx ne 'log' && $ctx ne 'tags') {
+   die_error(400, "Unknown web feed context");
+   }
+   my $tags = $ctx eq 'tags' ? 1 : 0;
+
# log/feed of current (HEAD) branch, log of given branch, history of 
file/directory
my $head = $hash || 'HEAD';
-   my @commitlist = parse_commits($head, 150, 0, $file_name);
+   my @commitlist;
+   if ($tags) {
+   @commitlist = git_get_tags_list(15);
+   } else {
+   @commitlist = parse_commits($head, 150, 0, $file_name);
+   }
 
my %latest_commit;
my %latest_date;
@@ -8154,9 +8169,12 @@ sub git_feed {
}
if (defined($commitlist[0])) {
%latest_commit = %{$commitlist[0]};
-   my $latest_epoch = $latest_commit{'committer_epoch'};
+   my $latest_epoch = $tags ? $latest_commit{'epoch'} :
+  $latest_commit{'committer_epoch'};
exit_if_unmodified_since($latest_epoch);
-   %latest_date = parse_date($latest_epoch, 
$latest_commit{'committer_tz'});
+   %latest_date = parse_date($latest_epoch,
+   $tags ? $latest_commit{'tz'} :
+   $latest_commit{'committer_tz'});
}
print $cgi->header(
-type => $content_type,
@@ -8171,7 +8189,9 @@ sub git_feed {
# header variables
my $title = "$site_name - $project/$action";
my $feed_type = 'log';
-   if (defined $hash) {
+   if ($tags) {
+   $feed_type = 'tags';
+   } elsif (defined $hash) {
$title .= " - '$hash'";
$feed_type = 'branch log';
if (defined $file_name) {
@@ -8189,6 +8209,7 @@ sub git_feed {
$descr = esc_html($descr);
} else {
$descr = "$project " .
+($tags ? 'tags ' : '') .
 ($format eq 'rss' ? 'RSS' : 'Atom') .
 " feed";
}
@@ -8197,7 +8218,9 @@ sub git_feed {
 
#header
my $alt_url;
-   if (defined $file_name) {
+   if ($tags) {
+   $alt_url = href(-full=>1, action=>"tags");
+   } elsif (defined $file_name) {
$alt_url = href(-full=>1, action=>"history", hash=>$hash, 
file_name=>$file_name);
} elsif (defined $hash) {
$alt_url = href(-full=>1, action=>"log", hash=>$hash);
@@ -8261,9 +8284,15 @@ XML
}
 
# contents
+   my $co_action = $tags ? 'tag' : 'commitdiff';
for (my $i = 0; $i <= $#commitlist; $i++) {
+   my %clco; # commit info from commitlist, only used for tags
my %co = %{$commitlist[$i]};
my $commit = $co{'id'};
+   if ($tags) {
+   %clco = %co;
+   %co = parse_tag($commit);
+   }
# we read 150, we always show 30 and the ones more recent than 
48 hours
if (($i >= 20) && ((time - $co{'author_epoch'}) > 48*60*60)) {
last;
@@ -8271,44 +8300,52 @@ XML
my %cd = parse_date($co{'author_epoch'}, $co{'author_tz'});
 
# get list of changed files
-   open my $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-   $c

[PATCH 2/2] gitweb: expose tags feed in appropriate places

2017-04-18 Thread Giuseppe Bilotta
Signed-off-by: Giuseppe Bilotta 
---
 gitweb/gitweb.perl | 47 +--
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4adea84006..8be7444988 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -851,6 +851,8 @@ our %actions = (
"summary" => \&git_summary,
"tag" => \&git_tag,
"tags" => \&git_tags,
+   "tags_rss" => \&git_tags_rss,
+   "tags_atom" => \&git_tags_atom,
"tree" => \&git_tree,
"snapshot" => \&git_snapshot,
"object" => \&git_object,
@@ -2599,7 +2601,7 @@ sub get_feed_info {
return unless (defined $project);
# some views should link to OPML, or to generic project feed,
# or don't have specific feed yet (so they should use generic)
-   return if (!$action || $action =~ /^(?:tags|heads|forks|tag|search)$/x);
+   return if (!$action || $action =~ /^(?:heads|forks|search)$/x);
 
my $branch = undef;
# branches refs uses 'refs/' + $get_branch_refs()[x] + '/' prefix
@@ -2614,8 +2616,10 @@ sub get_feed_info {
}
}
# find log type for feed description (title)
-   my $type = 'log';
-   if (defined $file_name) {
+   my $type = "log";
+   if ($action eq 'tag' || $action eq 'tags') {
+   $type = "tags";
+   } elsif (defined $file_name) {
$type  = "history of $file_name";
$type .= "/" if ($action eq 'tree');
$type .= " on '$branch'" if (defined $branch);
@@ -4007,6 +4011,7 @@ sub print_feed_meta {
$href_params{'-title'} = 'log';
}
 
+   my $tag_view = $href_params{-title} eq 'tags';
foreach my $format (qw(RSS Atom)) {
my $type = lc($format);
my %link_attr = (
@@ -4016,7 +4021,7 @@ sub print_feed_meta {
);
 
$href_params{'extra_options'} = undef;
-   $href_params{'action'} = $type;
+   $href_params{'action'} = ($tag_view ? 'tags_' : '') . 
$type;
$link_attr{'-href'} = href(%href_params);
print "\n";
 
-   $href_params{'extra_options'} = '--no-merges';
-   $link_attr{'-href'} = href(%href_params);
-   $link_attr{'-title'} .= ' (no merges)';
-   print "\n";
+   unless ($tag_view) {
+   $href_params{'extra_options'} = '--no-merges';
+   $link_attr{'-href'} = href(%href_params);
+   $link_attr{'-title'} .= ' (no merges)';
+   print "\n";
+   }
}
 
} else {
@@ -4217,8 +4224,9 @@ sub git_footer_html {
}
$href_params{'-title'} ||= 'log';
 
+   my $tag_view = $href_params{-title} eq 'tags';
foreach my $format (qw(RSS Atom)) {
-   $href_params{'action'} = lc($format);
+   $href_params{'action'} = ($tag_view ? 'tags_' : '') . 
lc($format);
print $cgi->a({-href => href(%href_params),
  -title => "$href_params{'-title'} $format 
feed",
  -class => $feed_class}, $format)."\n";
@@ -8409,10 +8417,18 @@ sub git_rss {
git_feed('rss');
 }
 
+sub git_tags_rss {
+   git_feed('rss', 'tags')
+}
+
 sub git_atom {
git_feed('atom');
 }
 
+sub git_tags_atom {
+   git_feed('atom', 'tags')
+}
+
 sub git_opml {
my @list = git_get_projects_list($project_filter, $strict_export);
if (!@list) {
@@ -8457,6 +8473,9 @@ XML
my $rss  = href('project' => $proj{'path'}, 'action' => 'rss', 
-full => 1);
my $html = href('project' => $proj{'path'}, 'action' => 
'summary', -full => 1);
print "\n";
+   # and now the tags rss feed
+   $rss  = href('project' => $proj{'path'}, 'action' => 
'tags_rss', -full => 1);
+   print "\n";
}
print <
-- 
2.12.2.822.g5451c77231



[PATCH 0/2] gitweb: tags feeds

2017-04-18 Thread Giuseppe Bilotta
A smallish patchset to implement RSS and Atom feeds to complement the
tags view, accessible as verbs `tags_rss` and `tags_atom`.

(I actually made this some 5 years ago, and it has been running on
http://git.oblomov.eu/ since, but for some reason I forgot to submit
it for upstreaming.)

The patchset is also available in the git repository at:

  git://git.oblomov.eu/git gitweb-tags-feed



Giuseppe Bilotta (2):
  gitweb: infrastructure for tags feed
  gitweb: expose tags feed in appropriate places

 gitweb/gitweb.perl | 126 ++---
 1 file changed, 91 insertions(+), 35 deletions(-)

-- 
2.12.2.822.g5451c77231



[PATCH v2 0/2] Minor changes to skip gitweb tests without Time::HiRes

2017-03-01 Thread Ævar Arnfjörð Bjarmason
This was originally just one small patch, but Jakub Narębski pointed
out that calling the module "unusable" when we failed to load it was
confusing, so now the start of this series is just a rephrasing of an
existing error message I copied.

Ævar Arnfjörð Bjarmason (2):
  gitweb tests: Change confusing "skip_all" phrasing
  gitweb tests: Skip tests when we don't have Time::HiRes

 t/gitweb-lib.sh | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.11.0



Re: [PATCH v2 0/2] Minor changes to skip gitweb tests without Time::HiRes

2017-03-01 Thread Junio C Hamano
Thanks, will replace what has been on 'pu'.

Note that I'd dropped a double-SP from the latter one while queuing.


[PATCH v2 1/2] gitweb tests: Change confusing "skip_all" phrasing

2017-03-01 Thread Ævar Arnfjörð Bjarmason
Change the phrasing so that instead of saying that the CGI module is
unusable, we say that it's not available.

This came up on the git mailing list in
<4b34e3a0-3da7-d821-2a7f-9a420ac1d...@gmail.com> from Jakub Narębski.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/gitweb-lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index d5dab5a94f..59ef15efbd 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -110,7 +110,7 @@ perl -MEncode -e '$e="";decode_utf8($e, Encode::FB_CROAK)' 
>/dev/null 2>&1 || {
 }
 
 perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 || {
-   skip_all='skipping gitweb tests, CGI module unusable'
+   skip_all='skipping gitweb tests, CGI & CGI::Util & CGI::Carp modules 
not available'
test_done
 }
 
-- 
2.11.0



[PATCH v2 2/2] gitweb tests: Skip tests when we don't have Time::HiRes

2017-03-01 Thread Ævar Arnfjörð Bjarmason
Change the gitweb tests to skip when we can't load the Time::HiRes
module.

Gitweb needs this module to work. It has been in perl core since v5.8,
which is the oldest version we support. However CentOS (and perhaps
some other distributions) carve it into its own non-core-perl package
that's not installed along with /usr/bin/perl by default. Without this
we'll hard fail the gitweb tests when trying to load the module.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/gitweb-lib.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 59ef15efbd..b7a73874e7 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 || {
test_done
 }
 
+perl -mTime::HiRes -e 0  >/dev/null 2>&1 || {
+   skip_all='skipping gitweb tests, Time::HiRes module not available'
+   test_done
+}
+
 gitweb_init
-- 
2.11.0



Re: [PATCH] gitweb tests: Skip tests when we don't have Time::HiRes

2017-02-27 Thread Ævar Arnfjörð Bjarmason
On Mon, Feb 27, 2017 at 6:48 PM, Jakub Narębski  wrote:
> W dniu 27.02.2017 o 13:37, Ævar Arnfjörð Bjarmason pisze:
>> Change the gitweb tests to skip when we can't load the Time::HiRes
>> module.
>
> Could you tell us in the commit message why this module is needed?
> Is it because gitweb loads it unconditionally, or does that at least
> in the default configuration, or is it used in tests, or...?
>
> [I see it is somewhat addressed below]
>
>>
>> This module has bee in perl core since v5.8, which is the oldest
>
> s/bee/been/

I'll clarify that in a re-roll & fix the typo, pending any other
comments. Thanks!

>> version we support, however CentOS (and perhaps some other
>> distributions) carve it into its own non-core-perl package that's not
>> installed along with /usr/bin/perl by default. Without this we'll hard
>> fail the gitweb tests when trying to load the module.
>
> I see that it because gitweb.perl as the following at line 20:
>
> use Time::HiRes qw(gettimeofday tv_interval);
>
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>
> Good catch (if a strange one...).

This and the associated cvs tests failing as root patch I submitted
were discovered when trying to build git in the standard mock build
environment on CentOS. It creates a chroot and rpm installs just the
packages you declare, so issues like these crop up.

>> ---
>>  t/gitweb-lib.sh | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
>> index d5dab5a94f..116c3890e6 100644
>> --- a/t/gitweb-lib.sh
>> +++ b/t/gitweb-lib.sh
>> @@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 
>> || {
>>   test_done
>>  }
>>
>> +perl -mTime::HiRes -e 0  >/dev/null 2>&1 || {
>> + skip_all='skipping gitweb tests, Time::HiRes module unusable'
>
> Is "unusable" a good description, instead of "not found"?

Yeah it's odd, but I just copied the several lines above that use that phrasing.

>> + test_done
>> +}
>> +
>>  gitweb_init
>>
>


Re: [PATCH] gitweb tests: Skip tests when we don't have Time::HiRes

2017-02-27 Thread Jakub Narębski
W dniu 27.02.2017 o 13:37, Ævar Arnfjörð Bjarmason pisze:
> Change the gitweb tests to skip when we can't load the Time::HiRes
> module.

Could you tell us in the commit message why this module is needed?
Is it because gitweb loads it unconditionally, or does that at least
in the default configuration, or is it used in tests, or...?

[I see it is somewhat addressed below]

> 
> This module has bee in perl core since v5.8, which is the oldest

s/bee/been/

> version we support, however CentOS (and perhaps some other
> distributions) carve it into its own non-core-perl package that's not
> installed along with /usr/bin/perl by default. Without this we'll hard
> fail the gitweb tests when trying to load the module.

I see that it because gitweb.perl as the following at line 20:

use Time::HiRes qw(gettimeofday tv_interval);

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 

Good catch (if a strange one...).

> ---
>  t/gitweb-lib.sh | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
> index d5dab5a94f..116c3890e6 100644
> --- a/t/gitweb-lib.sh
> +++ b/t/gitweb-lib.sh
> @@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 
> || {
>   test_done
>  }
>  
> +perl -mTime::HiRes -e 0  >/dev/null 2>&1 || {
> + skip_all='skipping gitweb tests, Time::HiRes module unusable'

Is "unusable" a good description, instead of "not found"?

> + test_done
> +}
> +
>  gitweb_init
> 



[PATCH] gitweb tests: Skip tests when we don't have Time::HiRes

2017-02-27 Thread Ævar Arnfjörð Bjarmason
Change the gitweb tests to skip when we can't load the Time::HiRes
module.

This module has bee in perl core since v5.8, which is the oldest
version we support, however CentOS (and perhaps some other
distributions) carve it into its own non-core-perl package that's not
installed along with /usr/bin/perl by default. Without this we'll hard
fail the gitweb tests when trying to load the module.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/gitweb-lib.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index d5dab5a94f..116c3890e6 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -114,4 +114,9 @@ perl -MCGI -MCGI::Util -MCGI::Carp -e 0 >/dev/null 2>&1 || {
test_done
 }
 
+perl -mTime::HiRes -e 0  >/dev/null 2>&1 || {
+   skip_all='skipping gitweb tests, Time::HiRes module unusable'
+   test_done
+}
+
 gitweb_init
-- 
2.11.0



Re: gitweb html validation

2016-11-18 Thread Ralf Thielow
2016-11-15 19:26 GMT+01:00 Ralf Thielow :

Finally I've found the time to actually try this out and there
are some problems with it.

>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 7cf68f07b..33d7c154f 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -5531,8 +5531,8 @@ sub git_project_search_form {
> $limit = " in '$project_filter/'";
> }
>
> -   print "\n";
> print $cgi->start_form(-method => 'get', -action => $my_uri) .
> + "\n" .
>   $cgi->hidden(-name => 'a', -value => 'project_list')  . "\n";
> print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n"
> if (defined $project_filter);
> @@ -5544,11 +5544,11 @@ sub git_project_search_form {
>  -checked => $search_use_regexp) .
>   "\n" .
>   $cgi->submit(-name => 'btnS', -value => 'Search') .
> - $cgi->end_form() . "\n" .
>   $cgi->a({-href => href(project => undef, searchtext => undef,
>  project_filter => $project_filter)},
>   esc_html("List all projects$limit")) . "\n";
> -   print "\n";
> +   print "\n" .
> + $cgi->end_form() . "\n";
>  }
>

The anchor is now inside the form-tag, which means there is no
visual line-break anymore that comes automatically after 
as the form-tag is a block level element.  Could be solved by adding a
"", which is not very nice, but OK.

>  # entry for given @keys needs filling if at least one of keys in list
> diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
> index 321260103..507740b6a 100644
> --- a/gitweb/static/gitweb.css
> +++ b/gitweb/static/gitweb.css
> @@ -539,7 +539,7 @@ div.projsearch {
> margin: 20px 0px;
>  }
>
> -div.projsearch form {
> +form div.projsearch {
> margin-bottom: 2px;
>  }
>

This is wrong as it overwrites the setting above, 20px at the bottom
went to 2px.

The problem is how to apply the 2px now. Before this, we had the
, a block element, which we can give the 2px margin at the
bottom.  Now this element is gone and we have a set of inline
elements where we use "" to emulate the line break.  There
are two css rules which can solve this, but I'm not really happy with
both of them.
1) As we know we need the two pixel below an input element, we
can say
div.projsearch input {
  margin-bottom: 2px;
}
2) Make the a-Tag inside div.projsearch being displayed as a block
element to make a margin-top setting working.  This has the benefit
that we don't care about the element above.
div.projsearch a {
display: inline-block;
margin-top: 2px;
}
If I have to choose I'd prefer the second one.

So I can't think of a way to solve this nicely with this change.


Re: gitweb html validation

2016-11-16 Thread Raphaël Gertz

Le 16.11.2016 01:05, Junio C Hamano a écrit :

Ralf Thielow  writes:


Only block level elements are
allowed to be inside form tags, according to
https://www.w3.org/2010/04/xhtml10-strict.html#elem_form
...
I think it's better to just move the -Tag outside of the
surrounding div?
Something like this perhaps, I didn't test it myself yet.


That sounds like a sensible update to me (no, I do not run gitweb
myself).  Is this the only  we have in the UI, or is it the
only one that is problematic?


There is an other form in the cgi line 4110 :
print $cgi->start_form(-method => "get", -action => $action) .
  "\n" .

But this one has a  inside.

The problem with projsearch I want to change is that the div is around 
the form without a container inside.


I agree with moving the  inside the form if it's 
a better option.


Best regards


Re: gitweb html validation

2016-11-15 Thread Junio C Hamano
Ralf Thielow  writes:

> Only block level elements are
> allowed to be inside form tags, according to
> https://www.w3.org/2010/04/xhtml10-strict.html#elem_form
> ...
> I think it's better to just move the -Tag outside of the
> surrounding div?
> Something like this perhaps, I didn't test it myself yet.

That sounds like a sensible update to me (no, I do not run gitweb
myself).  Is this the only  we have in the UI, or is it the
only one that is problematic?

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 7cf68f07b..33d7c154f 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -5531,8 +5531,8 @@ sub git_project_search_form {
>   $limit = " in '$project_filter/'";
>   }
>  
> - print "\n";
>   print $cgi->start_form(-method => 'get', -action => $my_uri) .
> +   "\n" .
> $cgi->hidden(-name => 'a', -value => 'project_list')  . "\n";
>   print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n"
>   if (defined $project_filter);
> @@ -5544,11 +5544,11 @@ sub git_project_search_form {
>-checked => $search_use_regexp) .
> "\n" .
> $cgi->submit(-name => 'btnS', -value => 'Search') .
> -   $cgi->end_form() . "\n" .
> $cgi->a({-href => href(project => undef, searchtext => undef,
>    project_filter => $project_filter)},
> esc_html("List all projects$limit")) . "\n";
> - print "\n";
> + print "\n" .
> +   $cgi->end_form() . "\n";
>  }
>  
>  # entry for given @keys needs filling if at least one of keys in list
> diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
> index 321260103..507740b6a 100644
> --- a/gitweb/static/gitweb.css
> +++ b/gitweb/static/gitweb.css
> @@ -539,7 +539,7 @@ div.projsearch {
>   margin: 20px 0px;
>  }
>  
> -div.projsearch form {
> +form div.projsearch {
>   margin-bottom: 2px;
>  }
>  


Re: gitweb html validation

2016-11-15 Thread Ralf Thielow
Raphaël Gertz  wrote:
> Hi,
> 
> There a small bug in gitweb html validation, you need the following patch to
> pass w3c check with searchbox enabled.
> 
> The problem lies in the input directly embed inside a form without a wrapper
> which is not valid.
>

I agree this is a small bug. Only block level elements are
allowed to be inside form tags, according to
https://www.w3.org/2010/04/xhtml10-strict.html#elem_form

> Best regards
> 
> The following patch fix the issue for git-2.10.2 :
> --- /usr/share/gitweb/gitweb.cgi.orig   2016-11-15 15:37:21.149805026 +0100
> +++ /usr/share/gitweb/gitweb.cgi2016-11-15 15:37:48.579240429 +0100
> @@ -5518,6 +5518,7 @@ sub git_project_search_form {
> 
> print "\n";
> print $cgi->start_form(-method => 'get', -action => $my_uri) .
> + ''.
>   $cgi->hidden(-name => 'a', -value => 'project_list')  . "\n";
> print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n"
> if (defined $project_filter);
> @@ -5529,6 +5530,7 @@ sub git_project_search_form {
>  -checked => $search_use_regexp) .
>   "\n" .
>   $cgi->submit(-name => 'btnS', -value => 'Search') .
> + ''.
>   $cgi->end_form() . "\n" .
>   $cgi->a({-href => href(project => undef, searchtext => undef,
>      project_filter => $project_filter)},

I think it's better to just move the -Tag outside of the
surrounding div?
Something like this perhaps, I didn't test it myself yet.

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7cf68f07b..33d7c154f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -5531,8 +5531,8 @@ sub git_project_search_form {
$limit = " in '$project_filter/'";
}
 
-   print "\n";
print $cgi->start_form(-method => 'get', -action => $my_uri) .
+ "\n" .
  $cgi->hidden(-name => 'a', -value => 'project_list')  . "\n";
print $cgi->hidden(-name => 'pf', -value => $project_filter). "\n"
if (defined $project_filter);
@@ -5544,11 +5544,11 @@ sub git_project_search_form {
 -checked => $search_use_regexp) .
  "\n" .
  $cgi->submit(-name => 'btnS', -value => 'Search') .
- $cgi->end_form() . "\n" .
  $cgi->a({-href => href(project => undef, searchtext => undef,
 project_filter => $project_filter)},
  esc_html("List all projects$limit")) . "\n";
-   print "\n";
+   print "\n" .
+ $cgi->end_form() . "\n";
 }
 
 # entry for given @keys needs filling if at least one of keys in list
diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index 321260103..507740b6a 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -539,7 +539,7 @@ div.projsearch {
margin: 20px 0px;
 }
 
-div.projsearch form {
+form div.projsearch {
margin-bottom: 2px;
 }
 



gitweb html validation

2016-11-15 Thread Raphaël Gertz

Hi,

There a small bug in gitweb html validation, you need the following 
patch to pass w3c check with searchbox enabled.


The problem lies in the input directly embed inside a form without a 
wrapper which is not valid.


Best regards

The following patch fix the issue for git-2.10.2 :
--- /usr/share/gitweb/gitweb.cgi.orig   2016-11-15 15:37:21.149805026 
+0100
+++ /usr/share/gitweb/gitweb.cgi2016-11-15 15:37:48.579240429 
+0100

@@ -5518,6 +5518,7 @@ sub git_project_search_form {

print "\n";
print $cgi->start_form(-method => 'get', -action => $my_uri) .
+ ''.
  $cgi->hidden(-name => 'a', -value => 'project_list')  . 
"\n";
print $cgi->hidden(-name => 'pf', -value => $project_filter). 
"\n"

if (defined $project_filter);
@@ -5529,6 +5530,7 @@ sub git_project_search_form {
 -checked => $search_use_regexp) .
  "\n" .
  $cgi->submit(-name => 'btnS', -value => 'Search') .
+ ''.
  $cgi->end_form() . "\n" .
  $cgi->a({-href => href(project => undef, searchtext => 
undef,
 project_filter => 
$project_filter)},


Re: Forbid access to /gitweb but authorize the sub projets

2016-11-09 Thread Dennis Kaarsemaker
On Mon, 2016-11-07 at 14:07 +0100, Alexandre Duplaix wrote:
> Hello,
> 
> I have several projects under https://myserver/gitweb and I would like 
> to forbid the access to the root, so that the users can't list the 
> differents projects.
> 
> However, I need to let the access to the sub projects (ex: 
> https://myserver/gitweb/?p=project1;a=summary
> 
> How can I do please ?

My favourite way of doing this is abusing the fact that gitweb.conf is
perl code that's loaded with do $filename.

This makes it easy to override such things. Try this in gitweb.conf for example:

sub no_index {
die_error(403, "No access to the repository list");
}
$actions{project_list} = \&no_index;
$actions{project_index} = \&no_index;
$actions{opml} = \&no_index;

-- 
Dennis Kaarsemaker
http://www.kaarsemaker.net


Forbid access to /gitweb but authorize the sub projets

2016-11-07 Thread Alexandre Duplaix

Hello,

I have several projects under https://myserver/gitweb and I would like 
to forbid the access to the root, so that the users can't list the 
differents projects.


However, I need to let the access to the sub projects (ex: 
https://myserver/gitweb/?p=project1;a=summary


How can I do please ?

Thank you in advance :)

" Ce courriel et les documents qui lui sont joints peuvent contenir des
informations confidentielles ou ayant un caractère privé. 
S'ils ne vous sont pas destinés nous vous signalons qu'il est strictement interdit de les

divulguer, de les reproduire ou d'en utiliser de quelque mani�re que ce
soit le contenu. Si ce message vous a �t� transmis par erreur, merci d'en
informer l'exp�diteur et de supprimer imm�diatement de votre syst�me
informatique ce courriel ainsi que tous les documents qui y sont attach�s"

  **

" This e-mail and any attached documents may contain confidential or
proprietary information. If you are not the intended recipient, you are
notified that any dissemination, copying of this e-mail and any attachments
thereto or use of their contents by any means whatsoever is strictly
prohibited. If you have received this e-mail in error, please advise the
sender immediately and delete this e-mail and all attached documents
from your computer system."
#



Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

2016-10-17 Thread Ævar Arnfjörð Bjarmason
On Mon, Oct 17, 2016 at 6:54 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason  writes:
>
>> As far as I can tell the only outstanding "change this" is your
>> s/SHA1/SHA-1/ in , do
>> you want to fix that up or should I submit another series?
>
> I think I did that already myself while queuing.  Could you fetch
> what I queued on 'pu' to double check?

Thanks, looked at it, looks good to me!
> I think the diff between what was posted and what is queued (I just
> checked) looks like this:
>
> -gitweb: Link to 7-char+ SHA1s, not only 8-char+
> +gitweb: link to 7-char+ SHA-1s, not only 8-char+
>
>  Change the minimum length of an abbreviated object identifier in the
>  commit message gitweb tries to turn into link from 8 hexchars to 7.
>
> @@ -5,16 +12,18 @@
>  SHA-1 in commit log message links to "object" view", 2006-12-10), but
>  the default abbreviation length is 7, and has been for a long time.
>
> -It's still possible to reference SHA1s down to 4 characters in length,
> +It's still possible to reference SHA-1s down to 4 characters in length,
>  see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
>  git actually produce that, so I doubt anyone is putting that into log
> -messages in practice, but people definitely do put 7 character SHA1s
> +messages in practice, but people definitely do put 7 character SHA-1s
>  into log messages.
>
>  I think it's fairly dubious to link to things matching [0-9a-fA-F]
>  here as opposed to just [0-9a-f], that dates back to the initial
>  version of gitweb from 161332a ("first working version",
> -2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
> +2005-08-07). Git will accept all-caps SHA-1s, but didn't ever produce
>  them as far as I can tell.
>
>  Signed-off-by: Ævar Arnfjörð Bjarmason 
> +Acked-by: Jakub Narębski 
> +Signed-off-by: Junio C Hamano 


Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

2016-10-17 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> As far as I can tell the only outstanding "change this" is your
> s/SHA1/SHA-1/ in , do
> you want to fix that up or should I submit another series?

I think I did that already myself while queuing.  Could you fetch
what I queued on 'pu' to double check?

I think the diff between what was posted and what is queued (I just
checked) looks like this:

-gitweb: Link to 7-char+ SHA1s, not only 8-char+
+gitweb: link to 7-char+ SHA-1s, not only 8-char+

 Change the minimum length of an abbreviated object identifier in the
 commit message gitweb tries to turn into link from 8 hexchars to 7.
 
@@ -5,16 +12,18 @@
 SHA-1 in commit log message links to "object" view", 2006-12-10), but
 the default abbreviation length is 7, and has been for a long time.
 
-It's still possible to reference SHA1s down to 4 characters in length,
+It's still possible to reference SHA-1s down to 4 characters in length,
 see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
 git actually produce that, so I doubt anyone is putting that into log
-messages in practice, but people definitely do put 7 character SHA1s
+messages in practice, but people definitely do put 7 character SHA-1s
 into log messages.
 
 I think it's fairly dubious to link to things matching [0-9a-fA-F]
 here as opposed to just [0-9a-f], that dates back to the initial
 version of gitweb from 161332a ("first working version",
-2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
+2005-08-07). Git will accept all-caps SHA-1s, but didn't ever produce
 them as far as I can tell.
 
 Signed-off-by: Ævar Arnfjörð Bjarmason 
+Acked-by: Jakub Narębski 
+Signed-off-by: Junio C Hamano 


Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

2016-10-15 Thread Ævar Arnfjörð Bjarmason
On Fri, Oct 14, 2016 at 8:40 PM, Junio C Hamano  wrote:
> Jakub Narębski  writes:
>
>> s/SHA1/SHA-1/g in above paragraph (for correctness and consistency).
>>>
>>> I think it's fairly dubious to link to things matching [0-9a-fA-F]
>>> here as opposed to just [0-9a-f], that dates back to the initial
>>> version of gitweb from 161332a ("first working version",
>>> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
>>> them as far as I can tell.
>>
>> All right.  If we decide to be more strict in what we accept, we can
>> do it in a separate commit.
>>
>>>
>>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>>
>> Acked-by: Jakub Narębski 
>
> Thanks for a review.  As the topic is not yet in 'next', I'll squish
> in your Acked-by: to them.  I saw them only for 1 & 2/3; would
> another for 3/3 be coming soon?

As far as I can tell the only outstanding "change this" is your
s/SHA1/SHA-1/ in , do
you want to fix that up or should I submit another series?

>>
>>> ---
>>>  gitweb/gitweb.perl | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>>> index cba7405..92b5e91 100755
>>> --- a/gitweb/gitweb.perl
>>> +++ b/gitweb/gitweb.perl
>>> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>>>  my $line = shift;
>>>
>>>  $line = esc_html($line, -nbsp=>1);
>>> -$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
>>> +$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>>
>> By the way, it is quite long commit message for one character change.
>> Not that it is a bad thing...
>>
>>>  $cgi->a({-href => href(action=>"object", hash=>$1),
>>>  -class => "text"}, $1);
>>>  }eg;
>>>


Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-14 Thread Jakub Narębski
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze:
> Change the log formatting function to know about "git describe" output
> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
> 
> There are still many valid refnames that we don't link to
> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
> similarly it's trivially possible to create some refnames like
> "æ/var-gf6727b0" or which won't be picked up by this regex.

It's important to cover most common cases occurring naturally in
people's repositories, while trying to avoid false positives (the latter
is important more now, where gitweb doesn't check for rev name validity).

I guess that most common is to use non-hierarchical tags with ASCII-only
names for describe output, so while "æ/var-gf6727b0" won't be picked,
it is IMVHO also much less likely to occur.

> 
> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.
> 
> There was on-list discussion about how we could do better than this
> patch. Junio suggested to update parse_commits() to call a new
> "gitweb--helper" command which would pass each of the revision
> candidates through "rev-parse --verify --quiet". That would cut down
> on our false positives (e.g. we'll link to "deadbeef"), and also allow
> us to be more aggressive in selecting candidate revisions.
> 
> That may be too expensive to work in practice, or it may
> not. Investigating that would be a good follow-up to this patch.

All right.  That's good and enough for one patch.

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 

Acked-by: Jakub Narębski 

BTW. to add to whole "git describe" output or not discussion: having link
span whole revision marker makes for easier to use UI: larger are to hit.

> ---
>  gitweb/gitweb.perl | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 92b5e91..7cf68f0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> + $line =~ s{
> +\b
> +(
> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +# or hadoop-20160921-113441-20-g094fb7d
> +(? +[A-Za-z0-9.-]+
> +(?!\.) # refs can't end with ".", see check_refname_format()
> +-g[0-9a-fA-F]{7,40}
> +|
> +# Just a normal looking Git SHA1
> +[0-9a-fA-F]{7,40}
> +)
> +\b

Nice using of eXplained regexp.  It is much easier to understand with
comments.

> +}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
> - }eg;
> + }egx;
>  
>   return $line;
>  }
> 



Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-14 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I just ran into an example of a better reason for doing it like my
> patch is doing, which is that if you have some tag like:
>
> deployment-20160928-171914-16-g42e13d8
>
> With my patch the whole thing will be a link to the 42e13d8 commit,
> but with this suggestion both 20160928 and 42e13d8 would become commit
> links, the former one would be broken.
>
> Of course we could have some code that would detect that the whole \S+
> is part of one thing that ends in g,...

I think that this example shows a flaw not in the "suffix that looks
like an object name" approach, but more in the boundary regexp,
namely, the \b part; it is probably too loose.

And \S+ is not the right cue, either, for that matter.  IOW, "we
only should take hexstring, optionally prefixed with 'g', that
appears before the whitespace" is too strict, as a sentence

We broke the system with deployment-g42e13d8.

does want to link to 42e13d8, even though full-stop at the end is
not whitespace, and the existing regexp uses \b there as a rough
equivalent to saying "Here must be a whitespace or punctuation".

An attempt to tighten "what a punctuation is" by excluding '-' may
make that "timestamp is in the tagname" example work, but is not a
good solution, either, because two sentences can be concatenated
with an em-dash that is often typed as two hyphen in plain text,
resulting in something like

We broke the system with deployment-g42e13d8--sigh.

and we do want to treat the '-' after 42e13d8 as a punctuation after
a described object name.

So I agree 3/3 is good thing to do as-is.

Thanks.



Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

2016-10-14 Thread Junio C Hamano
Jakub Narębski  writes:

> s/SHA1/SHA-1/g in above paragraph (for correctness and consistency).
>> 
>> I think it's fairly dubious to link to things matching [0-9a-fA-F]
>> here as opposed to just [0-9a-f], that dates back to the initial
>> version of gitweb from 161332a ("first working version",
>> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
>> them as far as I can tell.
>
> All right.  If we decide to be more strict in what we accept, we can
> do it in a separate commit.
>
>> 
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>
> Acked-by: Jakub Narębski 

Thanks for a review.  As the topic is not yet in 'next', I'll squish
in your Acked-by: to them.  I saw them only for 1 & 2/3; would
another for 3/3 be coming soon?

>
>> ---
>>  gitweb/gitweb.perl | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index cba7405..92b5e91 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>>  my $line = shift;
>>  
>>  $line = esc_html($line, -nbsp=>1);
>> -$line =~ s{\b([0-9a-fA-F]{8,40})\b}{
>> +$line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>
> By the way, it is quite long commit message for one character change.
> Not that it is a bad thing...
>
>>  $cgi->a({-href => href(action=>"object", hash=>$1),
>>  -class => "text"}, $1);
>>  }eg;
>> 


Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-14 Thread Ævar Arnfjörð Bjarmason
On Sun, Oct 9, 2016 at 1:20 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Thu, Oct 6, 2016 at 9:51 PM, Junio C Hamano  wrote:
>> Ævar Arnfjörð Bjarmason   writes:
>>
>>> Change the log formatting function to know about "git describe" output
>>> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>>>
>>> There are still many valid refnames that we don't link to
>>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
>>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
>>> similarly it's trivially possible to create some refnames like
>>> "æ/var-gf6727b0" or which won't be picked up by this regex.
>>
>> Not a serious counter-proposal or suggestion, and certainly not an
>> objection to the patch I am responding to, but I wonder if it is so
>> bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.
>>
>> IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
>> allowed an optional 'g' in front of the hex, e.g.
>>
>> -   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>> +   $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{
>>
>> wouldn't that be much simpler, covers more cases and sufficient?
>
> It would be more simpler, maybe that's the right approach. I have a
> preference for making the entire reference a link. I think it makes
> more sense for the UI.

I just ran into an example of a better reason for doing it like my
patch is doing, which is that if you have some tag like:

deployment-20160928-171914-16-g42e13d8

 With my patch the whole thing will be a link to the 42e13d8 commit,
but with this suggestion both 20160928 and 42e13d8 would become commit
links, the former one would be broken.

Of course we could have some code that would detect that the whole \S+
is part of one thing that ends in g, but the complexity in
doing that would be equivalent or more to doing what my patch is
doing.

>>> There's surely room for improvement here, but I just wanted to address
>>> the very common case of sticking "git describe" output into commit
>>> messages without trying to link to all possible refnames, that's going
>>> to be a rather futile exercise given that this is free text, and it
>>> would be prohibitively expensive to look up whether the references in
>>> question exist in our repository.
>>>
>>> There was on-list discussion about how we could do better than this
>>> patch. Junio suggested to update parse_commits() to call a new
>>> "gitweb--helper" command which would pass each of the revision
>>> candidates through "rev-parse --verify --quiet". That would cut down
>>> on our false positives (e.g. we'll link to "deadbeef"), and also allow
>>> us to be more aggressive in selecting candidate revisions.
>>>
>>> That may be too expensive to work in practice, or it may
>>> not. Investigating that would be a good follow-up to this patch.
>>>
>>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>>> ---
>>>  gitweb/gitweb.perl | 18 --
>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>>> index 92b5e91..7cf68f0 100755
>>> --- a/gitweb/gitweb.perl
>>> +++ b/gitweb/gitweb.perl
>>> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>>>   my $line = shift;
>>>
>>>   $line = esc_html($line, -nbsp=>1);
>>> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>>> + $line =~ s{
>>> +\b
>>> +(
>>> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
>>> +# or hadoop-20160921-113441-20-g094fb7d
>>> +(?>> +[A-Za-z0-9.-]+
>>> +(?!\.) # refs can't end with ".", see check_refname_format()
>>> +-g[0-9a-fA-F]{7,40}
>>> +|
>>> +# Just a normal looking Git SHA1
>>> +[0-9a-fA-F]{7,40}
>>> +)
>>> +\b
>>> +}{
>>>   $cgi->a({-href => href(action=>"object", hash=>$1),
>>>   -class => "text"}, $1);
>>> - }eg;
>>> + }egx;
>>>
>>>   return $line;
>>>  }


Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

2016-10-14 Thread Jakub Narębski
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason pisze:

> Change the minimum length of an abbreviated object identifier in the
> commit message gitweb tries to turn into link from 8 hexchars to 7.
> 
> This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
> SHA-1 in commit log message links to "object" view", 2006-12-10), but
> the default abbreviation length is 7, and has been for a long time.
> 
> It's still possible to reference SHA1s down to 4 characters in length,
> see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
> git actually produce that, so I doubt anyone is putting that into log
> messages in practice, but people definitely do put 7 character SHA1s
> into log messages.

That's nice explanation why we want to support 7 character SHA-1
(it is the default, and was seen in the wild), but don't need to
support shorter lengths.

Another reason (just for completeness; you don't need to put it in
the commit message) to not go below 7 characters is that with 
decreasing length there is more and more chance for false positives
(like 'beef' or 'caffee' for 4-char or 5-char hexstring), as I wrote
previously.

s/SHA1/SHA-1/g in above paragraph (for correctness and consistency).

> 
> I think it's fairly dubious to link to things matching [0-9a-fA-F]
> here as opposed to just [0-9a-f], that dates back to the initial
> version of gitweb from 161332a ("first working version",
> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
> them as far as I can tell.

All right.  If we decide to be more strict in what we accept, we can
do it in a separate commit.

> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 

Acked-by: Jakub Narębski 

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index cba7405..92b5e91 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> + $line =~ s{\b([0-9a-fA-F]{7,40})\b}{

By the way, it is quite long commit message for one character change.
Not that it is a bad thing...

>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
>   }eg;
> 



Re: [PATCH v2 1/3] gitweb: Fix a typo in a comment

2016-10-14 Thread Jakub Narębski
W dniu 06.10.2016 o 11:11, Ævar Arnfjörð Bjarmason napisał:

> Change a typo'd MIME type in a comment. The Content-Type is
> application/xhtml+xml, not application/xhtm+xml.
> 
> Fixes up code originally added in 53c4031 ("gitweb: Strip
> non-printable characters from syntax highlighter output", 2011-09-16).
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 

Good.  Thanks for taking care of this.
For what is worth for such a trivial patch:

Acked-by: Jakub Narębski 

> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 44094f4..cba7405 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1616,7 +1616,7 @@ sub esc_path {
>   return $str;
>  }
>  
> -# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
> +# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
>  sub sanitize {
>   my $str = shift;
>  
> 



Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-09 Thread Ævar Arnfjörð Bjarmason
On Thu, Oct 6, 2016 at 9:51 PM, Junio C Hamano  wrote:
> Ævar Arnfjörð Bjarmason   writes:
>
>> Change the log formatting function to know about "git describe" output
>> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>>
>> There are still many valid refnames that we don't link to
>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
>> similarly it's trivially possible to create some refnames like
>> "æ/var-gf6727b0" or which won't be picked up by this regex.
>
> Not a serious counter-proposal or suggestion, and certainly not an
> objection to the patch I am responding to, but I wonder if it is so
> bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.
>
> IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
> allowed an optional 'g' in front of the hex, e.g.
>
> -   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> +   $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{
>
> wouldn't that be much simpler, covers more cases and sufficient?

It would be more simpler, maybe that's the right approach. I have a
preference for making the entire reference a link. I think it makes
more sense for the UI.

>> There's surely room for improvement here, but I just wanted to address
>> the very common case of sticking "git describe" output into commit
>> messages without trying to link to all possible refnames, that's going
>> to be a rather futile exercise given that this is free text, and it
>> would be prohibitively expensive to look up whether the references in
>> question exist in our repository.
>>
>> There was on-list discussion about how we could do better than this
>> patch. Junio suggested to update parse_commits() to call a new
>> "gitweb--helper" command which would pass each of the revision
>> candidates through "rev-parse --verify --quiet". That would cut down
>> on our false positives (e.g. we'll link to "deadbeef"), and also allow
>> us to be more aggressive in selecting candidate revisions.
>>
>> That may be too expensive to work in practice, or it may
>> not. Investigating that would be a good follow-up to this patch.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>> ---
>>  gitweb/gitweb.perl | 18 --
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 92b5e91..7cf68f0 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>>   my $line = shift;
>>
>>   $line = esc_html($line, -nbsp=>1);
>> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>> + $line =~ s{
>> +\b
>> +(
>> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
>> +# or hadoop-20160921-113441-20-g094fb7d
>> +(?> +[A-Za-z0-9.-]+
>> +(?!\.) # refs can't end with ".", see check_refname_format()
>> +-g[0-9a-fA-F]{7,40}
>> +|
>> +# Just a normal looking Git SHA1
>> +[0-9a-fA-F]{7,40}
>> +)
>> +\b
>> +}{
>>   $cgi->a({-href => href(action=>"object", hash=>$1),
>>   -class => "text"}, $1);
>> - }eg;
>> + }egx;
>>
>>   return $line;
>>  }


Re: [PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-06 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Change the log formatting function to know about "git describe" output
> such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
>
> There are still many valid refnames that we don't link to
> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
> similarly it's trivially possible to create some refnames like
> "æ/var-gf6727b0" or which won't be picked up by this regex.

Not a serious counter-proposal or suggestion, and certainly not an
objection to the patch I am responding to, but I wonder if it is so
bad if we made the 867ad08 into link when showing v2.8.0-4-g867ad08.

IOW, in addition to \b followed by [0-9a-f]+ followed by \b, if we
allowed an optional 'g' in front of the hex, e.g.

-   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
+   $line =~ s{\bg?([0-9a-fA-F]{7,40})\b}{

wouldn't that be much simpler, covers more cases and sufficient?

> There's surely room for improvement here, but I just wanted to address
> the very common case of sticking "git describe" output into commit
> messages without trying to link to all possible refnames, that's going
> to be a rather futile exercise given that this is free text, and it
> would be prohibitively expensive to look up whether the references in
> question exist in our repository.
>
> There was on-list discussion about how we could do better than this
> patch. Junio suggested to update parse_commits() to call a new
> "gitweb--helper" command which would pass each of the revision
> candidates through "rev-parse --verify --quiet". That would cut down
> on our false positives (e.g. we'll link to "deadbeef"), and also allow
> us to be more aggressive in selecting candidate revisions.
>
> That may be too expensive to work in practice, or it may
> not. Investigating that would be a good follow-up to this patch.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  gitweb/gitweb.perl | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 92b5e91..7cf68f0 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
> + $line =~ s{
> +\b
> +(
> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
> +# or hadoop-20160921-113441-20-g094fb7d
> +(? +[A-Za-z0-9.-]+
> +(?!\.) # refs can't end with ".", see check_refname_format()
> +-g[0-9a-fA-F]{7,40}
> +|
> +# Just a normal looking Git SHA1
> +[0-9a-fA-F]{7,40}
> +)
> +\b
> +}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
> - }eg;
> + }egx;
>  
>   return $line;
>  }


Re: [PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

2016-10-06 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Change the minimum length of an abbreviated object identifier in the
> commit message gitweb tries to turn into link from 8 hexchars to 7.
>
> This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
> SHA-1 in commit log message links to "object" view", 2006-12-10), but
> the default abbreviation length is 7, and has been for a long time.
>
> It's still possible to reference SHA1s down to 4 characters in length,
> see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
> git actually produce that, so I doubt anyone is putting that into log
> messages in practice, but people definitely do put 7 character SHA1s
> into log messages.
>
> I think it's fairly dubious to link to things matching [0-9a-fA-F]
> here as opposed to just [0-9a-f], that dates back to the initial
> version of gitweb from 161332a ("first working version",
> 2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
> them as far as I can tell.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---

s/SHA1/SHA-1/g.  I agree that A-F are dubious.

>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index cba7405..92b5e91 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2036,7 +2036,7 @@ sub format_log_line_html {
>   my $line = shift;
>  
>   $line = esc_html($line, -nbsp=>1);
> - $line =~ s{\b([0-9a-fA-F]{8,40})\b}{
> + $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>   $cgi->a({-href => href(action=>"object", hash=>$1),
>   -class => "text"}, $1);
>   }eg;


[PATCH v2 2/3] gitweb: Link to 7-char+ SHA1s, not only 8-char+

2016-10-06 Thread Ævar Arnfjörð Bjarmason
Change the minimum length of an abbreviated object identifier in the
commit message gitweb tries to turn into link from 8 hexchars to 7.

This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
SHA-1 in commit log message links to "object" view", 2006-12-10), but
the default abbreviation length is 7, and has been for a long time.

It's still possible to reference SHA1s down to 4 characters in length,
see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
git actually produce that, so I doubt anyone is putting that into log
messages in practice, but people definitely do put 7 character SHA1s
into log messages.

I think it's fairly dubious to link to things matching [0-9a-fA-F]
here as opposed to just [0-9a-f], that dates back to the initial
version of gitweb from 161332a ("first working version",
2005-08-07). Git will accept all-caps SHA1s, but didn't ever produce
them as far as I can tell.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cba7405..92b5e91 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2036,7 +2036,7 @@ sub format_log_line_html {
my $line = shift;
 
$line = esc_html($line, -nbsp=>1);
-   $line =~ s{\b([0-9a-fA-F]{8,40})\b}{
+   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
$cgi->a({-href => href(action=>"object", hash=>$1),
-class => "text"}, $1);
}eg;
-- 
2.9.3



[PATCH v2 0/3] gitweb: Be smarter about linking to SHA1s in log messages

2016-10-06 Thread Ævar Arnfjörð Bjarmason
This is v2 of patches I sent on September 21st starting at
<20160921114428.28664-1-ava...@gmail.com>. Jakub Narębski had a lot of
feedback for that series (thanks!). Which as far as I can tell I've
incorporated entirely in this re-roll.

Ævar Arnfjörð Bjarmason (3):
  gitweb: Fix a typo in a comment
  gitweb: Link to 7-char+ SHA1s, not only 8-char+
  gitweb: Link to "git describe"'d commits in log messages

 gitweb/gitweb.perl | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

-- 
2.9.3



[PATCH v2 1/3] gitweb: Fix a typo in a comment

2016-10-06 Thread Ævar Arnfjörð Bjarmason
Change a typo'd MIME type in a comment. The Content-Type is
application/xhtml+xml, not application/xhtm+xml.

Fixes up code originally added in 53c4031 ("gitweb: Strip
non-printable characters from syntax highlighter output", 2011-09-16).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 44094f4..cba7405 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1616,7 +1616,7 @@ sub esc_path {
return $str;
 }
 
-# Sanitize for use in XHTML + application/xml+xhtm (valid XML 1.0)
+# Sanitize for use in XHTML + application/xml+xhtml (valid XML 1.0)
 sub sanitize {
my $str = shift;
 
-- 
2.9.3



[PATCH v2 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-10-06 Thread Ævar Arnfjörð Bjarmason
Change the log formatting function to know about "git describe" output
such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".

There are still many valid refnames that we don't link to
e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
similarly it's trivially possible to create some refnames like
"æ/var-gf6727b0" or which won't be picked up by this regex.

There's surely room for improvement here, but I just wanted to address
the very common case of sticking "git describe" output into commit
messages without trying to link to all possible refnames, that's going
to be a rather futile exercise given that this is free text, and it
would be prohibitively expensive to look up whether the references in
question exist in our repository.

There was on-list discussion about how we could do better than this
patch. Junio suggested to update parse_commits() to call a new
"gitweb--helper" command which would pass each of the revision
candidates through "rev-parse --verify --quiet". That would cut down
on our false positives (e.g. we'll link to "deadbeef"), and also allow
us to be more aggressive in selecting candidate revisions.

That may be too expensive to work in practice, or it may
not. Investigating that would be a good follow-up to this patch.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 gitweb/gitweb.perl | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 92b5e91..7cf68f0 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -2036,10 +2036,24 @@ sub format_log_line_html {
my $line = shift;
 
$line = esc_html($line, -nbsp=>1);
-   $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
+   $line =~ s{
+\b
+(
+# The output of "git describe", e.g. v2.10.0-297-gf6727b0
+# or hadoop-20160921-113441-20-g094fb7d
+(?a({-href => href(action=>"object", hash=>$1),
-class => "text"}, $1);
-   }eg;
+   }egx;
 
return $line;
 }
-- 
2.9.3



Re: [PATCH v4 2/2] gitweb: use highlight's shebang detection

2016-09-28 Thread Ian Kelling
On Sun, Sep 25, 2016, at 11:04 AM, Jakub Narębski wrote:
> 
> For what it is worth it:
> 
> Acked-by: Jakub Narębski 
> 
> (but unfortunately *not* tested by).

Thank you for all your help.
--
Ian Kelling


Re: [PATCH v4 2/2] gitweb: use highlight's shebang detection

2016-09-25 Thread Jakub Narębski
W dniu 25.09.2016 o 00:32, Ian Kelling pisze:
> The "highlight" binary can, in some cases, determine the language type
> by the means of file contents, for example the shebang in the first line
> for some scripting languages.  Make use of this autodetection for files
> which syntax is not known by gitweb.  In that case, pass the blob
> contents to "highlight --force"; the parameter is needed to make it
> always generate HTML output (which includes HTML-escaping).
> 
> Although we now run highlight on files which do not end up highlighted,
> performance is virtually unaffected because when we call highlight, it
> is used for escaping HTML.  In the case that highlight is used, gitweb
> calls sanitize() instead of esc_html(), and the latter is significantly
> slower (it does more, being roughly a superset of sanitize()).  Simple
> benchmark comparing performance of 'blob' view of files without syntax
> highlighting in gitweb before and after this change indicates ±1%
> difference in request time for all file types.  Benchmark was performed
> on local instance on Debian, using Apache/2.4.23 web server and CGI.
> 
> Document the feature and improve syntax highlight documentation, add
> test to ensure gitweb doesn't crash when language detection is used.
> 
> Signed-off-by: Ian Kelling 

For what it is worth it:

Acked-by: Jakub Narębski 

(but unfortunately *not* tested by).

> ---
> 
> Notes:
> The only change from v3 is the commit message as suggested by Jakub
> Narębski
> 
>  Documentation/gitweb.conf.txt  | 21 ++---
>  gitweb/gitweb.perl | 10 +-
>  t/t9500-gitweb-standalone-no-errors.sh |  8 
>  3 files changed, 27 insertions(+), 12 deletions(-)



Re: [PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter

2016-09-25 Thread Jakub Narębski
W dniu 25.09.2016 o 00:32, Ian Kelling pisze:

> Subject: gitweb: remove unused guess_file_syntax() parameter
>
> Signed-off-by: Ian Kelling 

Acked-by: Jakub Narębski 

> ---
> 
> Notes:
> The only change from v3 is a more descriptive commit message
> 
>  gitweb/gitweb.perl | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33d701d..6cb4280 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3913,7 +3913,7 @@ sub blob_contenttype {
>  # guess file syntax for syntax highlighting; return undef if no highlighting
>  # the name of syntax can (in the future) depend on syntax highlighter used
>  sub guess_file_syntax {
> - my ($highlight, $mimetype, $file_name) = @_;
> + my ($highlight, $file_name) = @_;
>   return undef unless ($highlight && defined $file_name);
>   my $basename = basename($file_name, '.in');
>   return $highlight_basename{$basename}
> @@ -7062,7 +7062,7 @@ sub git_blob {
>   $have_blame &&= ($mimetype =~ m!^text/!);
>  
>   my $highlight = gitweb_check_feature('highlight');
> - my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> + my $syntax = guess_file_syntax($highlight, $file_name);
>   $fd = run_highlighter($fd, $highlight, $syntax)
>   if $syntax;
>  
> 



Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-24 Thread Ian Kelling
On Sat, Sep 24, 2016, at 09:21 AM, Jakub Narębski wrote:
> W dniu 24.09.2016 o 00:15, Jakub Narębski pisze:
> 
> Sidenote: this way of benchmarking of gitweb falls between two ways of
> doing a benchmark.
> 
> The first method is to simply run gitweb as a standalone script, passing
> its parameters in CGI environment variables; just like the test suite
> does it.  You would 'time' / 'times' it a few times, drop outliers, and
> take average or a median.  With this method you don't even need to set
> up a web server.
> 
> The second is to use a specialized program to benchmark the server-side
> of a web page, for example 'ab' (ApacheBench), httperf, curl-loader
> or JMeter.  The first one is usually distributed together with Apache
> web server, so you probably have it installed already.  Those tools
> provide timing statistics.

Good to know. Thanks.


Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-24 Thread Ian Kelling
On Fri, Sep 23, 2016, at 03:15 PM, Jakub Narębski wrote:
> W dniu 23.09.2016 o 11:08, Ian Kelling napisał:
>
> > The "highlight" binary can, in some cases, determine the language type
> > by the means of file contents, for example the shebang in the first line
> > for some scripting languages.  Make use of this autodetection for files
> > which syntax is not known by gitweb.  In that case, pass the blob
> > contents to "highlight --force"; the parameter is needed to make it
> > always generate HTML output (which includes HTML-escaping).
>
> Right.
>
> >
> > Although we now run highlight on files which do not end up highlighted,
> > performance is virtually unaffected because when we call highlight, we
> > also call sanitize() instead of esc_html(), which is significantly
> > slower.
>
> This paragraph is a bit unclear, for example it is not obvious what
> "..., which is significantly slower" refers to: sanitize() or esc_html().
>
> I think it would be better to write:
>
>   Although we now run highlight on files which do not end up highlighted,
>   performance is virtually unaffected because when we call highlight, it
>   is used for escaping HTML.  In the case that highlight is used, gitweb
>   calls sanitize() instead of esc_html(), and the latter is significantly
>   slower (it does more, being roughly a superset of sanitize()).

Agree. Done in v4.

>
> >After curling blob view of unhighlighted large and small text
> > files of perl code and license text 100 times each on a local
> > Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
> > request time for all file types.
>
> Also, "curling" is not the word I would like to see. I would say:
>
>   Simple benchmark comparing performance of 'blob' view of files without
>   syntax highlighting in gitweb before and after this change indicates
>   ±1% difference in request time for all file types.  Benchmark was
>   performed on local instance on Debian, using Apache/2.4.23 web server
>   and CGI/PSGI/FCGI/mod_perl.
>
>   ^^--- select one
>
> Or something like that; I'm not sure how detailed this should be.
> But it is nice to have such benchmark in the commit message.


Sounds  good. Used it in v4.

>
> Anyway I think that adding yet another configuration toggle for selecting
> whether to use "highlight" syntax autodetection or not would be just an
> unnecessary complication.
>
> Note that the performance loss might be quite higher on MS Windows, with
> its higher cost of fork.  But then they probably do not configure
> server-side highligher anyway.
>
> >
> > Document the feature and improve syntax highlight documentation, add
> > test to ensure gitweb doesn't crash when language detection is used.
>
> Good.
>
> >
> > Signed-off-by: Ian Kelling 
> > ---
> >  Documentation/gitweb.conf.txt  | 21 ++---
> >  gitweb/gitweb.perl | 10 +-
> >  t/t9500-gitweb-standalone-no-errors.sh |  8 
> >  3 files changed, 27 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
>
> We should probably say what does it mean to be "highlight"[1] compatible,
> but it is outside of scope for this patch, and I think also out of scope
> of this series.
>
> > Note that 'highlight' feature must be set for gitweb to actually
> > use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax ` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%

[PATCH v4 2/2] gitweb: use highlight's shebang detection

2016-09-24 Thread Ian Kelling
The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages.  Make use of this autodetection for files
which syntax is not known by gitweb.  In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).

Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, it
is used for escaping HTML.  In the case that highlight is used, gitweb
calls sanitize() instead of esc_html(), and the latter is significantly
slower (it does more, being roughly a superset of sanitize()).  Simple
benchmark comparing performance of 'blob' view of files without syntax
highlighting in gitweb before and after this change indicates ±1%
difference in request time for all file types.  Benchmark was performed
on local instance on Debian, using Apache/2.4.23 web server and CGI.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.

Signed-off-by: Ian Kelling 
---

Notes:
The only change from v3 is the commit message as suggested by Jakub
Narębski

 Documentation/gitweb.conf.txt      | 21 ++---
 gitweb/gitweb.perl     | 10 +-
 t/t9500-gitweb-standalone-no-errors.sh |  8 
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
Note that 'highlight' feature must be set for gitweb to actually
use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax ` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax `
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6cb4280..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
my ($fd, $highlight, $syntax) = @_;
-   return $fd unless ($highlight && defined $syntax);
+   return $fd unless ($highlight);
 
close $fd;
+   my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
  quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', 
'-pse',
'$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
'--', "-fe=$fallback_encoding")." | ".
  quote_command($highlight_bin).
- " --replace-tabs=8 --fragment --syntax $syntax |"
+ " --replace-tabs=8 --fragment $syntax_arg |"
or die_error(500, "Couldn't open file or run syntax 
highlighter");
return $fd;
 }
@@ -7063,8 +7064,7 @@ sub git_blob {
 
my $highlight = gitweb_check_feature('highlight');
my $syntax = guess_file_syntax($highlight, $file_name);
-   $fd =

[PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter

2016-09-24 Thread Ian Kelling
Signed-off-by: Ian Kelling 
---

Notes:
The only change from v3 is a more descriptive commit message

 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..6cb4280 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-   my ($highlight, $mimetype, $file_name) = @_;
+   my ($highlight, $file_name) = @_;
return undef unless ($highlight && defined $file_name);
my $basename = basename($file_name, '.in');
return $highlight_basename{$basename}
@@ -7062,7 +7062,7 @@ sub git_blob {
$have_blame &&= ($mimetype =~ m!^text/!);
 
my $highlight = gitweb_check_feature('highlight');
-   my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
+   my $syntax = guess_file_syntax($highlight, $file_name);
$fd = run_highlighter($fd, $highlight, $syntax)
if $syntax;
 
-- 
2.9.3



Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-24 Thread Junio C Hamano
Jakub Narębski  writes:

>> Also, "curling" is not the word I would like to see. I would say:
>> 
>>   Simple benchmark comparing performance of 'blob' view of files without
>>   syntax highlighting in gitweb before and after this change indicates
>>   ±1% difference in request time for all file types.  Benchmark was
>>   performed on local instance on Debian, using Apache/2.4.23 web server
>>   and CGI/PSGI/FCGI/mod_perl.
>> 
>>   ^^--- select one

or state that all of them produced similar results ;-)

>> Or something like that; I'm not sure how detailed this should be.
>> But it is nice to have such benchmark in the commit message.
>
> Sidenote: this way of benchmarking of gitweb falls between two ways of
> doing a benchmark.

All good comments.  Thanks.


Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-24 Thread Jakub Narębski
W dniu 24.09.2016 o 00:15, Jakub Narębski pisze:
> W dniu 23.09.2016 o 11:08, Ian Kelling napisał: 

>>After curling blob view of unhighlighted large and small text
>> files of perl code and license text 100 times each on a local
>> Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
>> request time for all file types.
> 
> Also, "curling" is not the word I would like to see. I would say:
> 
>   Simple benchmark comparing performance of 'blob' view of files without
>   syntax highlighting in gitweb before and after this change indicates
>   ±1% difference in request time for all file types.  Benchmark was
>   performed on local instance on Debian, using Apache/2.4.23 web server
>   and CGI/PSGI/FCGI/mod_perl.
> 
>   ^^--- select one
> 
> Or something like that; I'm not sure how detailed this should be.
> But it is nice to have such benchmark in the commit message.

Sidenote: this way of benchmarking of gitweb falls between two ways of
doing a benchmark.

The first method is to simply run gitweb as a standalone script, passing
its parameters in CGI environment variables; just like the test suite
does it.  You would 'time' / 'times' it a few times, drop outliers, and
take average or a median.  With this method you don't even need to set
up a web server.

The second is to use a specialized program to benchmark the server-side
of a web page, for example 'ab' (ApacheBench), httperf, curl-loader
or JMeter.  The first one is usually distributed together with Apache
web server, so you probably have it installed already.  Those tools
provide timing statistics.

[...]
> Note that the performance loss might be quite higher on MS Windows, with
> its higher cost of fork.  But then they probably do not configure
> server-side highligher anyway.



Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-23 Thread Jakub Narębski
W dniu 23.09.2016 o 11:08, Ian Kelling napisał:

> The "highlight" binary can, in some cases, determine the language type
> by the means of file contents, for example the shebang in the first line
> for some scripting languages.  Make use of this autodetection for files
> which syntax is not known by gitweb.  In that case, pass the blob
> contents to "highlight --force"; the parameter is needed to make it
> always generate HTML output (which includes HTML-escaping).

Right.

> 
> Although we now run highlight on files which do not end up highlighted,
> performance is virtually unaffected because when we call highlight, we
> also call sanitize() instead of esc_html(), which is significantly
> slower. 

This paragraph is a bit unclear, for example it is not obvious what
"..., which is significantly slower" refers to: sanitize() or esc_html().

I think it would be better to write:

  Although we now run highlight on files which do not end up highlighted,
  performance is virtually unaffected because when we call highlight, it
  is used for escaping HTML.  In the case that highlight is used, gitweb
  calls sanitize() instead of esc_html(), and the latter is significantly
  slower (it does more, being roughly a superset of sanitize()).

>After curling blob view of unhighlighted large and small text
> files of perl code and license text 100 times each on a local
> Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
> request time for all file types.

Also, "curling" is not the word I would like to see. I would say:

  Simple benchmark comparing performance of 'blob' view of files without
  syntax highlighting in gitweb before and after this change indicates
  ±1% difference in request time for all file types.  Benchmark was
  performed on local instance on Debian, using Apache/2.4.23 web server
  and CGI/PSGI/FCGI/mod_perl.

  ^^--- select one

Or something like that; I'm not sure how detailed this should be.
But it is nice to have such benchmark in the commit message.

Anyway I think that adding yet another configuration toggle for selecting
whether to use "highlight" syntax autodetection or not would be just an
unnecessary complication.

Note that the performance loss might be quite higher on MS Windows, with
its higher cost of fork.  But then they probably do not configure
server-side highligher anyway.

> 
> Document the feature and improve syntax highlight documentation, add
> test to ensure gitweb doesn't crash when language detection is used.

Good.

> 
> Signed-off-by: Ian Kelling 
> ---
>  Documentation/gitweb.conf.txt  | 21 ++---
>  gitweb/gitweb.perl | 10 +-
>  t/t9500-gitweb-standalone-no-errors.sh |  8 
>  3 files changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> index a79e350..e632089 100644
> --- a/Documentation/gitweb.conf.txt
> +++ b/Documentation/gitweb.conf.txt
> @@ -246,13 +246,20 @@ $highlight_bin::

We should probably say what does it mean to be "highlight"[1] compatible,
but it is outside of scope for this patch, and I think also out of scope
of this series.

>   Note that 'highlight' feature must be set for gitweb to actually
>   use syntax highlighting.
>  +
> -*NOTE*: if you want to add support for new file type (supported by
> -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> -or `%highlight_basename`, depending on whether you detect type of file
> -based on extension (for example "sh") or on its basename (for example
> -"Makefile").  The keys of these hashes are extension and basename,
> -respectively, and value for given key is name of syntax to be passed via
> -`--syntax ` to highlighter.
> +*NOTE*: for a file to be highlighted, its syntax type must be detected
> +and that syntax must be supported by "highlight".  The default syntax
> +detection is minimal, and there are many supported syntax types with no
> +detection by default.  There are three options for adding syntax
> +detection.  The first and second priority are `%highlight_basename` and
> +`%highlight_ext`, which detect based on basename (the full filename, for
> +example "Makefile") and extension (for example "sh").  The keys of these
> +hashes are the basename and extension, respectively, and the value for a
> +given key is the name of the syntax to be passed via `--syntax `
> +to "highlight".  The last priority is the "highlight" configuration of
> +`Shebang` regular expressions to detect the language based on the first
> +line in the file, (for example, matching the line

Re: [PATCH v3 1/2] gitweb: remove unused function parameter

2016-09-23 Thread Junio C Hamano
Jakub Narębski  writes:

> I think it would be better to be more descriptive, and say:
>
>   Subject: [PATCH v3 1/2] gitweb: remove unused parameter from 
> guess_file_syntax()
> Acked-by: Jakub Narębski 

Thanks.


Re: [PATCH v3 1/2] gitweb: remove unused function parameter

2016-09-23 Thread Jakub Narębski
W dniu 23.09.2016 o 11:08, Ian Kelling napisał:
>
> Subject: [PATCH v3 1/2] gitweb: remove unused function parameter

I think it would be better to be more descriptive, and say:

  Subject: [PATCH v3 1/2] gitweb: remove unused parameter from 
guess_file_syntax()

But that might be too long...

>
> Signed-off-by: Ian Kelling 

With, or without this change, it's nice.

Acked-by: Jakub Narębski 

> ---
>  gitweb/gitweb.perl | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33d701d..6cb4280 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3913,7 +3913,7 @@ sub blob_contenttype {
>  # guess file syntax for syntax highlighting; return undef if no highlighting
>  # the name of syntax can (in the future) depend on syntax highlighter used
>  sub guess_file_syntax {
> - my ($highlight, $mimetype, $file_name) = @_;
> + my ($highlight, $file_name) = @_;
>   return undef unless ($highlight && defined $file_name);
>   my $basename = basename($file_name, '.in');
>   return $highlight_basename{$basename}
> @@ -7062,7 +7062,7 @@ sub git_blob {
>   $have_blame &&= ($mimetype =~ m!^text/!);
>  
>   my $highlight = gitweb_check_feature('highlight');
> - my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
> + my $syntax = guess_file_syntax($highlight, $file_name);
>   $fd = run_highlighter($fd, $highlight, $syntax)
>   if $syntax;
>  
> 



Re: [PATCH v2] gitweb: use highlight's shebang detection

2016-09-23 Thread Ian Kelling
On Thu, Sep 22, 2016, at 03:50 PM, Jakub Narębski wrote:
> W dniu 22.09.2016 o 00:18, Ian Kelling napisał:
>
> > The highlight binary can detect language by shebang when we can't tell
> > the syntax type by the name of the file. In that case, pass the blob
> > to "highlight --force" and the resulting html will have markup for
> > highlighting if the language was detected.
>
> This description feels a bit convoluted. Perhaps something like this:
>
>   The "highlight" binary can, in some cases, determine the language type
>   by the means of file contents, for example the shebang in the first
>   line
>   for some scripting languages.  Make use of this autodetection for files
>   which syntax is not known by gitweb.  In that case, pass the blob
>   contents to "highlight --force"; the parameter is needed to make it
>   always generate HTML output (which includes HTML-escaping).

Nice. Using it in v3.

>
> Also, we might want to have the information about performance of this
> solution either in the commit message, or in commit comments.

I tested it more rigorously and added to v3 commit message.

>
> >
> > Document the feature and improve syntax highlight documentation, add
> > test to ensure gitweb doesn't crash when language detection is used,
>
> All right.
>
> > and remove an unused parameter from gitweb_check_feature().
>
> First, that is guess_file_syntax(), not gitweb_check_feature().
> Second, this change could be made into independent patch, for example
> preparatory one.


Oops. I split it out in v3.

>
> >
> > Signed-off-by: Ian Kelling 
> > ---
> >  Documentation/gitweb.conf.txt  | 21 ++---
> >  gitweb/gitweb.perl | 14 +++---
> >  t/t9500-gitweb-standalone-no-errors.sh |  8 
> >  3 files changed, 29 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
> > Note that 'highlight' feature must be set for gitweb to actually
> > use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax ` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is the name of the syntax to be passed via `--syntax `
> > +to "highlight".  The last priority is the "highlight" configuration of
> > +`Shebang` regular expressions to detect the language based on the first
> > +line in the file, (for example, matching the line "#!/bin/bash").  See
> > +the highlight documentation and the default config at
> > +/etc/highlight/filetypes.conf for more details.
> >  +
>
> I think the rewrite is a bit more readable.
>
> >  For example if repositories you are hosting use "phtml" extension for
> >  PHP files, and you want to have correct syntax-highlighting for those
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 33d701d..44094f4 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -3913,7 +3913,7 @@ sub blob_contenttype {
> >  # guess file syntax for syntax highlighting; return undef if no 
> > highlighting
> >  # the name of syntax can (in the future) depend on syntax highlighter used
> >  sub guess_file_syntax {
> > -   my ($highlight, $mimetype, $file_name) = @_;
> > +   my ($highlight, $file_name) = @_;
>
> Right.
>

[PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-23 Thread Ian Kelling
The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages.  Make use of this autodetection for files
which syntax is not known by gitweb.  In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).

Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, we
also call sanitize() instead of esc_html(), which is significantly
slower. After curling blob view of unhighlighted large and small text
files of perl code and license text 100 times each on a local
Apache/2.4.23 (Debian) instance, it's logs indicate +-1% difference in
request time for all file types.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.

Signed-off-by: Ian Kelling 
---
 Documentation/gitweb.conf.txt  | 21 ++---
 gitweb/gitweb.perl | 10 +-
 t/t9500-gitweb-standalone-no-errors.sh |  8 
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
Note that 'highlight' feature must be set for gitweb to actually
use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax ` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax `
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6cb4280..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
my ($fd, $highlight, $syntax) = @_;
-   return $fd unless ($highlight && defined $syntax);
+   return $fd unless ($highlight);
 
close $fd;
+   my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
  quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', 
'-pse',
'$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
'--', "-fe=$fallback_encoding")." | ".
  quote_command($highlight_bin).
- " --replace-tabs=8 --fragment --syntax $syntax |"
+ " --replace-tabs=8 --fragment $syntax_arg |"
or die_error(500, "Couldn't open file or run syntax 
highlighter");
return $fd;
 }
@@ -7063,8 +7064,7 @@ sub git_blob {
 
my $highlight = gitweb_check_feature('highlight');
my $syntax = guess_file_syntax($highlight, $file_name);
-   $fd = run_highlighter($fd, $highlight, $syntax)
-   if $syntax;
+   $fd = run_highlighter($fd, $highlight, $syntax);
 
git_header_html(undef, $expires);
my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
 

[PATCH v3 1/2] gitweb: remove unused function parameter

2016-09-23 Thread Ian Kelling
Signed-off-by: Ian Kelling 
---
 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..6cb4280 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-   my ($highlight, $mimetype, $file_name) = @_;
+   my ($highlight, $file_name) = @_;
return undef unless ($highlight && defined $file_name);
my $basename = basename($file_name, '.in');
return $highlight_basename{$basename}
@@ -7062,7 +7062,7 @@ sub git_blob {
$have_blame &&= ($mimetype =~ m!^text/!);
 
my $highlight = gitweb_check_feature('highlight');
-   my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
+   my $syntax = guess_file_syntax($highlight, $file_name);
$fd = run_highlighter($fd, $highlight, $syntax)
if $syntax;
 
-- 
2.9.3



Re: [PATCH v2] gitweb: use highlight's shebang detection

2016-09-22 Thread Jakub Narębski
W dniu 22.09.2016 o 00:18, Ian Kelling napisał:

> The highlight binary can detect language by shebang when we can't tell
> the syntax type by the name of the file. In that case, pass the blob
> to "highlight --force" and the resulting html will have markup for
> highlighting if the language was detected.

This description feels a bit convoluted. Perhaps something like this:

  The "highlight" binary can, in some cases, determine the language type
  by the means of file contents, for example the shebang in the first line
  for some scripting languages.  Make use of this autodetection for files
  which syntax is not known by gitweb.  In that case, pass the blob
  contents to "highlight --force"; the parameter is needed to make it
  always generate HTML output (which includes HTML-escaping).

Also, we might want to have the information about performance of this
solution either in the commit message, or in commit comments.

> 
> Document the feature and improve syntax highlight documentation, add
> test to ensure gitweb doesn't crash when language detection is used,

All right.

> and remove an unused parameter from gitweb_check_feature().

First, that is guess_file_syntax(), not gitweb_check_feature().
Second, this change could be made into independent patch, for example
preparatory one.

> 
> Signed-off-by: Ian Kelling 
> ---
>  Documentation/gitweb.conf.txt  | 21 ++---
>  gitweb/gitweb.perl | 14 +++---
>  t/t9500-gitweb-standalone-no-errors.sh |  8 
>  3 files changed, 29 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> index a79e350..e632089 100644
> --- a/Documentation/gitweb.conf.txt
> +++ b/Documentation/gitweb.conf.txt
> @@ -246,13 +246,20 @@ $highlight_bin::
>   Note that 'highlight' feature must be set for gitweb to actually
>   use syntax highlighting.
>  +
> -*NOTE*: if you want to add support for new file type (supported by
> -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> -or `%highlight_basename`, depending on whether you detect type of file
> -based on extension (for example "sh") or on its basename (for example
> -"Makefile").  The keys of these hashes are extension and basename,
> -respectively, and value for given key is name of syntax to be passed via
> -`--syntax ` to highlighter.
> +*NOTE*: for a file to be highlighted, its syntax type must be detected
> +and that syntax must be supported by "highlight".  The default syntax
> +detection is minimal, and there are many supported syntax types with no
> +detection by default.  There are three options for adding syntax
> +detection.  The first and second priority are `%highlight_basename` and
> +`%highlight_ext`, which detect based on basename (the full filename, for
> +example "Makefile") and extension (for example "sh").  The keys of these
> +hashes are the basename and extension, respectively, and the value for a
> +given key is the name of the syntax to be passed via `--syntax `
> +to "highlight".  The last priority is the "highlight" configuration of
> +`Shebang` regular expressions to detect the language based on the first
> +line in the file, (for example, matching the line "#!/bin/bash").  See
> +the highlight documentation and the default config at
> +/etc/highlight/filetypes.conf for more details.
>  +

I think the rewrite is a bit more readable.

>  For example if repositories you are hosting use "phtml" extension for
>  PHP files, and you want to have correct syntax-highlighting for those
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 33d701d..44094f4 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3913,7 +3913,7 @@ sub blob_contenttype {
>  # guess file syntax for syntax highlighting; return undef if no highlighting
>  # the name of syntax can (in the future) depend on syntax highlighter used
>  sub guess_file_syntax {
> - my ($highlight, $mimetype, $file_name) = @_;
> + my ($highlight, $file_name) = @_;

Right.

>   return undef unless ($highlight && defined $file_name);
>   my $basename = basename($file_name, '.in');
>   return $highlight_basename{$basename}
> @@ -3931,15 +3931,16 @@ sub guess_file_syntax {
>  # or return original FD if no highlighting
>  sub run_highlighter {
>   my ($fd, $highlight, $syntax) = @_;
> - return $fd unless ($highlight && defined $syntax);
> + return $fd unless ($highlight);

Run highlighter if it is defined, even if gitweb doesn't know syntax, right.

>  

Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
fyi: I mistakenly did not include v2 in the subject of the last message.


[PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used,
and remove an unused parameter from gitweb_check_feature().

Signed-off-by: Ian Kelling 
---
 Documentation/gitweb.conf.txt  | 21 ++++++---
 gitweb/gitweb.perl | 14 +++---
 t/t9500-gitweb-standalone-no-errors.sh |  8 
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
Note that 'highlight' feature must be set for gitweb to actually
use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax ` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax `
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-   my ($highlight, $mimetype, $file_name) = @_;
+   my ($highlight, $file_name) = @_;
return undef unless ($highlight && defined $file_name);
my $basename = basename($file_name, '.in');
return $highlight_basename{$basename}
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
my ($fd, $highlight, $syntax) = @_;
-   return $fd unless ($highlight && defined $syntax);
+   return $fd unless ($highlight);
 
close $fd;
+   my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
  quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', 
'-pse',
'$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
'--', "-fe=$fallback_encoding")." | ".
  quote_command($highlight_bin).
- " --replace-tabs=8 --fragment --syntax $syntax |"
+ " --replace-tabs=8 --fragment $syntax_arg |"
or die_error(500, "Couldn't open file or run syntax 
highlighter");
return $fd;
 }
@@ -7062,9 +7063,8 @@ sub git_blob {
$have_blame &&= ($mimetype =~ m!^text/!);
 
my $highlight = gitweb_check_feature('highlight');
-   my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
-   $fd = run_highlighter($fd, $highlight, $syntax)
-   if $syntax;
+   my $syntax = guess_file_syntax($highlight, $file_name);
+   $fd = run_highlighter($fd, $highlight, $syntax);
 
git_header_html(undef, $expires);
my $formats_nav = '

Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
On Tue, Sep 20, 2016, at 01:22 PM, Jakub Narębski wrote:
> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>
> > The highlight binary can detect language by shebang when we can't tell
> > the syntax type by the name of the file.
>
> Was it something always present among highlight[1] binary capabilities,
> or is it something present only in new enough highlight app?  Or only
> in some specific fork / specific binary?  I couldn't find language
> detection in highlight[1] documentation...
>
> [1]: http://www.andre-simon.de/doku/highlight/en/highlight.php

Search for the word shebang, it's mentioned twice.

>
> If this feature is available only for some version, or for some
> highlighters, gitweb would have to provide an option to configure
> it.  It might be an additional configuration variable, it might
> be a special value in the %highlight_basename or %highlight_ext.

Good question. It was added upstream in 2007, and I tested that it's
functioning in the earliest distros I have easy access to: ubuntu 14.04
and debian wheezy.

>
> >  To use highlight's shebang
> > detection, add highlight to the pipeline whenever highlight is enabled.
>
> This describes what this patch does, but the sentence feels
> a bit convoluted, as it is stated.
>

Agreed. I've changed it in v2 of the patch, and perhaps this will make
the rest of the patch clearer too. The new paragraph is:

The highlight binary can detect language by shebang when we can't
tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.



> >
> > Document the shebang detection and add a test which exercises it in
> > t/t9500-gitweb-standalone-no-errors.sh.
>
> Nice!
>
> >
> > Signed-off-by: Ian Kelling 
> > ---
> >
> > Notes:
> > I wondered if adding highlight to the pipeline would make viewing a blob
> > with no highlighting take longer but it did not on my computer. I found
> > no noticeable impact on small files and strangely, on a 159k file, it
> > took 7% less time averaged over several requests.
>
> Strange.  I would guess that invoking separate binary and perl would
> always
> add to the time (especially on operation systems where forking / running
> command is expensive... though those are not often used with web servers,
> isn't it).

I dug into this a little more, and I think it's because when we call
highlight, we later call sanitize() instead of esc_html(). sanitize() is
faster and makes up for the extra time highlight takes. I ran a test on
my machine calling sanitize and esc_html on each line of gitweb.perl 100
times: 7.4s for sanitize, 12.4s for esc_html.

>
> >
> >  Documentation/gitweb.conf.txt  | 21 ++---
> >  gitweb/gitweb.perl | 10 +-
> >  t/t9500-gitweb-standalone-no-errors.sh | 18 +-
> >  3 files changed, 32 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
> > Note that 'highlight' feature must be set for gitweb to actually
> > use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax ` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is t

Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

2016-09-21 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 21, 2016 at 8:28 PM, Jakub Narębski  wrote:
> W dniu 21.09.2016 o 20:04, Ævar Arnfjörð Bjarmason pisze:
>> It would make some code like git_print_log() a bit more complex /
>> fragile, since it would have to work on multi-line strings, but
>> anything that needed to do a regex match / replacement would be much
>> faster.
>
> Would it?  Did you perform any synthetic micro-benchmark?

No, just experience. With the caveat that this may not matter at all
in this context, C-like code in Perl is slow, if you can offload
things to one big regex operation it's usually faster.

>>
>> But OTOH I think perhaps we're worrying about nothing when it comes to
>> the performance. I haven't been able to make gitweb display more than
>> a 100 or so commits at a time (haven't found where exactly in the code
>> these limits are), any munging we do on the log messages would have to
>> be pretty damn slow to matter.
>
> sub git_log_generic {
>
> # [...]
>
> my @commitlist =
> parse_commits($commit_hash, 101, (100 * $page),
>   defined $file_name ? ($file_name, 
> "--full-history") : ());
>
> Here you have it (it probably should be a constant; this number can be
> found in a few other places).

Thanks!


Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 19:58, Ævar Arnfjörð Bjarmason pisze:
> On Wed, Sep 21, 2016 at 7:09 PM, Jakub Narębski  wrote:
>> W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:

>>> +(?>> +[A-Za-z0-9.-]+
>>> +(?!\.) # refs can't end with ".", see check_refname_format()
>>
>> If we can assume that tag name is at least two characters (instead of
>> at least one character), we could get rid of those extended regexp
>> lookaround assertions:
>>
>>   (?>   (?!pattern)  - zero-width negative lookahead  assertion
>>
>> That is:
>>
>>   +[A-Za-z0-9.]   # see strbuf_check_tag_ref(). Tags can't start 
>> with -
>>   +[A-Za-z0-9.-]*
>>   +[A-Za-z0-9-]   # refs can't end with ".", see 
>> check_refname_format()
> 
> Why get rid of them? I'm all for improving the regex, there's bound to
> be lots of bugs in it, but since it's perl we can freely use its
> extended features.

Ah, all right. I was wondering how zero width assertions / patterns
interact with each other, but zero-width negative lookaround assertions
are really quite simple.

> 
>> Also, the canonical documentation for what is allowed in refnames
>> is git-check-ref-format(1)... though it does not look like it includes
>> "tags cannot start with '-'".
> 
> Yeah, looks like that manpage needs to be patched.

Right.

> 
>> Anyway, perhaps 'is it valid refname' could be passed to a subroutine,
>> or a named regexp (which might be more involved, like disallowing two
>> consecutive dots, e.g. "(?!.*\.{2})" at beginning).

I wonder if rules for valid tag name can be described in extended
regexp, and if it is, how readable would it be.

Regards,
-- 
Jakub Narębski



Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 20:04, Ævar Arnfjörð Bjarmason pisze:
> On Wed, Sep 21, 2016 at 6:26 PM, Jakub Narębski  wrote:
> 
>> P.S. I have reworking of commit message parsing and enhancement in my
>> long, long and dated gitweb TODO list :-(
> 
> Anything specific you could share?

Some of TODO I would have to bring from backups, as the computer on
which I did majority of gitweb development has since died (from old
age).

The list includes:
- implement caching of gitweb output
- revamp handling of encoding (UTF-8 with fallback encoding)
- split gitweb into modules, while maintaining ease of install
- refactor handling of diffs
- better handling of config files
- document URI structure, perhaps revamp URI parsing and generation
- make commit message transformation generic
  (see below)

> 
> One thing that would be a lot faster in Perl is if we didn't have to
> pass the log around as split-up lines and could just operate on it as
> one big string.

Well, there are a few transformations that commit message undergoes
in gitweb, including linking SHA1, optional linking of bug numbers
to bug tracker, and syntax highlighting of signoff lines (trailer
lines).  

I would like to have this cleaned up, and refactored.  With all
those transformations we would need to keep account which parts
are HTML, and which not and need escaping (note: URI escape !=
HTML escape).

> 
> It would make some code like git_print_log() a bit more complex /
> fragile, since it would have to work on multi-line strings, but
> anything that needed to do a regex match / replacement would be much
> faster.

Would it?  Did you perform any synthetic micro-benchmark?

> 
> But OTOH I think perhaps we're worrying about nothing when it comes to
> the performance. I haven't been able to make gitweb display more than
> a 100 or so commits at a time (haven't found where exactly in the code
> these limits are), any munging we do on the log messages would have to
> be pretty damn slow to matter.

sub git_log_generic {

# [...]

my @commitlist =
parse_commits($commit_hash, 101, (100 * $page),
  defined $file_name ? ($file_name, 
"--full-history") : ());

Here you have it (it probably should be a constant; this number can be
found in a few other places).

Best,
-- 
Jakub Narębski



Re: [PATCH 2/3] gitweb: Link to 7-character SHA1SUMS in commit messages

2016-09-21 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 21, 2016 at 6:26 PM, Jakub Narębski  wrote:

> P.S. I have reworking of commit message parsing and enhancement in my
> long, long and dated gitweb TODO list :-(

Anything specific you could share?

One thing that would be a lot faster in Perl is if we didn't have to
pass the log around as split-up lines and could just operate on it as
one big string.

It would make some code like git_print_log() a bit more complex /
fragile, since it would have to work on multi-line strings, but
anything that needed to do a regex match / replacement would be much
faster.

But OTOH I think perhaps we're worrying about nothing when it comes to
the performance. I haven't been able to make gitweb display more than
a 100 or so commits at a time (haven't found where exactly in the code
these limits are), any munging we do on the log messages would have to
be pretty damn slow to matter.

> P.P.S. Kay Sievers no longer works on gitweb, and I think no longer
> works at SuSE but at RedHat.

Yup, been getting bounces from his address.


Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Junio C Hamano
Jakub Narębski  writes:

>> When I saw 2/3 I wondered about one thing and 3/3 shares the same,
>> which is that we only use regex match and do not validate for a
>> false match.  Would it be too expensive...
>
> It's a matter of balance between false positives (and unresolving
> links) and performance...

Yes, and that is why I asked a simple yes-or-no question.  Would it
be too expensive?  Your answer seems to be yes.

Have we measured?  Is that really a bottleneck?  Would it help to
update parse_commits to call a new command "gitweb--helper" that
produces the result of what git_print_log would have done to its
$log argument, for example?



Re: [PATCH 3/3] gitweb: Link to "git describe"'d commits in log messages

2016-09-21 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 21, 2016 at 7:09 PM, Jakub Narębski  wrote:
> W dniu 21.09.2016 o 13:44, Ævar Arnfjörð Bjarmason napisał:
>
>> Change the log formatting function to know about "git describe" output
>> like v2.8.0-4-g867ad08 in addition to just plain 867ad08.
>
> All right, that is a good plan.
>
>>
>> This also fixes a micro-regression in my change of the minimum SHA1
>> length from 8 to 7, which is that dated tags like
>> hadoop-20160921-113441-20-g094fb7d would start thinking the "20160921"
>> part was a commit.
>
> Actually 20160921 is 8 characters, so assuming that '-' is treated
> as word boundary by Perl, it is not a regression; this false positive
> was there.  The new feature would help, instead of linking false match
> it links whole git-describe output.
>
> So this paragraph needs to be changed wrt. the above.

Yeah I just miscounted there, will change that.

> Note that there are quite a bit of shortened SHA-1 that are composed
> entirely from digits, without a-f characters.

*nod*

>>
>> There are still many valid refnames that we don't link to
>> e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
>> v2.8.0-4-g867ad08, but I'm not supporting that with this commit,
>> similarly it's trivially possible to create some refnames like
>> "æ/var-gf6727b0" or whatever which won't be picked up by this regex.
>
> Hopefully hierarchical tags are rare.  We need to reduce false
> positives.
>
>>
>> There's surely room for improvement here, but I just wanted to address
>> the very common case of sticking "git describe" output into commit
>> messages without trying to link to all possible refnames, that's going
>> to be a rather futile exercise given that this is free text, and it
>> would be prohibitively expensive to look up whether the references in
>> question exist in our repository.
>
> Note that we do not ask Git at the time of displaying commit message
> if the link is valid for performance reasons; we link it, and the link
> may be invalid if it was a false positive.
>
> Note that recommended way to refer to other commit in commit mesages
> is (see Documentation/SubmittingPatches):
>
>   If you want to reference a previous commit in the history of a stable
>   branch, use the format "abbreviated sha1 (subject, date)",
>   with the subject enclosed in a pair of double-quotes, like this:
>
>  Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30)
>  noticed that ...
>
> Hmmm... this makes previous commit even more important.
>
>> ---
>>  gitweb/gitweb.perl | 18 --
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 101dbc0..3a52bc7 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -2036,10 +2036,24 @@ sub format_log_line_html {
>>   my $line = shift;
>>
>>   $line = esc_html($line, -nbsp=>1);
>> - $line =~ s{\b([0-9a-fA-F]{7,40})\b}{
>> + $line =~ s{
>> +\b
>> +(
>> +# The output of "git describe", e.g. v2.10.0-297-gf6727b0
>> +# or hadoop-20160921-113441-20-g094fb7d
>
> All right, for more complex regular expressions using in-line comments
> (extended regexp in Perl) is a good idea.
>
>> +(?> +[A-Za-z0-9.-]+
>> +(?!\.) # refs can't end with ".", see check_refname_format()
>
> If we can assume that tag name is at least two characters (instead of
> at least one character), we could get rid of those extended regexp
> lookaround assertions:
>
>   (?   (?!pattern)  - zero-width negative lookahead  assertion
>
> That is:
>
>   +[A-Za-z0-9.]   # see strbuf_check_tag_ref(). Tags can't start 
> with -
>   +[A-Za-z0-9.-]*
>   +[A-Za-z0-9-]   # refs can't end with ".", see 
> check_refname_format()

Why get rid of them? I'm all for improving the regex, there's bound to
be lots of bugs in it, but since it's perl we can freely use its
extended features.

> Also, the canonical documentation for what is allowed in refnames
> is git-check-ref-format(1)... though it does not look like it includes
> "tags cannot start with '-'".

Yeah, looks like that manpage needs to be patched.

> Anyway, perhaps 'is it valid refname' could be passed to a subroutine,
> or a named regexp (which might be more involved, like disallowing two
> consecutive dots, e.g. "(?!.*\.{2})" at 

Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Jakub Narębski
W dniu 21.09.2016 o 18:38, Junio C Hamano pisze:
> Jakub Narębski  writes:
>> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>>
>>> The highlight binary can detect language by shebang when we can't tell
>>> the syntax type by the name of the file. 
>>
>> Was it something always present among highlight[1] binary capabilities,
>> or is it something present only in new enough highlight app?  Or only
>> in some specific fork / specific binary?  I couldn't find language
>> detection in highlight[1] documentation...
>> ...
>> Thank you for your work on this patch,
> 
> Thanks for reviewing.  It seems that there will be further exchange
> needed before I can pick it up?

Yes, I think so.

-- 
Jakub Narębski



  1   2   3   4   5   6   >