Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-11 Thread Ian Kelling


Christian Schoenebeck  writes:

> 4. MTAs should also address this DKIM issue more accurately.

I agree that Exim should be changed as you suggest.

>
> By taking these things into account, emails of domains with strict DMARC 
> policies are no longer munged on gnu lists.

Additional info: Migration of many gnu/nongnu.gnu.org lists is still in
progress for another week or so; after that, this will be true for most
of them. For a minority of lists, the list administrators have chosen
unusual settings, such as rewriting the From: header of every message so
that it appears to come from the list, and we are leaving those as they
are since the list administrators opted in to them at some point. But if
a list deals with patches and leaving the headers unmodified is useful
to the people on it, I think a request to change the list settings is
likely to be accepted by the list admin.

-- 
Ian Kelling | Senior Systems Administrator, Free Software Foundation
GPG Key: B125 F60B 7B28 7FF6 A2B7  DF8F 170A F0E2 9542 95DF
https://fsf.org | https://gnu.org


Re: How to watch a mailing list & repo for patches which affect a certain area of code?

2016-10-10 Thread Ian Kelling
On Mon, Oct 10, 2016, at 12:08 PM, Stefan Beller wrote:
> Well it is found in 2.9 and later. Currently the base footer is
> opt-in, e.g. you'd
> need to convince people to run `git config format.useAutoBase true` or to
> manually add the base to the patch via `format-patch --base=`.

Nice. Another useful config option this led me to find is
`git config --global branch.autoSetupMerge always`, which sets up
upstream tracking even for branches started from local branches,
allowing useAutoBase to work for them without extra typing (according to
the man page; I haven't tried it yet).
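
A minimal sketch of the two settings mentioned in this thread, applied to
a single repository rather than globally. `format.useAutoBase` and
`branch.autoSetupMerge` are real git config keys; the function name
`enable_auto_base` is hypothetical, and the per-repository scope is an
assumption made so the example does not touch anyone's global
configuration.

```shell
# enable_auto_base REPO_DIR
# Hypothetical helper: turn on the base footer for format-patch and
# automatic upstream tracking, in one repository only.
enable_auto_base() {
    repo=$1
    # Ask format-patch to compute and record the base commit automatically.
    git -C "$repo" config format.useAutoBase true
    # Set up tracking for new branches even when they start from a local branch.
    git -C "$repo" config branch.autoSetupMerge always
}
```

The base can also be given by hand for a single run with
`format-patch --base=`, as noted in the quoted text; the sketch above only
covers the opt-in configuration.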



How to watch a mailing list & repo for patches which affect a certain area of code?

2016-10-09 Thread Ian Kelling
I've got patches in various projects, and I don't have time to keep up
with the mailing lists, but I'd like to help out with maintenance of that
code, or of the functions/files it touches. People don't cc me. I figure I
could filter the list by checking submitted patches, commits, and
mentions of files/functions against filters built from the code I have
in the repo, even if it's been moved or changed since. I'm wondering what
other people have already implemented to automate this, or any general
thoughts. Web search is not showing me much.
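
One possible starting point for such a filter, sketched under
assumptions: the message is available as plain text on stdin and contains
an unquoted patch in git's `diff --git` format. The function names
`patch_paths` and `patch_touches` are hypothetical, and this only looks
at file paths, not at functions or at code that has since moved.

```shell
# patch_paths: print the unique "b/" paths named in the "diff --git"
# headers of a patch read from stdin.
patch_paths() {
    sed -n 's|^diff --git a/.* b/\(.*\)$|\1|p' | sort -u
}

# patch_touches PREFIX: succeed if any path in the patch on stdin
# starts with PREFIX (compared as a plain string, not a pattern).
patch_touches() {
    patch_paths | awk -v prefix="$1" '
        index($0, prefix) == 1 { found = 1 }
        END { exit found ? 0 : 1 }'
}

# Example use (the file name is made up):
#   patch_touches gitweb/ < some-message.txt && echo "worth a look"
```

A mail filter could call `patch_touches` once per watched directory and
flag the message on the first match.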


Re: [PATCH v4 2/2] gitweb: use highlight's shebang detection

2016-09-28 Thread Ian Kelling
On Sun, Sep 25, 2016, at 11:04 AM, Jakub Narębski wrote:
> 
> For what it is worth:
> 
> Acked-by: Jakub Narębski 
> 
> (but unfortunately *not* tested by).

Thank you for all your help.
--
Ian Kelling


Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-24 Thread Ian Kelling
On Sat, Sep 24, 2016, at 09:21 AM, Jakub Narębski wrote:
> W dniu 24.09.2016 o 00:15, Jakub Narębski pisze:
> 
> Sidenote: this way of benchmarking gitweb falls between two ways of
> doing a benchmark.
> 
> The first method is to simply run gitweb as a standalone script, passing
> its parameters in CGI environment variables; just like the test suite
> does it.  You would 'time' / 'times' it a few times, drop outliers, and
> take the average or the median.  With this method you don't even need to set
> up a web server.
> 
> The second is to use a specialized program to benchmark the server-side
> of a web page, for example 'ab' (ApacheBench), httperf, curl-loader
> or JMeter.  The first one is usually distributed together with Apache
> web server, so you probably have it installed already.  Those tools
> provide timing statistics.

Good to know. Thanks.
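
The last step of the first method can be sketched as follows. This is
only an illustration: the function name `median` is hypothetical, it
assumes the wall-clock times of the individual runs have already been
collected one per line, and the numbers in the usage comment are made-up
sample values, not measurements. Taking the median rather than the mean
is one way to keep a single outlier from dominating the result.

```shell
# median: print the median of the numbers on stdin, one per line.
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1          # no input, nothing to report
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Example with made-up timings in seconds; 0.95 is the outlier:
#   printf '%s\n' 0.31 0.29 0.95 0.30 0.32 | median
```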


Re: [PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-24 Thread Ian Kelling
On Fri, Sep 23, 2016, at 03:15 PM, Jakub Narębski wrote:
> W dniu 23.09.2016 o 11:08, Ian Kelling napisał:
>
> > The "highlight" binary can, in some cases, determine the language type
> > by the means of file contents, for example the shebang in the first line
> > for some scripting languages.  Make use of this autodetection for files
> > whose syntax is not known by gitweb.  In that case, pass the blob
> > contents to "highlight --force"; the parameter is needed to make it
> > always generate HTML output (which includes HTML-escaping).
>
> Right.
>
> >
> > Although we now run highlight on files which do not end up highlighted,
> > performance is virtually unaffected because when we call highlight, we
> > also call sanitize() instead of esc_html(), which is significantly
> > slower.
>
> This paragraph is a bit unclear, for example it is not obvious what
> "..., which is significantly slower" refers to: sanitize() or esc_html().
>
> I think it would be better to write:
>
>   Although we now run highlight on files which do not end up highlighted,
>   performance is virtually unaffected because when we call highlight, it
>   is used for escaping HTML.  In the case that highlight is used, gitweb
>   calls sanitize() instead of esc_html(), and the latter is significantly
>   slower (it does more, being roughly a superset of sanitize()).

Agree. Done in v4.

>
> > After curling blob view of unhighlighted large and small text
> > files of perl code and license text 100 times each on a local
> > Apache/2.4.23 (Debian) instance, its logs indicate +-1% difference in
> > request time for all file types.
>
> Also, "curling" is not the word I would like to see. I would say:
>
>   Simple benchmark comparing performance of 'blob' view of files without
>   syntax highlighting in gitweb before and after this change indicates
>   ±1% difference in request time for all file types.  Benchmark was
>   performed on local instance on Debian, using Apache/2.4.23 web server
>   and CGI/PSGI/FCGI/mod_perl.
>
>   ^^--- select one
>
> Or something like that; I'm not sure how detailed this should be.
> But it is nice to have such benchmark in the commit message.


Sounds good. Used it in v4.

>
> Anyway I think that adding yet another configuration toggle for selecting
> whether to use "highlight" syntax autodetection or not would be just an
> unnecessary complication.
>
> Note that the performance loss might be considerably higher on MS
> Windows, with its higher cost of fork.  But then they probably do not
> configure a server-side highlighter anyway.
>
> >
> > Document the feature and improve syntax highlight documentation, add
> > test to ensure gitweb doesn't crash when language detection is used.
>
> Good.
>
> >
> > Signed-off-by: Ian Kelling 
> > ---
> >  Documentation/gitweb.conf.txt  | 21 ++---
> >  gitweb/gitweb.perl | 10 +-
> >  t/t9500-gitweb-standalone-no-errors.sh |  8 
> >  3 files changed, 27 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
>
> We should probably say what it means to be "highlight"[1] compatible,
> but it is outside of scope for this patch, and I think also out of scope
> of this series.
>
> > Note that 'highlight' feature must be set for gitweb to actually
> > use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax ` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%

[PATCH v4 2/2] gitweb: use highlight's shebang detection

2016-09-24 Thread Ian Kelling
The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages.  Make use of this autodetection for files
whose syntax is not known by gitweb.  In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).

Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, it
is used for escaping HTML.  In the case that highlight is used, gitweb
calls sanitize() instead of esc_html(), and the latter is significantly
slower (it does more, being roughly a superset of sanitize()).  Simple
benchmark comparing performance of 'blob' view of files without syntax
highlighting in gitweb before and after this change indicates ±1%
difference in request time for all file types.  Benchmark was performed
on local instance on Debian, using Apache/2.4.23 web server and CGI.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.

Signed-off-by: Ian Kelling 
---

Notes:
The only change from v3 is the commit message as suggested by Jakub
Narębski

 Documentation/gitweb.conf.txt  | 21 ++---
 gitweb/gitweb.perl | 10 +-
 t/t9500-gitweb-standalone-no-errors.sh |  8 
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
Note that 'highlight' feature must be set for gitweb to actually
use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax ` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax `
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6cb4280..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
my ($fd, $highlight, $syntax) = @_;
-   return $fd unless ($highlight && defined $syntax);
+   return $fd unless ($highlight);
 
close $fd;
+   my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
  quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
'$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
'--', "-fe=$fallback_encoding")." | ".
  quote_command($highlight_bin).
- " --replace-tabs=8 --fragment --syntax $syntax |"
+ " --replace-tabs=8 --fragment $syntax_arg |"
or die_error(500, "Couldn't open file or run syntax highlighter");
return $fd;
 }
@@ -7063,8 +7064,7 @@ sub git_blob {
 
my $highlight = gitweb_check_feature('highlight');
my $syntax = guess_file_syntax($highlight, $file_name);
-   $fd =

[PATCH v4 1/2] gitweb: remove unused guess_file_syntax() parameter

2016-09-24 Thread Ian Kelling
Signed-off-by: Ian Kelling 
---

Notes:
The only change from v3 is a more descriptive commit message

 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..6cb4280 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-   my ($highlight, $mimetype, $file_name) = @_;
+   my ($highlight, $file_name) = @_;
return undef unless ($highlight && defined $file_name);
my $basename = basename($file_name, '.in');
return $highlight_basename{$basename}
@@ -7062,7 +7062,7 @@ sub git_blob {
$have_blame &&= ($mimetype =~ m!^text/!);
 
my $highlight = gitweb_check_feature('highlight');
-   my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
+   my $syntax = guess_file_syntax($highlight, $file_name);
$fd = run_highlighter($fd, $highlight, $syntax)
if $syntax;
 
-- 
2.9.3



Re: [PATCH v2] gitweb: use highlight's shebang detection

2016-09-23 Thread Ian Kelling
On Thu, Sep 22, 2016, at 03:50 PM, Jakub Narębski wrote:
> W dniu 22.09.2016 o 00:18, Ian Kelling napisał:
>
> > The highlight binary can detect language by shebang when we can't tell
> > the syntax type by the name of the file. In that case, pass the blob
> > to "highlight --force" and the resulting html will have markup for
> > highlighting if the language was detected.
>
> This description feels a bit convoluted. Perhaps something like this:
>
>   The "highlight" binary can, in some cases, determine the language type
>   by the means of file contents, for example the shebang in the first
>   line for some scripting languages.  Make use of this autodetection for
>   files whose syntax is not known by gitweb.  In that case, pass the blob
>   contents to "highlight --force"; the parameter is needed to make it
>   always generate HTML output (which includes HTML-escaping).

Nice. Using it in v3.

>
> Also, we might want to have the information about performance of this
> solution either in the commit message, or in commit comments.

I tested it more rigorously and added to v3 commit message.

>
> >
> > Document the feature and improve syntax highlight documentation, add
> > test to ensure gitweb doesn't crash when language detection is used,
>
> All right.
>
> > and remove an unused parameter from gitweb_check_feature().
>
> First, that is guess_file_syntax(), not gitweb_check_feature().
> Second, this change could be made into independent patch, for example
> preparatory one.


Oops. I split it out in v3.

>
> >
> > Signed-off-by: Ian Kelling 
> > ---
> >  Documentation/gitweb.conf.txt  | 21 ++---
> >  gitweb/gitweb.perl | 14 +++---
> >  t/t9500-gitweb-standalone-no-errors.sh |  8 
> >  3 files changed, 29 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
> > Note that 'highlight' feature must be set for gitweb to actually
> > use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax ` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is the name of the syntax to be passed via `--syntax `
> > +to "highlight".  The last priority is the "highlight" configuration of
> > +`Shebang` regular expressions to detect the language based on the first
> > +line in the file, (for example, matching the line "#!/bin/bash").  See
> > +the highlight documentation and the default config at
> > +/etc/highlight/filetypes.conf for more details.
> >  +
>
> I think the rewrite is a bit more readable.
>
> >  For example if repositories you are hosting use "phtml" extension for
> >  PHP files, and you want to have correct syntax-highlighting for those
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 33d701d..44094f4 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -3913,7 +3913,7 @@ sub blob_contenttype {
> >  # guess file syntax for syntax highlighting; return undef if no highlighting
> >  # the name of syntax can (in the future) depend on syntax highlighter used
> >  sub guess_file_syntax {
> > -   my ($highlight, $mimetype, $file_name) = @_;
> > +   my ($highlight, $file_name) = @_;
>
> Right.
>

[PATCH v3 2/2] gitweb: use highlight's shebang detection

2016-09-23 Thread Ian Kelling
The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages.  Make use of this autodetection for files
whose syntax is not known by gitweb.  In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).

Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, we
also call sanitize() instead of esc_html(), which is significantly
slower. After curling blob view of unhighlighted large and small text
files of perl code and license text 100 times each on a local
Apache/2.4.23 (Debian) instance, its logs indicate +-1% difference in
request time for all file types.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.

Signed-off-by: Ian Kelling 
---
 Documentation/gitweb.conf.txt  | 21 ++---
 gitweb/gitweb.perl | 10 +-
 t/t9500-gitweb-standalone-no-errors.sh |  8 
 3 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
Note that 'highlight' feature must be set for gitweb to actually
use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax ` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax `
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6cb4280..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
my ($fd, $highlight, $syntax) = @_;
-   return $fd unless ($highlight && defined $syntax);
+   return $fd unless ($highlight);
 
close $fd;
+   my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
  quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
'$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
'--', "-fe=$fallback_encoding")." | ".
  quote_command($highlight_bin).
- " --replace-tabs=8 --fragment --syntax $syntax |"
+ " --replace-tabs=8 --fragment $syntax_arg |"
or die_error(500, "Couldn't open file or run syntax highlighter");
return $fd;
 }
@@ -7063,8 +7064,7 @@ sub git_blob {
 
my $highlight = gitweb_check_feature('highlight');
my $syntax = guess_file_syntax($highlight, $file_name);
-   $fd = run_highlighter($fd, $highlight, $syntax)
-   if $syntax;
+   $fd = run_highlighter($fd, $highlight, $syntax);
 
git_header_html(undef, $expires);
my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
 

[PATCH v3 1/2] gitweb: remove unused function parameter

2016-09-23 Thread Ian Kelling
Signed-off-by: Ian Kelling 
---
 gitweb/gitweb.perl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..6cb4280 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-   my ($highlight, $mimetype, $file_name) = @_;
+   my ($highlight, $file_name) = @_;
return undef unless ($highlight && defined $file_name);
my $basename = basename($file_name, '.in');
return $highlight_basename{$basename}
@@ -7062,7 +7062,7 @@ sub git_blob {
$have_blame &&= ($mimetype =~ m!^text/!);
 
my $highlight = gitweb_check_feature('highlight');
-   my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
+   my $syntax = guess_file_syntax($highlight, $file_name);
$fd = run_highlighter($fd, $highlight, $syntax)
if $syntax;
 
-- 
2.9.3



Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
FYI: I mistakenly did not include v2 in the subject of the last message.


[PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used,
and remove an unused parameter from gitweb_check_feature().

Signed-off-by: Ian Kelling 
---
 Documentation/gitweb.conf.txt  | 21 ++---
 gitweb/gitweb.perl | 14 +++---
 t/t9500-gitweb-standalone-no-errors.sh |  8 
 3 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
Note that 'highlight' feature must be set for gitweb to actually
use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax ` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax `
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..44094f4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3913,7 +3913,7 @@ sub blob_contenttype {
 # guess file syntax for syntax highlighting; return undef if no highlighting
 # the name of syntax can (in the future) depend on syntax highlighter used
 sub guess_file_syntax {
-   my ($highlight, $mimetype, $file_name) = @_;
+   my ($highlight, $file_name) = @_;
return undef unless ($highlight && defined $file_name);
my $basename = basename($file_name, '.in');
return $highlight_basename{$basename}
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
my ($fd, $highlight, $syntax) = @_;
-   return $fd unless ($highlight && defined $syntax);
+   return $fd unless ($highlight);
 
close $fd;
+   my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
  quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
'$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
'--', "-fe=$fallback_encoding")." | ".
  quote_command($highlight_bin).
- " --replace-tabs=8 --fragment --syntax $syntax |"
+ " --replace-tabs=8 --fragment $syntax_arg |"
or die_error(500, "Couldn't open file or run syntax highlighter");
return $fd;
 }
@@ -7062,9 +7063,8 @@ sub git_blob {
$have_blame &&= ($mimetype =~ m!^text/!);
 
my $highlight = gitweb_check_feature('highlight');
-   my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
-   $fd = run_highlighter($fd, $highlight, $syntax)
-   if $syntax;
+   my $syntax = guess_file_syntax($highlight, $file_name);
+   $fd = run_highlighter($fd, $highlight, $syntax);
 
git_header_html(undef, $expires);
my $formats_nav = '

Re: [PATCH] gitweb: use highlight's shebang detection

2016-09-21 Thread Ian Kelling
On Tue, Sep 20, 2016, at 01:22 PM, Jakub Narębski wrote:
> W dniu 06.09.2016 o 21:00, Ian Kelling pisze:
>
> > The highlight binary can detect language by shebang when we can't tell
> > the syntax type by the name of the file.
>
> Was it something always present among highlight[1] binary capabilities,
> or is it something present only in new enough highlight app?  Or only
> in some specific fork / specific binary?  I couldn't find language
> detection in highlight[1] documentation...
>
> [1]: http://www.andre-simon.de/doku/highlight/en/highlight.php

Search for the word shebang; it's mentioned twice.

>
> If this feature is available only for some version, or for some
> highlighters, gitweb would have to provide an option to configure
> it.  It might be an additional configuration variable, it might
> be a special value in the %highlight_basename or %highlight_ext.

Good question. It was added upstream in 2007, and I tested that it
works in the earliest distros I have easy access to: Ubuntu 14.04
and Debian wheezy.

>
> >  To use highlight's shebang
> > detection, add highlight to the pipeline whenever highlight is enabled.
>
> This describes what this patch does, but the sentence feels
> a bit convoluted, as it is stated.
>

Agreed. I've changed it in v2 of the patch, and perhaps this will make
the rest of the patch clearer too. The new paragraph is:

The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. In that case, pass the blob
to "highlight --force" and the resulting html will have markup for
highlighting if the language was detected.
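
The decision described in that paragraph can be sketched in shell as
below. This is an illustration only, not the gitweb code: gitweb does
this in Perl using `%highlight_basename` and `%highlight_ext`, while the
function names `guess_syntax` and `highlight_syntax_arg` and the three
name-to-syntax mappings here are assumptions chosen for the example.
Shebang detection itself is left to "highlight", which is what passing
`--force` without `--syntax` relies on.

```shell
# guess_syntax FILE: print a syntax name if the file name alone decides it,
# otherwise print nothing. Basename matches take priority over extensions.
guess_syntax() {
    base=${1##*/}
    case $base in
        Makefile) echo make; return 0 ;;
    esac
    case $base in
        *.sh)    echo sh ;;
        *.phtml) echo php ;;
    esac
    return 0
}

# highlight_syntax_arg FILE: print the argument to hand to "highlight".
# A known syntax is passed explicitly; otherwise --force makes highlight
# emit HTML anyway and apply its own detection, such as the shebang.
highlight_syntax_arg() {
    syntax=$(guess_syntax "$1")
    if [ -n "$syntax" ]; then
        echo "--syntax $syntax"
    else
        echo "--force"
    fi
}
```

For a file such as `bin/run` with no recognized name, the sketch falls
through to `--force`, so any highlighting then depends entirely on what
"highlight" detects from the contents.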



> >
> > Document the shebang detection and add a test which exercises it in
> > t/t9500-gitweb-standalone-no-errors.sh.
>
> Nice!
>
> >
> > Signed-off-by: Ian Kelling 
> > ---
> >
> > Notes:
> > I wondered if adding highlight to the pipeline would make viewing a blob
> > with no highlighting take longer but it did not on my computer. I found
> > no noticeable impact on small files and strangely, on a 159k file, it
> > took 7% less time averaged over several requests.
>
> Strange.  I would guess that invoking a separate binary and perl would
> always add to the time (especially on operating systems where forking /
> running a command is expensive... though those are not often used with
> web servers, are they?).

I dug into this a little more, and I think it's because when we call
highlight, we later call sanitize() instead of esc_html(). sanitize() is
faster and makes up for the extra time highlight takes. I ran a test on
my machine calling sanitize and esc_html on each line of gitweb.perl 100
times: 7.4s for sanitize, 12.4s for esc_html.

>
> >
> >  Documentation/gitweb.conf.txt  | 21 ++---
> >  gitweb/gitweb.perl | 10 +-
> >  t/t9500-gitweb-standalone-no-errors.sh | 18 +-
> >  3 files changed, 32 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
> > index a79e350..e632089 100644
> > --- a/Documentation/gitweb.conf.txt
> > +++ b/Documentation/gitweb.conf.txt
> > @@ -246,13 +246,20 @@ $highlight_bin::
> > Note that 'highlight' feature must be set for gitweb to actually
> > use syntax highlighting.
> >  +
> > -*NOTE*: if you want to add support for new file type (supported by
> > -"highlight" but not used by gitweb), you need to modify `%highlight_ext`
> > -or `%highlight_basename`, depending on whether you detect type of file
> > -based on extension (for example "sh") or on its basename (for example
> > -"Makefile").  The keys of these hashes are extension and basename,
> > -respectively, and value for given key is name of syntax to be passed via
> > -`--syntax <syntax>` to highlighter.
> > +*NOTE*: for a file to be highlighted, its syntax type must be detected
> > +and that syntax must be supported by "highlight".  The default syntax
> > +detection is minimal, and there are many supported syntax types with no
> > +detection by default.  There are three options for adding syntax
> > +detection.  The first and second priority are `%highlight_basename` and
> > +`%highlight_ext`, which detect based on basename (the full filename, for
> > +example "Makefile") and extension (for example "sh").  The keys of these
> > +hashes are the basename and extension, respectively, and the value for a
> > +given key is t

[PATCH] gitweb: use highlight's shebang detection

2016-09-06 Thread Ian Kelling
The highlight binary can detect language by shebang when we can't tell
the syntax type by the name of the file. To use highlight's shebang
detection, add highlight to the pipeline whenever highlight is enabled.

Document the shebang detection and add a test which exercises it in
t/t9500-gitweb-standalone-no-errors.sh.

Signed-off-by: Ian Kelling 
---

Notes:
I wondered if adding highlight to the pipeline would make viewing a blob
with no highlighting take longer but it did not on my computer. I found
no noticeable impact on small files and strangely, on a 159k file, it
took 7% less time averaged over several requests.

 Documentation/gitweb.conf.txt  | 21 ++---
 gitweb/gitweb.perl | 10 +-
 t/t9500-gitweb-standalone-no-errors.sh | 18 +-
 3 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index a79e350..e632089 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -246,13 +246,20 @@ $highlight_bin::
Note that 'highlight' feature must be set for gitweb to actually
use syntax highlighting.
 +
-*NOTE*: if you want to add support for new file type (supported by
-"highlight" but not used by gitweb), you need to modify `%highlight_ext`
-or `%highlight_basename`, depending on whether you detect type of file
-based on extension (for example "sh") or on its basename (for example
-"Makefile").  The keys of these hashes are extension and basename,
-respectively, and value for given key is name of syntax to be passed via
-`--syntax <syntax>` to highlighter.
+*NOTE*: for a file to be highlighted, its syntax type must be detected
+and that syntax must be supported by "highlight".  The default syntax
+detection is minimal, and there are many supported syntax types with no
+detection by default.  There are three options for adding syntax
+detection.  The first and second priority are `%highlight_basename` and
+`%highlight_ext`, which detect based on basename (the full filename, for
+example "Makefile") and extension (for example "sh").  The keys of these
+hashes are the basename and extension, respectively, and the value for a
+given key is the name of the syntax to be passed via `--syntax <syntax>`
+to "highlight".  The last priority is the "highlight" configuration of
+`Shebang` regular expressions to detect the language based on the first
+line in the file, (for example, matching the line "#!/bin/bash").  See
+the highlight documentation and the default config at
+/etc/highlight/filetypes.conf for more details.
 +
 For example if repositories you are hosting use "phtml" extension for
 PHP files, and you want to have correct syntax-highlighting for those
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 33d701d..a672181 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3931,15 +3931,16 @@ sub guess_file_syntax {
 # or return original FD if no highlighting
 sub run_highlighter {
my ($fd, $highlight, $syntax) = @_;
-   return $fd unless ($highlight && defined $syntax);
+   return $fd unless ($highlight);
 
close $fd;
+   my $syntax_arg = (defined $syntax) ? "--syntax $syntax" : "--force";
open $fd, quote_command(git_cmd(), "cat-file", "blob", $hash)." | ".
  quote_command($^X, '-CO', '-MEncode=decode,FB_DEFAULT', '-pse',
'$_ = decode($fe, $_, FB_DEFAULT) if !utf8::decode($_);',
'--', "-fe=$fallback_encoding")." | ".
  quote_command($highlight_bin).
- " --replace-tabs=8 --fragment --syntax $syntax |"
+ " --replace-tabs=8 --fragment $syntax_arg |"
or die_error(500, "Couldn't open file or run syntax highlighter");
return $fd;
 }
@@ -7063,8 +7064,7 @@ sub git_blob {
 
my $highlight = gitweb_check_feature('highlight');
my $syntax = guess_file_syntax($highlight, $mimetype, $file_name);
-   $fd = run_highlighter($fd, $highlight, $syntax)
-   if $syntax;
+   $fd = run_highlighter($fd, $highlight, $syntax);
 
git_header_html(undef, $expires);
my $formats_nav = '';
@@ -7117,7 +7117,7 @@ sub git_blob {
$line = untabify($line);
printf qq!<div class="pre"><a id="l%i" href="%s#l%i" class="linenr">%4i</a> %s</div>\n!,
   $nr, esc_attr(href(-replay => 1)), $nr, $nr,
-  $syntax ? sanitize($line) : esc_html($line, -nbsp=>1);
+  $highlight ? sanitize($line) : esc_html($line, -nbsp=>1);
}
}
close $fd
diff --git a/t/t9500-g