I have a patch for consideration, which provides more sophisticated
behaviour for the html_para filter, as described below.
The filter may be used as before with no arguments and you get (just
about) the same behaviour. The "just about" is that I have changed
the regular expressions to match blank lines rather than a series of
newlines, as I have tripped over cases where blocks of text were not
split into paragraphs because there was whitespace on the line
separating them.
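The whitespace case can be reproduced with a minimal sketch. The sample string and the helper names split_old and split_new are made up for illustration; the two patterns are the ones used by the original code and by the no-argument branch of the patch.

```perl
use strict;
use warnings;

# Two paragraphs separated by a line that holds a single space.
my $text = "first para\n \nsecond para";

# Old delimiter: two or more consecutive newlines.
sub split_old { return split /(?:\r?\n){2,}/, $_[0] }

# New delimiter: the separating line may contain whitespace.
sub split_new { return split /\s*\r?\n\s*\r?\n[\s\r\n]*/, $_[0] }

my @old = split_old($text);
my @new = split_new($text);

printf "old: %d paragraph(s)\n", scalar @old;   # old: 1 paragraph(s)
printf "new: %d paragraph(s)\n", scalar @new;   # new: 2 paragraph(s)
```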
The new syntax for html_para is:
html_para(examine_tags, no_initial_para, no_end_paras)
The three columns below illustrate how the new behaviour differs from
the old (with all options set):
Input text          Old output          New output
==========          ==========          ==========
first line          <p>                 first line
                    first line
<h2>title</h2>      </p>                <h2>title</h2>

para                <p>                 <p>para
                    <h2>title</h2>
another para        </p>                <p>another para

                    <p>
                    para
                    </p>

                    <p>
                    another para
                    </p>
The filtering is not perfect but it improves on some of the
infelicities of the existing behaviour (for example enclosing an h2
element in a paragraph).
The 'examine_tags' argument tells the filter to examine the first line
after the blank lines to see if it starts with a "stop tag", and if so
not to insert the </p><p> sequence. The stop tags are basically
header and block tags plus a few other odds and sods. This is the
current list:
html frameset frame
head title base link meta isindex script style object bgsound
body h1 h2 h3 h4 h5 h6 p div br hr
ul ol li dl dd dt dir menu
thead tbody tfoot tr td
option optgroup
The 'no_initial_para' argument tells the filter to omit the initial <p> and
final </p> (if any). I found that when filtering text that was going
into a <td> element, I was getting an extra <p> at the start which
changed the display.
The 'no_end_paras' argument suppresses the </p> end tags. I know end tags
are required for XHTML, but the default is not to suppress them.
Obviously checking (with negative zero-width lookahead) for tags will
be slower than the existing code, but probably not significantly.
The regular expression used is:
my $stop_tags = join('|', qw(html frameset frame
    head title base link meta isindex script style object bgsound
    body h1 h2 h3 h4 h5 h6 p div br hr
    ul ol li dl dd dt dir menu
    thead tbody tfoot tr td
    option optgroup));
my $para_re = qr{ (?: \s*\r?\n ){2,} # one or more blank lines
    (?! \s* < /? (?: $stop_tags ) [\s\n>] ) # not followed by an opening or closing "stop tag"
}ox;
(I wanted to check the end of the preceding line too, but I got an
error about variable length look-behind not supported)
and this is plugged into:
$text =~ s/$para_re/$html_para/gs;
return $initial_para . $text . $final_end_para;
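Putting the pieces above together, here is a stand-alone sketch that can be run outside Template Toolkit. The name make_html_para is hypothetical and is not part of Template::Filters, and the sketch always examines tags. It also assumes the optional "/" is meant to apply to every stop tag, so the alternation is grouped explicitly.

```perl
use strict;
use warnings;

my $stop_tags = join '|', qw(
    html frameset frame
    head title base link meta isindex script style object bgsound
    body h1 h2 h3 h4 h5 h6 p div br hr
    ul ol li dl dd dt dir menu
    thead tbody tfoot tr td
    option optgroup
);

my $para_re = qr{
    (?: \s* \r? \n ){2,}                       # one or more blank lines
    (?! \s* < /? (?: $stop_tags ) [\s\n>] )    # not followed by a stop tag
}x;

# Hypothetical helper: returns a filter closure, always examining tags.
sub make_html_para {
    my ($no_initial_para, $no_end_paras) = @_;
    my $end_para  = $no_end_paras ? '' : '</p>';
    my $html_para = $end_para . "\n<p>";
    my ($initial_para, $final_end_para)
        = $no_initial_para ? ('', '') : ('<p>', "$end_para\n");
    return sub {
        my $text = shift;
        $text =~ s/$para_re/$html_para/g;
        return $initial_para . $text . $final_end_para;
    };
}

# Both suppression options set, as in the comparison earlier in this post.
my $filter = make_html_para(1, 1);
print $filter->("first line\n\n<h2>title</h2>\n\npara\n\nanother para"), "\n";
```

With this input the blank line before the h2 element is left alone, and the other two paragraph breaks each become a newline followed by <p>.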
The original code just did:
return "<p>\n"
. join("\n</p>\n\n<p>\n", split(/(?:\r?\n){2,}/, $text))
. "</p>\n";
The new code could be changed to a split/join if that is faster than
global substitution.
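A split/join version might look like the sketch below. The stop-tag list is shortened for illustration and the names para_subst and para_split are hypothetical. One detail to watch: split drops trailing empty fields by default, so a limit of -1 is passed to keep the output identical to the substitution when the text ends with blank lines.

```perl
use strict;
use warnings;

# Shortened stop-tag list, for illustration only.
my $stop_tags = join '|', qw(h1 h2 h3 h4 h5 h6 p div ul ol li tr td);

my $para_re = qr{
    (?: \s* \r? \n ){2,}                       # one or more blank lines
    (?! \s* < /? (?: $stop_tags ) [\s\n>] )    # not followed by a stop tag
}x;

my $html_para = "</p>\n<p>";

# Global substitution, as in the patch.
sub para_subst {
    my ($text) = @_;
    $text =~ s/$para_re/$html_para/g;
    return "<p>" . $text . "</p>\n";
}

# split/join alternative; the -1 limit keeps trailing empty fields.
sub para_split {
    my ($text) = @_;
    return "<p>" . join($html_para, split($para_re, $text, -1)) . "</p>\n";
}

for my $sample ("one\n\ntwo", "one\n \n<h2>t</h2>\n\ntwo\n\n") {
    print para_subst($sample) eq para_split($sample) ? "same\n" : "differ\n";
}
```

Whether this is actually faster would need measuring, for example with the core Benchmark module.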
Andrew Ford
--
Andrew Ford, Director Ford & Mason Ltd / Pauntley Press
[EMAIL PROTECTED] South Wing, Compton House
http://ford-mason.co.uk Compton Green, Redmarley Tel: +44 1531 829900
http://pauntley-press.co.uk Gloucester, GL19 3JB Fax: +44 1531 829901
http://refcards.com Great Britain Mobile: +44 7785 258278
[andrew@ariadne build]$ diff -u Template-Toolkit-2.06/lib/Template/Filters.pm.orig Template-Toolkit-2.06/lib/Template/Filters.pm
--- Template-Toolkit-2.06/lib/Template/Filters.pm.orig Wed Nov 7 14:47:52 2001
+++ Template-Toolkit-2.06/lib/Template/Filters.pm Sun Jan 27 12:56:21 2002
@@ -47,7 +47,6 @@
$FILTERS = {
# static filters
'uri' => \&uri_filter,
- 'html_para' => \&html_paragraph,
'html_break' => \&html_break,
'upper' => sub { uc $_[0] },
'lower' => sub { lc $_[0] },
@@ -59,6 +58,7 @@
# dynamic filters
'html' => [ \&html_filter_factory, 1 ],
+ 'html_para' => [ \&html_para_filter_factory,1 ],
'indent' => [ \&indent_filter_factory, 1 ],
'format' => [ \&format_filter_factory, 1 ],
'truncate' => [ \&truncate_filter_factory, 1 ],
@@ -221,21 +221,6 @@
#------------------------------------------------------------------------
-# html_paragraph() [% FILTER html_para %]
-#
-# Wrap each paragraph of text (delimited by two or more newlines) in the
-# <p>...</p> HTML tags.
-#------------------------------------------------------------------------
-
-sub html_paragraph {
- my $text = shift;
- return "<p>\n"
- . join("\n</p>\n\n<p>\n", split(/(?:\r?\n){2,}/, $text))
- . "</p>\n";
-}
-
-
-#------------------------------------------------------------------------
# html_break() [% FILTER html_break %]
#
# Join each paragraph of text (delimited by two or more newlines) with
@@ -284,6 +269,53 @@
}
+#------------------------------------------------------------------------
+# html_para_filter_factory() [% FILTER html_para(examine_tags,no_initial_para,no_end_paras) %]
+#
+# Wrap each paragraph of text (delimited by two or more newlines) in the
+# <p>...</p> HTML tags. If 'examine_tags' is set check for stop tags;
+# if 'no_initial_para' is set, omit initial <p> tag (and possibly final
+# </p> tag); if 'no_end_paras' is set, omit </p> end tags.
+
+#------------------------------------------------------------------------
+
+BEGIN {
+ my $stop_tags = join('|', qw(html frameset frame
+ head title base link meta isindex script style object bgsound
+ body h1 h2 h3 h4 h5 h6 p div br hr
+ ul ol li dl dd dt dir menu
+ thead tbody tfoot tr td
+ option optgroup));
+
+ my $para_re = qr{ (?: \s*\r?\n ){2,} # one or more blank lines
+ (?! \s* < /? (?: $stop_tags ) [\s\n>] ) # not followed by an opening or closing "stop tag"
+ }ox;
+
+ sub html_para_filter_factory {
+ my $context = shift;
+ my($examine_tags, $no_initial_para, $no_end_paras) = @_;
+ my $end_para = $no_end_paras ? "" : "</p>";
+ my $html_para = $end_para . "\n<p>";
+ my($initial_para, $final_end_para) = $no_initial_para ? ("","") : ("<p>", "$end_para\n");
+ if ($examine_tags) {
+ return sub {
+ my $text = shift;
+ $text =~ s/$para_re/$html_para/gs;
+ return $initial_para . $text . $final_end_para;
+ };
+ }
+ else {
+ return sub {
+ my $text = shift;
+ return $initial_para
+ . join($html_para, split(/\s*\r?\n\s*\r?\n[\s\r\n]*/, $text))
+ . $final_end_para;
+ }
+ }
+ }
+}
+
+
#------------------------------------------------------------------------
# indent_filter_factory($pad) [% FILTER indent(pad) %]
#
[andrew@ariadne build]$ diff -u Template-Toolkit-2.06/docs/src/Manual/Filters.html.orig Template-Toolkit-2.06/docs/src/Manual/Filters.html
--- Template-Toolkit-2.06/docs/src/Manual/Filters.html.orig Wed Nov 7 14:47:24 2001
+++ Template-Toolkit-2.06/docs/src/Manual/Filters.html Sun Jan 27 12:11:18 2002
@@ -135,12 +135,27 @@
<pre> Binary "&lt;=&gt;" returns -1, 0, or 1 depending
on...</pre>
[%- END %]
[% WRAPPER subsection
- title = 'html_para'
+ title = 'html_para(examine_tags, no_initial_para, no_end_paras)'
-%]<p>
This filter formats a block of text into HTML paragraphs. A sequence of
-two or more newlines is used as the delimiter for paragraphs which are
+one or more blank lines is used as the delimiter for paragraphs which are
then wrapped in HTML <p>...</p> tags.
</p>
+<p>
+If the 'examine_tags' argument is set then a row of blank lines
+followed by a line that starts with a <p> tag or with a tag such
+as <tr> (that should not start a new paragraph) is not
+regarded as a paragraph break and is therefore not replaced with the
+</p><p> sequence.
+</p>
+<p>
+If the 'no_initial_para' argument is set then the block of text is
+not wrapped in <p> and </p>, which can make a difference
+to the appearance for example if the text is being inserted into a
+table cell. If the 'no_end_paras' argument is set then paragraph end
+tags are omitted (XHTML requires end tags and the default is to
+include them).
+</p>
<pre> [% tt_start_tag %] FILTER html_para [% tt_end_tag %]
The cat sat on the mat.</pre>
<pre> Mary had a little lamb.
[andrew@ariadne build]$ diff -u Template-Toolkit-2.06/docs/src/Modules/Template/Filters.html.orig Template-Toolkit-2.06/docs/src/Modules/Template/Filters.html
--- Template-Toolkit-2.06/docs/src/Modules/Template/Filters.html.orig Wed Nov 7 14:47:30 2001
+++ Template-Toolkit-2.06/docs/src/Modules/Template/Filters.html Sun Jan 27 13:07:01 2002
@@ -323,12 +323,27 @@
<pre> Binary "&lt;=&gt;" returns -1, 0, or 1 depending
on...</pre>
[%- END %]
[% WRAPPER subsection
- title = 'html_para'
+ title = 'html_para(examine_tags, no_initial_para, no_end_paras)'
-%]<p>
This filter formats a block of text into HTML paragraphs. A sequence of
two or more newlines is used as the delimiter for paragraphs which are
then wrapped in HTML <p>...</p> tags.
</p>
+<p>
+If the 'examine_tags' argument is set then a row of blank lines
+followed by a line that starts with a <p> tag or with a tag such
+as <tr> (that should not start a new paragraph) is not
+regarded as a paragraph break and is therefore not replaced with the
+</p><p> sequence.
+</p>
+<p>
+If the 'no_initial_para' argument is set then the block of text is
+not wrapped in <p> and </p>, which can make a difference
+to the appearance for example if the text is being inserted into a
+table cell. If the 'no_end_paras' argument is set then paragraph end
+tags are omitted (XHTML requires end tags and the default is to
+include them).
+</p>
<pre> [% tt_start_tag %] FILTER html_para [% tt_end_tag %]
The cat sat on the mat.</pre>
<pre> Mary had a little lamb.