[Templates] Patch for html_para filter

andrew Sun, 27 Jan 2002 13:57:15 +0000

I have a patch for consideration, which provides more sophisticated 
behaviour for the html_para filter, as described below.


The filter may be used as before with no arguments and you get (just
about) the same behaviour.  The "just about" is that I have changed
the regular expressions to match blank lines rather than a series of
newlines, as I have tripped over cases where blocks of text were not
split into paragraphs because there was whitespace on the line
separating them.
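
As a minimal sketch of that difference (the sample text and variable names here are made up for illustration), compare the original split pattern with a whitespace-tolerant one of the kind the patch uses:

```perl
use strict;
use warnings;

# Two paragraphs separated by a line that contains only a space.
my $text = "first para\n \nsecond para";

# Original pattern: needs two or more *consecutive* newlines.
my @old = split /(?:\r?\n){2,}/, $text;

# Whitespace-tolerant pattern: each line ending may be preceded by whitespace.
my @new = split /(?:\s*\r?\n){2,}/, $text;

printf "old pattern: %d paragraph(s)\n", scalar @old;
printf "new pattern: %d paragraph(s)\n", scalar @new;
```

The old pattern leaves the text as a single paragraph because the space breaks up the run of newlines; the tolerant pattern splits it in two.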

The new syntax for html_para is:

    html_para(examine_tags, no_initial_para, no_end_paras)

The three columns below illustrate how the new behaviour differs from
the old (with all options set):

    Input text         Old output          New output
    ==========         ==========          ==========

    first line         <p>                 first line
                       first line
    <h2>title</h2>     </p>                <h2>title</h2>

    para               <p>                 <p>para
                       <h2>title</h2>
    another para       </p>                <p>another para

                       <p>
                       para
                       </p>

                       <p>
                       another para
                       </p>


The filtering is not perfect but it improves on some of the
infelicities of the existing behaviour (for example enclosing an h2
element in a paragraph).

The 'examine_tags' argument tells the filter to examine the first line
after the blank lines to see if it starts with a "stop tag", and if so
not to insert the </p><p> sequence.  The stop tags are basically
header and block tags plus a few other odds and sods.  This is the
current list:

     html frameset frame
     head title base link meta isindex script style object bgsound
     body h1 h2 h3 h4 h5 h6 p div  br hr
     ul ol li dl dd dt dir menu
     thead tbody tfoot tr td
     option optgroup


The 'no_initial_para' argument tells the filter to omit the initial <p> and
final </p> (if any).  I found that when filtering text that was going
into a <td> element, I was getting an extra <p> at the start which
changed the display.

The 'no_end_paras' argument suppresses the </p> end tags.  I know end tags
are required for XHTML, but the default is not to suppress them.
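
The effect of these two flags can be sketched in isolation.  The helper below, wrap_paras(), is a hypothetical stand-alone function, not part of Template::Filters; it only mirrors how the patch derives its strings from the flags, using the plain split/join branch (no stop-tag check):

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper; it mirrors how the patch builds its strings.
sub wrap_paras {
    my ($text, $no_initial_para, $no_end_paras) = @_;
    my $end_para  = $no_end_paras ? "" : "</p>";
    my $html_para = $end_para . "\n<p>";              # goes between paragraphs
    my ($initial_para, $final_end_para)
        = $no_initial_para ? ("", "") : ("<p>", "$end_para\n");
    return $initial_para
         . join($html_para, split(/\s*\r?\n\s*\r?\n[\s\r\n]*/, $text))
         . $final_end_para;
}

my $text = "The cat sat on the mat.\n\nMary had a little lamb.";

for my $flags ([0, 0], [1, 0], [0, 1]) {
    print "no_initial_para=$flags->[0] no_end_paras=$flags->[1]\n";
    print wrap_paras($text, @$flags), "\n\n";
}
```

Note that with 'no_initial_para' set and end tags kept, the first paragraph still ends in </p> although its opening <p> was omitted, which is presumably acceptable when the surrounding markup (such as a <td>) supplies the context.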



Obviously checking (with negative zero-width lookahead) for tags will
be slower than the existing code, but probably not significantly.
The regular expression used is:

    my $stop_tags = join('|', qw(html frameset frame
                                 head title base link meta isindex script style object bgsound
                                 body h1 h2 h3 h4 h5 h6 p div  br hr
                                 ul ol li dl dd dt dir menu
                                 thead tbody tfoot tr td
                                 option optgroup));
    # The alternation is grouped so that the optional "/" applies to every
    # tag name, not just the first one in the list.
    my $para_re = qr{ (?: \s*\r?\n ){2,}                           # one or more blank lines
                      (?! \s* < /? (?: $stop_tags ) [\s\n>] )     # not followed by a "stop tag"
                    }ox;

(I wanted to check the end of the preceding line too, but I got an
error saying that variable-length look-behind is not supported.)

and this is plugged into:

    $text =~ s/$para_re/$html_para/gs;
    return $initial_para . $text . $final_end_para;
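
As a worked example, the following self-contained sketch applies the substitution to the sample input from the table above with all three options set.  The pattern here groups the alternation so that the optional "/" applies to every tag name, and the sample text and the $out variable are assumptions for illustration; the exact blank-line layout of the result may differ slightly from the hand-drawn table:

```perl
use strict;
use warnings;

my $stop_tags = join('|', qw(html frameset frame
                             head title base link meta isindex script style object bgsound
                             body h1 h2 h3 h4 h5 h6 p div br hr
                             ul ol li dl dd dt dir menu
                             thead tbody tfoot tr td
                             option optgroup));

my $para_re = qr{ (?: \s*\r?\n ){2,}                        # one or more blank lines
                  (?! \s* < /? (?: $stop_tags ) [\s>] )     # not followed by a stop tag
                }x;

# All three options set: no initial <p>, no </p> end tags.
my $html_para = "\n<p>";

my $text = "first line\n\n<h2>title</h2>\n\npara\n\nanother para\n";
(my $out = $text) =~ s/$para_re/$html_para/g;
print $out;
```

The blank line before the <h2> is left alone because the look-ahead sees a stop tag, while the two later paragraph breaks are each replaced by a newline and <p>.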


The original code just did:

    return "<p>\n" 
           . join("\n</p>\n\n<p>\n", split(/(?:\r?\n){2,}/, $text))  
           . "</p>\n";


The new code could be changed to a split/join if that is faster than
global substitution.  
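
One way to check this would be a quick comparison with the core Benchmark module.  The sketch below is only an outline under stated assumptions: it uses a simplified pattern without the stop-tag check, and the sample text and the names via_subst and via_split are hypothetical.  No timings are claimed here; results will vary by Perl version and input:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Simplified stand-in for the patch's pattern (assumption: stop-tag check omitted).
my $para_re   = qr/(?:\s*\r?\n){2,}/;
my $html_para = "</p>\n<p>";

sub via_subst {
    my $text = shift;
    $text =~ s/$para_re/$html_para/g;
    return $text;
}

sub via_split {
    my $text = shift;
    # The limit of -1 keeps trailing empty fields, so that a separator at the
    # very end of the text is handled the same way as by s///g.
    return join($html_para, split($para_re, $text, -1));
}

my $sample = join("\n\n", map { "paragraph number $_" } 1 .. 200);

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    subst      => sub { via_subst($sample) },
    split_join => sub { via_split($sample) },
});
```

Without the -1 limit, split drops trailing empty fields, so text ending in blank lines would come out differently from the substitution version.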

Andrew Ford
--
Andrew Ford,  Director       Ford & Mason Ltd / Pauntley Press
[EMAIL PROTECTED]      South Wing, Compton House 
http://ford-mason.co.uk      Compton Green, Redmarley   Tel: +44 1531 829900
http://pauntley-press.co.uk  Gloucester, GL19 3JB       Fax: +44 1531 829901
http://refcards.com          Great Britain           Mobile: +44 7785 258278




[andrew@ariadne build]$ diff -u Template-Toolkit-2.06/lib/Template/Filters.pm.orig Template-Toolkit-2.06/lib/Template/Filters.pm
--- Template-Toolkit-2.06/lib/Template/Filters.pm.orig  Wed Nov  7 14:47:52 2001
+++ Template-Toolkit-2.06/lib/Template/Filters.pm       Sun Jan 27 12:56:21 2002
@@ -47,7 +47,6 @@
 $FILTERS = {
     # static filters 
     'uri'        => \&uri_filter,
-    'html_para'  => \&html_paragraph,
     'html_break' => \&html_break,
     'upper'      => sub { uc $_[0] },
     'lower'      => sub { lc $_[0] },
@@ -59,6 +58,7 @@
 
     # dynamic filters
     'html'       => [ \&html_filter_factory,     1 ],
+    'html_para'  => [ \&html_para_filter_factory,1 ],
     'indent'     => [ \&indent_filter_factory,   1 ],
     'format'     => [ \&format_filter_factory,   1 ],
     'truncate'   => [ \&truncate_filter_factory, 1 ],
@@ -221,21 +221,6 @@
 
 
 #------------------------------------------------------------------------
-# html_paragraph()                                 [% FILTER html_para %]
-#
-# Wrap each paragraph of text (delimited by two or more newlines) in the
-# <p>...</p> HTML tags.
-#------------------------------------------------------------------------
-
-sub html_paragraph  {
-    my $text = shift;
-    return "<p>\n" 
-           . join("\n</p>\n\n<p>\n", split(/(?:\r?\n){2,}/, $text))
-          . "</p>\n";
-}
-
-
-#------------------------------------------------------------------------
 # html_break()                                    [% FILTER html_break %]
 #
 # Join each paragraph of text (delimited by two or more newlines) with
@@ -284,6 +269,53 @@
 }
 
 
+#------------------------------------------------------------------------
+# html_para_filter_factory()   [% FILTER html_para(examine_tags,no_initial_para,no_end_paras) %]
+#
+# Wrap each paragraph of text (delimited by two or more newlines) in the
+# <p>...</p> HTML tags.  If 'examine_tags' is set check for stop tags;
+# if 'no_initial_para' is set, omit initial <p> tag (and possibly final
+# </p> tag); if 'no_end_paras' is set, omit </p> end tags.
+
+#------------------------------------------------------------------------
+
+BEGIN {
+    my $stop_tags = join('|', qw(html frameset frame
+                                head title base link meta isindex script style object bgsound
+                                body h1 h2 h3 h4 h5 h6 p div  br hr
+                                ul ol li dl dd dt dir menu
+                                thead tbody tfoot tr td
+                                option optgroup));
+
+    my $para_re = qr{ (?: \s*\r?\n ){2,}                         # one or more blank lines
+                     (?! \s* < /? (?: $stop_tags ) [\s\n>] )     # not followed by a "stop tag"
+                   }ox;
+
+    sub html_para_filter_factory {
+       my $context = shift;
+       my($examine_tags, $no_initial_para, $no_end_paras) = @_;
+       my $end_para = $no_end_paras    ? "" : "</p>";
+       my $html_para = $end_para . "\n<p>";
+       my($initial_para, $final_end_para) = $no_initial_para ? ("","") : ("<p>", "$end_para\n");
+       if ($examine_tags) {
+           return sub {
+               my $text = shift;
+               $text =~ s/$para_re/$html_para/gs;
+               return $initial_para . $text . $final_end_para;
+           };
+       }
+       else {
+           return sub {
+               my $text = shift;
+               return $initial_para
+                      . join($html_para, split(/\s*\r?\n\s*\r?\n[\s\r\n]*/, $text))
+                      . $final_end_para;
+           }
+       }
+    }
+}
+
+
 #------------------------------------------------------------------------
 # indent_filter_factory($pad)                    [% FILTER indent(pad) %]
 #
[andrew@ariadne build]$ diff -u Template-Toolkit-2.06/docs/src/Manual/Filters.html.orig Template-Toolkit-2.06/docs/src/Manual/Filters.html
--- Template-Toolkit-2.06/docs/src/Manual/Filters.html.orig     Wed Nov  7 14:47:24 2001
+++ Template-Toolkit-2.06/docs/src/Manual/Filters.html  Sun Jan 27 12:11:18 2002
@@ -135,12 +135,27 @@
 <pre>    Binary &quot;&amp;lt;=&amp;gt;&quot; returns -1, 0, or 1 depending on...</pre>
 [%- END %]
 [% WRAPPER subsection
-   title = 'html_para'
+   title = 'html_para(examine_tags, no_initial_para, no_end_paras)'
 -%]<p>
 This filter formats a block of text into HTML paragraphs.  A sequence of 
-two or more newlines is used as the delimiter for paragraphs which are 
+one or more blank lines is used as the delimiter for paragraphs which are 
 then wrapped in HTML &lt;p&gt;...&lt;/p&gt; tags.
 </p>
+<p>
+If the 'examine_tags' argument is set then a row of blank lines
+followed by a line that starts with a &lt;p&gt; tag or with a tag such
+as &lt;tr&gt; (that should not start a new paragraph) is not
+regarded as a paragraph break and is therefore not replaced with the
+&lt;/p&gt;&lt;p&gt; sequence.
+</p>
+<p>
+If the 'no_initial_para' argument is set then the block of text is
+not wrapped in &lt;p&gt; and &lt;/p&gt;, which can make a difference
+to the appearance for example if the text is being inserted into a
+table cell.  If the 'no_end_paras' argument is set then paragraph end
+tags are omitted (XHTML requires end tags and the default is to
+include them).
+</p>
 <pre>    [% tt_start_tag %] FILTER html_para [% tt_end_tag %]
     The cat sat on the mat.</pre>
 <pre>    Mary had a little lamb.
[andrew@ariadne build]$ diff -u Template-Toolkit-2.06/docs/src/Modules/Template/Filters.html.orig Template-Toolkit-2.06/docs/src/Modules/Template/Filters.html
--- Template-Toolkit-2.06/docs/src/Modules/Template/Filters.html.orig   Wed Nov  7 14:47:30 2001
+++ Template-Toolkit-2.06/docs/src/Modules/Template/Filters.html        Sun Jan 27 13:07:01 2002
@@ -323,12 +323,27 @@
 <pre>    Binary &quot;&amp;lt;=&amp;gt;&quot; returns -1, 0, or 1 depending on...</pre>
 [%- END %]
 [% WRAPPER subsection
-   title = 'html_para'
+   title = 'html_para(examine_tags, no_initial_para, no_end_paras)'
 -%]<p>
 This filter formats a block of text into HTML paragraphs.  A sequence of 
 two or more newlines is used as the delimiter for paragraphs which are 
 then wrapped in HTML &lt;p&gt;...&lt;/p&gt; tags.
 </p>
+<p>
+If the 'examine_tags' argument is set then a row of blank lines
+followed by a line that starts with a &lt;p&gt; tag or with a tag such
+as &lt;tr&gt; (that should not start a new paragraph) is not
+regarded as a paragraph break and is therefore not replaced with the
+&lt;/p&gt;&lt;p&gt; sequence.
+</p>
+<p>
+If the 'no_initial_para' argument is set then the block of text is
+not wrapped in &lt;p&gt; and &lt;/p&gt;, which can make a difference
+to the appearance for example if the text is being inserted into a
+table cell.  If the 'no_end_paras' argument is set then paragraph end
+tags are omitted (XHTML requires end tags and the default is to
+include them).
+</p>
 <pre>    [% tt_start_tag %] FILTER html_para [% tt_end_tag %]
     The cat sat on the mat.</pre>
 <pre>    Mary had a little lamb.
