[aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-03 Thread Lukas Fleischer
Comments are now split at link boundaries and links are converted
separately. I find this to be a much cleaner way than re-converting
comments that have already been converted using htmlspecialchars(). This
also doesn't require any callback procedure.

---
 web/lib/aur.inc   |   24 
 web/template/pkg_comments.php |2 +-
 2 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/web/lib/aur.inc b/web/lib/aur.inc
index bd69c4c..a6292ca 100644
--- a/web/lib/aur.inc
+++ b/web/lib/aur.inc
@@ -494,3 +494,27 @@ function salted_hash($passwd, $salt)
}
return md5($salt . $passwd);
 }
+
+function parse_comment($comment)
+{
+   $url_pattern = '/(\b(?:https?|ftp):\/\/[\w\/\#~:.?+=%...@!\-;,]+?' .
+   '(?=[.:?\-;,]*(?:[^\w\/\#~:.?+=%...@!\-;,]|$)))/iS';
+
+   $matches = preg_split($url_pattern, $comment, -1,
+   PREG_SPLIT_DELIM_CAPTURE);
+
+   $html = '';
+   for ($i = 0; $i  count($matches); $i++) {
+   if ($i % 2) {
+   # convert links
+   $html .= 'a href=' . htmlspecialchars($matches[$i]) .
+   '' .  htmlspecialchars($matches[$i]) . '/a';
+   }
+   else {
+   # convert everything else
+   $html .= nl2br(htmlspecialchars($matches[$i]));
+   }
+   }
+
+   return $html;
+}
diff --git a/web/template/pkg_comments.php b/web/template/pkg_comments.php
index 02171a0..2ca9bf0 100644
--- a/web/template/pkg_comments.php
+++ b/web/template/pkg_comments.php
@@ -20,7 +20,7 @@ while (list($indx, $carr) = each($comments)) { ?
 ?/div
blockquote class=comment-body
div
-?php echo nl2br(htmlspecialchars($carr['Comments'])) ?
+?php echo parse_comment($carr['Comments']) ?
/div
/blockquote
 ?php
-- 
1.7.3.1



Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-02 Thread Loui Chang
On Fri 01 Oct 2010 01:05 +0200, Lukas Fleischer wrote:
 On Thu, Sep 30, 2010 at 08:56:56PM +0200, PyroPeter wrote:
  You can also link to a homepage using valid URL's. The additional
  feature may be nice, but makes the code more complex. It also
  trains users to omit the http://; and produces more work for devs,
  as they all now have to parse this invalid hostname+path stuff.
 
 Hm, that's a question of taste. We'll let Loui decide :p

I generally agree with that.



Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-01 Thread PyroPeter

On 10/01/2010 02:15 PM, PyroPeter wrote:

I did not mean to offend you, and after applying the patch (which I
should have done before sending the mails, you are right) your code
in fact seems to work a lot better then I thought.

While testing, I found a bug:
Post this URL: http://foo.bar/lt;gt;lt;gt;
It seems to trigger two bugs at once, first, the regex does not match
whole URL, and second, the href is escaped twice.

What I ment was the URL http://foo.bar/;
Also, s/whole/the whole/


Regards, PyroPeter

--
freenode/pyropeter  12:50 - Ich drücke Return.


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-01 Thread Lukas Fleischer
On Fri, Oct 01, 2010 at 02:15:41PM +0200, PyroPeter wrote:
 I did not mean to offend you, and after applying the patch (which I
 should have done before sending the mails, you are right) your code
 in fact seems to work a lot better then I thought.

I didn't feel offended in any way, but reporting bugs that don't exist
is just counterproductive and a waste of time.

 While testing, I found a bug:
 Post this URL: http://foo.bar/lt;gt;lt;gt;
 It seems to trigger two bugs at once, first, the regex does not match
 whole URL, and second, the href is escaped twice.

I already said that there might be problems if the URL contains quotes
or less-than/greater-than symbols in another mail [1]. This can be fixed
by removing the first str_replace() (which I also proposed in the same
mail) or by repeating the second str_replace() for quot;, #039;,
lt; and gt; (which might be even better from the perspective of
security). However, I don't think such URLs will be a common use case.

The second bug is expected behaviour, since punctuation marks at the
end of URLs shouldn't be included in the URL itself (imagine someone
putting a link at the end of a senctence). This is also how DokuWiki and
Flyspray behave. If there really is an URL requiring a punctuation mark
at the end of the URL (which there shouldn't be at all), this can be
remarked in the comment itself.

[1]
http://mailman.archlinux.org/pipermail/aur-dev/2010-September/001263.html


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-01 Thread PyroPeter

On 09/30/2010 05:22 PM, Lukas Fleischer wrote:

This is a bit hacky patch to make links in AUR comments clickable (fixes
FS#20137 [1]). Huge parts of this code are ripped from the DokuWiki
plugin that is also used in Flyspray.

I didn't have any time to test it extensively so I'd suggest to do some
more tests if this will be commited.


I fixed the / bugs and changed the indention to tabs.
I also changed the regex to one that accepts everything
that starts with one of 'http', 'https' or 'ftp' followed by
a colon :, contains no whitespace and ends with a letter (\w)
or a slash /.
It also parses hostnames starting with www. and ending
in a 2 to 5 letters long TLD consisting of only a-z.

I have tested it with the comments at [0].

Regards, PyroPeter

[0] http://aur.keks.selfip.org/packages.php?ID=298
---
diff --git a/web/lib/aur.inc b/web/lib/aur.inc
index bd69c4c..7465fac 100644
--- a/web/lib/aur.inc
+++ b/web/lib/aur.inc
@@ -494,3 +494,24 @@ function salted_hash($passwd, $salt)
}
return md5($salt . $passwd);
 }
+
+function parse_link($matches)
+{
+   $name = $matches[0];
+   $url = preg_replace('/^www\./', 'http://www.', $matches[0]);
+   return 'a href=' . $url . '' . $name . '/a';
+}
+
+function parse_comment($comment)
+{
+   $regex = '/' . str_replace('/', '\/', implode('|',array(
+   '\b(http|https|ftp):[^\s]*[\w\d/]',
+   '\bwww\.[^\s/]+\.[a-z]{2,5}'
+   ))) . '/S';
+
+   $comment = htmlspecialchars($comment);
+   $comment = preg_replace_callback($regex, 'parse_link', $comment);
+   $comment = nl2br($comment);
+
+   return $comment;
+}
diff --git a/web/template/pkg_comments.php b/web/template/pkg_comments.php
index 02171a0..2ca9bf0 100644
--- a/web/template/pkg_comments.php
+++ b/web/template/pkg_comments.php
@@ -20,7 +20,7 @@ while (list($indx, $carr) = each($comments)) { ?
 ?/div
blockquote class=comment-body
div
-?php echo nl2br(htmlspecialchars($carr['Comments'])) ?
+?php echo parse_comment($carr['Comments']) ?
/div
/blockquote
 ?php

--
freenode/pyropeter  12:50 - Ich drücke Return.


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-01 Thread Lukas Fleischer
On Fri, Oct 01, 2010 at 04:59:22PM +0200, PyroPeter wrote:
 I fixed the / bugs and changed the indention to tabs.
 I also changed the regex to one that accepts everything
 that starts with one of 'http', 'https' or 'ftp' followed by
 a colon :, contains no whitespace and ends with a letter (\w)
 or a slash /.
 It also parses hostnames starting with www. and ending
 in a 2 to 5 letters long TLD consisting of only a-z.
 
 [...]

This won't match URLs like
https://aur.archlinux.org/packages.php?O=0K=; and an ampersand at the
end of an URL won't be converted correctly :/ I'll try to implement it a
more proper way the next days. Maybe I'll actually go with splitting
comments at link boundaries as you suggested before... :)


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-01 Thread PyroPeter

On 10/01/2010 05:52 PM, Lukas Fleischer wrote:

This won't match URLs like
https://aur.archlinux.org/packages.php?O=0K=; and an ampersand at the
end of an URL won't be converted correctly :/ I'll try to implement it a
more proper way the next days. Maybe I'll actually go with splitting
comments at link boundaries as you suggested before... :)


Well, that's the problem. Which characters should belong to the end of
the URL, and which should not? There could also be cases in which
punctuation belongs to the URL. If punctuation is parsed as not
belonging to the URL, there would be no way to post a working link
to certain URLs. If punctuation is parsed as part of the URL, one could
insert a space between the URL and the punctuation that should not
belong to the URL. One should also consider that inserting an URL into
a sentence looks horrible and is normally not done (by me, at least).

About splitting at boundaries: Contrary to what I have said before,
using regular expressions seems to be a valid and efficient way.
(I thought you would have to escape tag-content and attributes in
different ways (percent-encoding vs. html-entities). After reading
the HTML4 specification I realized this is not the case, as content and
attributes are both escaped using html-entities)

Regards, PyroPeter
--
freenode/pyropeter  12:50 - Ich drücke Return.


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-01 Thread Lukas Fleischer
On Fri, Oct 01, 2010 at 06:23:06PM +0200, PyroPeter wrote:
 About splitting at boundaries: Contrary to what I have said before,
 using regular expressions seems to be a valid and efficient way.
 (I thought you would have to escape tag-content and attributes in
 different ways (percent-encoding vs. html-entities). After reading
 the HTML4 specification I realized this is not the case, as content and
 attributes are both escaped using html-entities)

Using regular expressions is an efficient way, but they should be
applied before htmlspecialchars() or anything similar is applied. E.g.
we could use preg_match() or preg_match_all() with PREG_OFFSET_CAPTURE
to get the positions of all links, then call a function, that converts
links and converts the stings as necessary, and convert the parts that
don't contain any links separately using htmlspecialchars().


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-10-01 Thread Jan-Erik Rediger
On Fri, Oct 01, 2010 at 06:23:06PM +0200, PyroPeter wrote:
 On 10/01/2010 05:52 PM, Lukas Fleischer wrote:
 This won't match URLs like
 https://aur.archlinux.org/packages.php?O=0K=; and an ampersand at the
 end of an URL won't be converted correctly :/ I'll try to implement it a
 more proper way the next days. Maybe I'll actually go with splitting
 comments at link boundaries as you suggested before... :)
 
 Well, that's the problem. Which characters should belong to the end of
 the URL, and which should not? There could also be cases in which
 punctuation belongs to the URL. If punctuation is parsed as not
 belonging to the URL, there would be no way to post a working link
 to certain URLs. If punctuation is parsed as part of the URL, one could
 insert a space between the URL and the punctuation that should not
 belong to the URL. One should also consider that inserting an URL into
 a sentence looks horrible and is normally not done (by me, at least).
 
 About splitting at boundaries: Contrary to what I have said before,
 using regular expressions seems to be a valid and efficient way.
 (I thought you would have to escape tag-content and attributes in
 different ways (percent-encoding vs. html-entities). After reading
 the HTML4 specification I realized this is not the case, as content and
 attributes are both escaped using html-entities)
 
 Regards, PyroPeter
 -- 
 freenode/pyropeter  12:50 - Ich drücke Return.

I didn't read the whole thread but as far as I understand you're
searching for a proper solution how to correctly find urls in comments.

John Gruber's Regex seems quite right for this:
http://daringfireball.net/2010/07/improved_regex_for_matching_urls

Does this help?

Jan-Erik (badboy_)


[aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-09-30 Thread Lukas Fleischer
This is a bit hacky patch to make links in AUR comments clickable (fixes
FS#20137 [1]). Huge parts of this code are ripped from the DokuWiki
plugin that is also used in Flyspray.

I didn't have any time to test it extensively so I'd suggest to do some
more tests if this will be commited.

[1] https://bugs.archlinux.org/task/20137

---
 web/lib/aur.inc   |   45 +
 web/template/pkg_comments.php |2 +-
 2 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/web/lib/aur.inc b/web/lib/aur.inc
index bd69c4c..b0cfdc8 100644
--- a/web/lib/aur.inc
+++ b/web/lib/aur.inc
@@ -494,3 +494,48 @@ function salted_hash($passwd, $salt)
}
return md5($salt . $passwd);
 }
+
+function parse_link($matches)
+{
+  $name = $url = $matches[0];
+
+  if(substr($url, 0, 3) == 'ftp'  (substr($url, 0, 6) != 'ftp://')) {
+$url = 'ftp://'.$url;
+  }
+  elseif (substr($url, 0, 3) == 'www') {
+$url = 'http://'.$url;
+  }
+
+  $url = str_replace('', 'amp;', $url);
+  $url = str_replace('amp;amp;', 'amp;', $url);
+  $url = strtr($url, array('' = '%3E', '' = '%3C', '' = '%22'));
+
+  return 'a href=' . $url . '' . $name . '/a';
+}
+
+function parse_comment($comment)
+{
+  $schemes = array('http', 'https', 'ftp');
+  $ltrs = '\w';
+  $gunk = '\/\#~:.?+=%...@!\-';
+  $punc = '.:?\-;,';
+  $host = $ltrs . $punc;
+  $any = $ltrs . $gunk . $punc;
+
+  $patterns = array();
+
+  foreach ($schemes as $scheme) {
+$patterns[] = '(\b(?i)' . $scheme . '(?-i):\/\/[' . $any . ']+?(?=[' . 
$punc . ']*[^' . $any . ']))';
+  }
+
+  $patterns[] = '(\b(?i)www?(?-i)\.[' . $host . ']+?\.[' . $host . ']+?[' . 
$any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
+  $patterns[] = '(\b(?i)ftp?(?-i)\.['. $host . ']+?\.[' . $host . ']+?[' . 
$any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
+
+  $regex = '/' . implode('|', $patterns) . '/msS';
+
+  $comment = htmlspecialchars($comment);
+  $comment = preg_replace_callback($regex, parse_link, $comment . \n);
+  $comment = nl2br($comment);
+
+  return $comment;
+}
diff --git a/web/template/pkg_comments.php b/web/template/pkg_comments.php
index 02171a0..2ca9bf0 100644
--- a/web/template/pkg_comments.php
+++ b/web/template/pkg_comments.php
@@ -20,7 +20,7 @@ while (list($indx, $carr) = each($comments)) { ?
 ?/div
blockquote class=comment-body
div
-?php echo nl2br(htmlspecialchars($carr['Comments'])) ?
+?php echo parse_comment($carr['Comments']) ?
/div
/blockquote
 ?php
-- 
1.7.3



Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-09-30 Thread PyroPeter

On 09/30/2010 05:22 PM, Lukas Fleischer wrote:

This is a bit hacky patch to make links in AUR comments clickable (fixes
FS#20137 [1]). Huge parts of this code are ripped from the DokuWiki
plugin that is also used in Flyspray.

I didn't have any time to test it extensively so I'd suggest to do some
more tests if this will be commited.

[1] https://bugs.archlinux.org/task/20137

---
  web/lib/aur.inc   |   45 +
  web/template/pkg_comments.php |2 +-
  2 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/web/lib/aur.inc b/web/lib/aur.inc
index bd69c4c..b0cfdc8 100644
--- a/web/lib/aur.inc
+++ b/web/lib/aur.inc
@@ -494,3 +494,48 @@ function salted_hash($passwd, $salt)
}
return md5($salt . $passwd);
  }
+
+function parse_link($matches)
+{
+  $name = $url = $matches[0];
+
+  if(substr($url, 0, 3) == 'ftp'  (substr($url, 0, 6) != 'ftp://')) {
+$url = 'ftp://'.$url;
+  }
+  elseif (substr($url, 0, 3) == 'www') {
+$url = 'http://'.$url;
+  }
+
+  $url = str_replace('','amp;', $url);
+  $url = str_replace('amp;amp;', 'amp;', $url);


What about the occurrences of amp;(html-entity-code-here); you
produced the line before?


+  $url = strtr($url, array('' =  '%3E','' =  '%3C', '' =  '%22'));
+
+  return 'a href=' . $url . '' . $name .'/a';
+}
+
+function parse_comment($comment)
+{
+  $schemes = array('http', 'https', 'ftp');
+  $ltrs = '\w';
+  $gunk = '\/\#~:.?+=%...@!\-';
+  $punc = '.:?\-;,';
+  $host = $ltrs . $punc;
+  $any = $ltrs . $gunk . $punc;
+
+  $patterns = array();
+
+  foreach ($schemes as $scheme) {
+$patterns[] = '(\b(?i)' . $scheme . '(?-i):\/\/[' . $any . ']+?(?=[' . 
$punc . ']*[^' . $any . ']))';
+  }
+
+  $patterns[] = '(\b(?i)www?(?-i)\.[' . $host . ']+?\.[' . $host . ']+?[' . 
$any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
+  $patterns[] = '(\b(?i)ftp?(?-i)\.['. $host . ']+?\.[' . $host . ']+?[' . 
$any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';


I am not that experienced with PHP, but this looks like the $patterns
array got replaced instead of extended.


+
+  $regex = '/' . implode('|', $patterns) . '/msS';
+
+  $comment = htmlspecialchars($comment);


Won't this render the next instruction useless if there are
html-characters in a link?


+  $comment = preg_replace_callback($regex, parse_link, $comment . \n);
+  $comment = nl2br($comment);
+
+  return $comment;
+}
diff --git a/web/template/pkg_comments.php b/web/template/pkg_comments.php
index 02171a0..2ca9bf0 100644
--- a/web/template/pkg_comments.php
+++ b/web/template/pkg_comments.php
@@ -20,7 +20,7 @@ while (list($indx, $carr) = each($comments)) { ?
  ?/div
blockquote class=comment-body
div
-?php echo nl2br(htmlspecialchars($carr['Comments'])) ?
+?php echo parse_comment($carr['Comments']) ?
/div
/blockquote
  ?php


Generally I would not make hostnames (www.foo.tld) clickable.
If people are not able to provide proper URL's, they have a serious
problem. (there is also the technical argument that the hostname is not
a good indicator for the kind of service the host provides.)

Regards, PyroPeter
--
freenode/pyropeter  12:50 - Ich drücke Return.


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-09-30 Thread Lukas Fleischer
On Thu, Sep 30, 2010 at 06:18:24PM +0200, PyroPeter wrote:
 +  $url = str_replace('','amp;', $url);
 +  $url = str_replace('amp;amp;', 'amp;', $url);
 
 What about the occurrences of amp;(html-entity-code-here); you
 produced the line before?

Nothing? Any occurrence of an HTML entity code is correctly encoded as
amp;. People shouldn't be able to manually insert HTML entities in
comments. The first line is actually even superfluous as I realized just
now since ampersands should already have been replaced by
htmlspecialchars() before at the time this line is executed (didn't
check that before, this part of code has been extracted from the
DokuWiki plugin).

 +  $patterns[] = '(\b(?i)www?(?-i)\.[' . $host . ']+?\.[' . $host . ']+?[' . 
 $any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
 +  $patterns[] = '(\b(?i)ftp?(?-i)\.['. $host . ']+?\.[' . $host . ']+?[' . 
 $any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
 
 I am not that experienced with PHP, but this looks like the $patterns
 array got replaced instead of extended.

Nope, it doesn't. Check [1].

 +  $comment = htmlspecialchars($comment);
 
 Won't this render the next instruction useless if there are
 html-characters in a link?

Nope. Links need to be escaped as well. Not sure what happens if a link
contains quotes or /. This shouldn't happen too often tho.

 Generally I would not make hostnames (www.foo.tld) clickable.
 If people are not able to provide proper URL's, they have a serious
 problem. (there is also the technical argument that the hostname is not
 a good indicator for the kind of service the host provides.)

Why not? What if you explicitly want to link to a project's home page?
It'll also just convert hostnames if they start with a www or ftp
subdomain, so comments refering to domains in other ways won't be
converted.

[1]
http://www.php.net/manual/de/language.types.array.php#language.types.array.syntax.modifying


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-09-30 Thread PyroPeter

On 09/30/2010 06:38 PM, Lukas Fleischer wrote:

On Thu, Sep 30, 2010 at 06:18:24PM +0200, PyroPeter wrote:

+  $url = str_replace('','amp;', $url);
+  $url = str_replace('amp;amp;', 'amp;', $url);


What about the occurrences of amp;(html-entity-code-here); you
produced the line before?


Nothing? Any occurrence of an HTML entity code is correctly encoded as
amp;. People shouldn't be able to manually insert HTML entities in
comments. The first line is actually even superfluous as I realized just
now since ampersands should already have been replaced by
htmlspecialchars() before at the time this line is executed (didn't
check that before, this part of code has been extracted from the
DokuWiki plugin).


Well, but you are encoding existing entities, that are not amp; as
amp;foo;. See the example below.


+  $patterns[] = '(\b(?i)www?(?-i)\.[' . $host . ']+?\.[' . $host . ']+?[' . 
$any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';
+  $patterns[] = '(\b(?i)ftp?(?-i)\.['. $host . ']+?\.[' . $host . ']+?[' . 
$any . ']+?(?=[' . $punc . ']*[^' . $any . ']))';


I am not that experienced with PHP, but this looks like the $patterns
array got replaced instead of extended.


Nope, it doesn't. Check [1].


I see, $var[] = foo creates the array $var if necessary and appends
foo.


+  $comment = htmlspecialchars($comment);


Won't this render the next instruction useless if there are
html-characters in a link?


Nope. Links need to be escaped as well. Not sure what happens if a link
contains quotes or /. This shouldn't happen too often tho.


Imo, you should split the message at the link boundaries.
( foo , http://foo.bar.tld;,  baz)
Then you should encode the html-entities in all elements, wrap the links
in a's, and then join all that together.

I can not think of a way to connect a-tags with proper encoding of
the user input using just normal string functions like replace, or
regexes.

I would be happy if you could prove the opposite, and will help by
providing you with input that breaks your system.

== example 1 ==

input: foo http://foo.tld/iLikeToUseApersands/foobar.html baz

If I am not mistaken, $regex would be
/http://foo.tld/iLikeToUseApersands/foobar.html/msS;
(are the / correctly escaped? I will assume they are.)

Then, $regex would be:
/http:\/\/foo\.tld\/iLikeToUseApersands\/foobar\.html/msS

$comment would be set by htmlspecialchars() to:
foo http://foo.tld/iLikeToUseApersands/fooamp;bar.html baz

= preg_replace_callback() would not match, as  got replaced.


Generally I would not make hostnames (www.foo.tld) clickable.
If people are not able to provide proper URL's, they have a serious
problem. (there is also the technical argument that the hostname is not
a good indicator for the kind of service the host provides.)


Why not? What if you explicitly want to link to a project's home page?
It'll also just convert hostnames if they start with a www or ftp
subdomain, so comments refering to domains in other ways won't be
converted.


You can also link to a homepage using valid URL's. The additional
feature may be nice, but makes the code more complex. It also
trains users to omit the http://; and produces more work for devs,
as they all now have to parse this invalid hostname+path stuff.

Unrelated: You seem to accept only a-zA-Z in hostnames? Or does
PHP's \w include 0-9 and language-dependent letters? What about
underscores?

Why does the a's content only include the Path of the URL?

Regards, PyroPeter
--
freenode/pyropeter  12:50 - Ich drücke Return.


Re: [aur-dev] [PATCH] Make external links in comments clickable (FS#20137).

2010-09-30 Thread Lukas Fleischer
On Thu, Sep 30, 2010 at 08:56:56PM +0200, PyroPeter wrote:
 Well, but you are encoding existing entities, that are not amp; as
 amp;foo;. See the example below.

Yep, and that's how it's supposed to be. There shouldn't be any entities
that users put in the comments and that are not encoded.

 I see, $var[] = foo creates the array $var if necessary and appends
 foo.

Correct.

 Imo, you should split the message at the link boundaries.
 ( foo , http://foo.bar.tld;,  baz)
 Then you should encode the html-entities in all elements, wrap the links
 in a's, and then join all that together.

Yes... That would be cleaner, but also way more complicated to implement
and would require huge amounts of code for making links clickable.

 == example 1 ==
 
 input: foo http://foo.tld/iLikeToUseApersands/foobar.html baz
 
 If I am not mistaken, $regex would be
 /http://foo.tld/iLikeToUseApersands/foobar.html/msS;
 (are the / correctly escaped? I will assume they are.)
 
 Then, $regex would be:
 /http:\/\/foo\.tld\/iLikeToUseApersands\/foobar\.html/msS
 
 $comment would be set by htmlspecialchars() to:
 foo http://foo.tld/iLikeToUseApersands/fooamp;bar.html baz
 
 = preg_replace_callback() would not match, as  got replaced.

Why should it not work? preg_replace_callback() still matches if the URL
contains a semicolon. This will be parsed and output a valid link
(tested with current GIT version and patch applied).

 You can also link to a homepage using valid URL's. The additional
 feature may be nice, but makes the code more complex. It also
 trains users to omit the http://; and produces more work for devs,
 as they all now have to parse this invalid hostname+path stuff.

Hm, that's a question of taste. We'll let Loui decide :p

 Unrelated: You seem to accept only a-zA-Z in hostnames? Or does
 PHP's \w include 0-9 and language-dependent letters? What about
 underscores?

\w in perl compatible regex includes all alphanumeric characters plus
the underscore (_).

 Why does the a's content only include the Path of the URL?

It doesn't. The a/a's content contains excactly what the user
typed (with special chars converted by htmlspecialchars()).

Please don't just assume things but test your examples using a current
GIT checkout with the patch applied in future.