Hi all,
After several hours of tearing my hair out, I've discovered something
interesting...
*The regex which matches hashtags won't work on standard RHEL/CentOS 5,
because PCRE is compiled without Unicode support.*
Background
--------------
I was trying to work out why my fresh 0.9.0 install on CentOS 5.4 didn't
seem to detect my hashtags. I'd post notices with plenty of hashtags,
and none of them would be detected. My notice_tags table was empty.
I tracked the problem down to this code in Classes/Notice.php:
    /**
     * Extract #hashtags from this notice's content and save them to
     * the database.
     */
    function saveTags()
    {
        /* extract all #hashtags */
        $count = preg_match_all('/(?:^|\s)#([\pL\pN_\-\.]{1,64})/',
                                strtolower($this->content), $match);
        if (!$count) {
            return true;
        }
The problem is that the Unicode property escapes "\pL" and "\pN" in this
pattern never match, yet no error is raised.
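Part of what makes this hard to spot is that `if (!$count)` treats "no matches" (0) and "the call failed" (false) the same way, since preg_match_all() returns false on failure. A minimal sketch of telling the two apart, using a hypothetical helper name that is not part of StatusNet:

```php
<?php
// Hypothetical helper: returns the number of matches, or null when
// preg_match_all() itself failed (for example, the pattern did not compile).
function countMatchesOrNull($pattern, $text)
{
    // @ suppresses the PHP warning so the caller decides how to report it.
    $count = @preg_match_all($pattern, $text, $match);
    return ($count === false) ? null : $count;
}
```

With something like this, a null result could be logged instead of being silently read as "this notice has no tags".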
When I built a simpler regex using the same escapes, I got a warning:
    PHP Warning: preg_match_all(): Compilation failed: support for \P,
    \p, and \X has not been compiled
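That warning suggests a cheap way to check a given host before digging further. A minimal sketch, assuming a hypothetical function name and that a failed compile makes preg_match() return false, as it did for me above:

```php
<?php
// Hypothetical probe: true when the local PCRE build understands \p escapes.
function pcreSupportsUnicodeProperties()
{
    // On a build without Unicode property support the pattern fails to
    // compile, preg_match() returns false, and @ hides the warning.
    return @preg_match('/\pL/', 'a') === 1;
}

echo pcreSupportsUnicodeProperties()
    ? "PCRE has Unicode property support\n"
    : "PCRE lacks Unicode property support\n";
```

Running this from the command line with the same PHP binary the web server uses should show which case applies.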
I found this page
(http://gaarai.com/2009/01/31/unicode-support-on-centos-52-with-php-and-pcre/)
which details how to rebuild PCRE with Unicode support. After doing
so, my hashtags are working perfectly.
At this point, it doesn't look like this will be fixed upstream until
the next major (6) release of RHEL/CentOS
(https://bugzilla.redhat.com/show_bug.cgi?id=457064).
Solution
----------
In the interim, should this code be augmented with a non-Unicode fallback
pattern, so that at least plain ASCII hashtags will work?
    /* extract all #hashtags */
    $count = preg_match_all('/(?:^|\s)#([\pL\pN_\-\.]{1,64})/',
                            strtolower($this->content), $match);
    if (!$count) {
        /* fall back to an ASCII-only pattern */
        $count_without_utf8 =
            preg_match_all('/(?:^|\s)#([a-z0-9_\-\.]{1,64})/',
                           strtolower($this->content), $match);
        if (!$count_without_utf8) {
            return true;
        }
    }
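An alternative to running the second pattern whenever the first finds nothing would be to probe PCRE once and pick the pattern up front. A minimal sketch, assuming two hypothetical standalone helpers (hashtagPattern() and extractHashtags(), neither of which exists in Notice.php) and the same two character classes as above:

```php
<?php
// Hypothetical helper, not StatusNet code: returns the hashtag pattern
// suited to the local PCRE build.
function hashtagPattern()
{
    static $pattern = null;
    if ($pattern === null) {
        // Probe once; @ hides the warning raised when \p is unsupported.
        $unicode = (@preg_match('/\pL/', 'a') === 1);
        $class   = $unicode ? '\pL\pN' : 'a-z0-9';
        $pattern = '/(?:^|\s)#([' . $class . '_\-\.]{1,64})/';
    }
    return $pattern;
}

// Hypothetical helper: returns the lowercased tags found in $content.
function extractHashtags($content)
{
    $count = preg_match_all(hashtagPattern(), strtolower($content), $match);
    return $count ? $match[1] : array();
}

print_r(extractHashtags('Testing #StatusNet on #centos_5'));
```

This keeps a single regex call per notice on both kinds of host, at the cost of one extra probe per request.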
Comments? :)
D
_______________________________________________
StatusNet-dev mailing list
StatusNet-dev@lists.status.net
http://lists.status.net/mailman/listinfo/statusnet-dev