Re: [PHP] Re: Substr by words

2005-10-31 Thread Marcus Bointon


On 31 Oct 2005, at 03:29, Gustavo Narea wrote:

> I think what I said about the caret is OK, but what we need to
> change is the position of \W*:
>
>    Your suggestion: /(\b\w+\b\W*){1,$MaxWords}/
>    My suggestion: /^(\W*\b\w+\b){1,$MaxWords}/
>
> We need the *first* $MaxWords words.


It makes no difference - they will both work. Mine doesn't care where
the first word starts because it doesn't use ^, and yours doesn't
care where the first word starts because it's got ^ followed by \W*.
Your overall match will end up with leading spaces, mine will end up
with trailing spaces - the subsequent trim fixes them both. I like
mine because it has 1 less char ;^)
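
For anyone who wants to check this, here is a minimal sketch that runs both patterns on a string with leading spaces and compares the trimmed results. It assumes PHP 7.1 or later; the helper name firstWordsWith and the sample string are made up for illustration and are not from the thread.

```php
<?php
// Hypothetical helper: returns the trimmed overall match, or null if nothing matched.
function firstWordsWith(string $pattern, string $text): ?string
{
    if (preg_match($pattern, $text, $matches)) {
        return trim($matches[0]);
    }
    return null;
}

$maxWords = 3;
$text = "  one, two three four"; // note the leading spaces

// \W* after each word, no anchor: the overall match ends with trailing spaces.
$trailing = '/(\b\w+\b\W*){1,' . $maxWords . '}/';
// ^ plus \W* before each word: the overall match starts with leading spaces.
$leading = '/^(\W*\b\w+\b){1,' . $maxWords . '}/';

// After trim() both give the same first three words.
var_dump(firstWordsWith($trailing, $text) === firstWordsWith($leading, $text));
```

Only the untrimmed matches differ: "one, two three " for the first pattern and "  one, two three" for the second.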


Ultimately, if it works for you, great!

Marcus
--
Marcus Bointon
Synchromedia Limited: Putting you in the picture
[EMAIL PROTECTED] | http://www.synchromedia.co.uk

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: Substr by words

2005-10-31 Thread Gustavo Narea

Hello, Marcus.

No, you are right. Your script is better.

I just forgot something I learned about regexes: the regex engine is
eager. Thus, in this case, it's not necessary to use the caret. The
regex engine will start from the first word it finds.


I would use yours ;-).

Best regards,

Gustavo Narea.
PHP Documentation - Spanish Translation Team.
Valencia, Venezuela.





Re: [PHP] Re: Substr by words

2005-10-30 Thread Marcus Bointon

On 30 Oct 2005, at 06:22, Gustavo Narea wrote:

> $replacement = ereg_replace("^([[:space:]]*[^[:space:][:cntrl:]]+){1,$MaxWords}", "", $MyOriginalString);
>
> echo substr($MyOriginalString, 0, ($replacement) ? -strlen($replacement) : strlen($MyOriginalString));


You could get the regex to do the search and the extraction in one go:

$MyOriginalString = "This is my original string.\nWhat do you think about this script?";

$MaxWords = 6; // How many words are needed?
$matches = array();
if (preg_match("/(\b\w+\b\W*){1,$MaxWords}/", $MyOriginalString, $matches)) {
    $result = trim($matches[0]);
    echo $result;
}

Marcus




Re: [PHP] Re: Substr by words

2005-10-30 Thread Gustavo Narea

Hello.

Marcus Bointon wrote:

> On 30 Oct 2005, at 06:22, Gustavo Narea wrote:
>
> You could get the regex to do the search and the extraction in one go:
>
> $MyOriginalString = "This is my original string.\nWhat do you think about this script?";
>
> $MaxWords = 6; // How many words are needed?
> $matches = array();
> if (preg_match("/(\b\w+\b\W*){1,$MaxWords}/", $MyOriginalString, $matches)) {
>     $result = trim($matches[0]);
>     echo $result;
> }


I have not used preg_* functions yet, so I may be wrong:

I think that trim($matches[0]) will return the whole string with no 
change. On the other hand, I think we have to place a caret after the 
first slash.


What about this:

<?php
$MyOriginalString = "This is my original string.\nWhat do you think about this script?";

$MaxWords = 6; // How many words are needed?
$matches = array();
if (preg_match("/^(\b\w+\b\W*){1,$MaxWords}/", $MyOriginalString, $matches)) {
    unset($matches[0]);
    $result = implode(" ", $matches);
    echo $result;
}
?>

By the way, if you're able to use preg_* functions, I suggest you use
this script instead of the one I suggested earlier. What's the difference?


Let's suppose we have a string with typos such as "Mandriva , Red Hat ,
Debian" (the right one is "Mandriva, Red Hat, Debian", without spaces
before commas). The former script will find 6 words (because of the
spaces before commas), while the latter will find 4 words ("Mandriva Red
Hat Debian"). In this case, the former was wrong and the latter right.
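
The two word counts can be reproduced with a short sketch. One assumption to flag: the ereg_* functions are not available in current PHP, so the POSIX classes from the first script are used here inside a PCRE pattern as a stand-in; the variable names are illustrative only.

```php
<?php
$text = "Mandriva , Red Hat , Debian";

// Stand-in for the ereg version: a "word" is any run of characters that are
// neither whitespace nor control characters, so a lone comma counts as a word.
$bySpace = preg_match_all('/[^[:space:][:cntrl:]]+/', $text, $m1);

// The preg version: a "word" is a run of word characters (\w) only.
$byWord = preg_match_all('/\b\w+\b/', $text, $m2);

echo "$bySpace $byWord\n"; // prints "6 4"
```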


However, the former does not remove punctuation marks or spaces
(tabs, line feeds, among others); the latter will remove any character
which is a non-word character. If you need words + punctuation marks +
spaces up to the ($MaxWords)th word, this is my suggestion:


<?php
$MyOriginalString = "This is my original string.\nWhat do you think about this script?";

$MaxWords = 6; // How many words are needed?
$replacement = preg_match("/^(\W*\b\w+\b){1,$MaxWords}/", '', $MyOriginalString);
$result = substr($MyOriginalString, 0, ($replacement) ? -strlen($replacement) : strlen($MyOriginalString));

echo $result;
?>

Best regards,

Gustavo Narea.




Re: [PHP] Re: Substr by words

2005-10-30 Thread Gustavo Narea

Another mistake in my last script.

Gustavo Narea wrote:

> <?php
> $MyOriginalString = "This is my original string.\nWhat do you think about this script?";
>
> $MaxWords = 6; // How many words are needed?
> $replacement = preg_match("/^(\W*\b\w+\b){1,$MaxWords}/", '', $MyOriginalString);
> $result = substr($MyOriginalString, 0, ($replacement) ? -strlen($replacement) : strlen($MyOriginalString));
>
> echo $result;
> ?>


I should have typed preg_replace() instead of preg_match():

<?php
$MyOriginalString = "This is my original string.\nWhat do you think about this script?";

$MaxWords = 6; // How many words are needed?
$replacement = preg_replace("/^(\W*\b\w+\b){1,$MaxWords}/", '', $MyOriginalString);
$result = substr($MyOriginalString, 0, ($replacement) ? -strlen($replacement) : strlen($MyOriginalString));

echo $result;
?>

Best regards,

Gustavo Narea.




Re: [PHP] Re: Substr by words

2005-10-30 Thread Marcus Bointon


On 30 Oct 2005, at 15:35, Gustavo Narea wrote:

> I think that trim($matches[0]) will return the whole string with no
> change.

No, it will return the text matched by the entire pattern, not just the
sub-matches. I added the trim to remove any leading space, and there will
nearly always be a trailing space because the part of my pattern that
defines a word will include it. It was simpler to use trim than to make
the pattern skip it. Did you actually try it?


> On the other hand, I think we have to place a caret after the first
> slash.

Only if you insist that your string must start with a word - putting
a ^ at the start would make the match fail if there was a space in
front of the first word.



> I should have typed preg_replace() instead of preg_match():

Err, I think you missed the point here. You don't need all that messy
substr stuff at all. The preg_match already did it.
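
To illustrate, here is a rough sketch comparing the two routes on the sample string from this thread. The variable names are illustrative, and the emptiness test is written as !== '' rather than the truthiness check in the quoted script, so that a remainder of "0" is not treated as empty.

```php
<?php
$text = "This is my original string.\nWhat do you think about this script?";
$maxWords = 6;
$pattern = '/^(\W*\b\w+\b){1,' . $maxWords . '}/';

// Route 1: strip the leading words, then cut the remainder off the original.
$rest = preg_replace($pattern, '', $text);
$viaSubstr = ($rest !== '') ? substr($text, 0, -strlen($rest)) : $text;

// Route 2: take the overall match directly.
$viaMatch = preg_match($pattern, $text, $matches) ? $matches[0] : '';

// Both routes yield the same leading six words.
var_dump($viaSubstr === $viaMatch);
```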


Marcus




Re: [PHP] Re: Substr by words

2005-10-30 Thread Gustavo Narea

Hello, Marcus.

Marcus Bointon wrote:

> On 30 Oct 2005, at 15:35, Gustavo Narea wrote:
>
>> I think that trim($matches[0]) will return the whole string with no
>> change.
>
> No, it will return the text matched by the entire pattern, not just the
> sub-matches. I added the trim to remove any leading space, and there will
> nearly always be a trailing space because the part of my pattern that
> defines a word will include it. It was simpler to use trim than to make
> the pattern skip it. Did you actually try it?

No. I said that I was not that sure about this because I have not used
preg_* functions yet.

>> On the other hand, I think we have to place a caret after the first
>> slash.
>
> Only if you insist that your string must start with a word - putting a
> ^ at the start would make the match fail if there was a space in front
> of the first word.

I think what I said about the caret is OK, but what we need to change
is the position of \W*:

   Your suggestion: /(\b\w+\b\W*){1,$MaxWords}/
   My suggestion: /^(\W*\b\w+\b){1,$MaxWords}/

We need the *first* $MaxWords words.

>> I should have typed preg_replace() instead of preg_match():
>
> Err, I think you missed the point here. You don't need all that messy
> substr stuff at all. The preg_match already did it.


Sorry, you are right. Maybe I thought I was talking about the former 
script I suggested...


What do you think about using the script you suggested, but changing the
regex to what I said above? It would look like this:


<?php
$MyOriginalString = "This is my original string.\nWhat do you think about this script?";

$MaxWords = 6; // How many words are needed?
$matches = array();
if (preg_match("/^(\W*\b\w+\b){1,$MaxWords}/", $MyOriginalString, $matches)) {
    $result = trim($matches[0]);
    echo $result;
}
?>

Best regards,

Gustavo Narea.
