RE: [PHP] How do I do count the occurrence of each word?

2012-08-20 Thread Ford, Mike
 -----Original Message-----
 From: Marco Behnke [mailto:ma...@behnke.biz]
 Sent: 19 August 2012 06:39
 To: php-general@lists.php.net
 Subject: Re: [PHP] How do I do count the occurrence of each word?
 
 On 19.08.12 06:59, tamouse mailing lists wrote:
  On Sat, Aug 18, 2012 at 6:44 PM, John Taylor-Johnston
  jt.johns...@usherbrooke.ca wrote:
  I want to parse this text and count the occurrence of each word:
 
  Sample Output:
 
  determined = 4
  fire = 7
  patrol = 3
  theft = 6
  witness = 1
  witnessed = 1
 

[...]

  and then you just run through the words building an associative array
  by incrementing the count of each word as the key to the array:
 
  foreach ($words as $word) {
  $freq[$word]++;
  }
 
 Please add an existence check to avoid incrementing array keys that are not set:
 
 foreach ($words as $word) {
   if (!array_key_exists($word, $freq)) {
     $freq[$word] = 1;   // first occurrence of this word
   } else {
     $freq[$word]++;
   }
 }

Erm...

   $freq = array_count_values($words);

(http://php.net/array_count_values)
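
A minimal sketch of how that one-liner might fit into the rest of the job. The sample text is made up for illustration and stands in for the stripped, lower-cased page text; the PREG_SPLIT_NO_EMPTY flag is an addition here so that stray whitespace does not produce empty "words".

```php
<?php
// Hypothetical sample text; in the thread this would be the stripped page text.
$text = "fire patrol fire theft fire witness";

// Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops the empty strings
// that leading or trailing whitespace would otherwise produce.
$words = preg_split('/[[:space:]]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

// array_count_values() returns an array keyed by word, with the counts as values.
$freq = array_count_values($words);
ksort($freq);   // alphabetical order by word

foreach ($freq as $word => $count) {
    echo "$word = $count\n";
}
// Prints:
// fire = 3
// patrol = 1
// theft = 1
// witness = 1
```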


Cheers!

Mike

-- 
Mike Ford,
Electronic Information Developer, Libraries and Learning Innovation,  
Portland PD507, City Campus, Leeds Metropolitan University,
Portland Way, LEEDS,  LS1 3HE,  United Kingdom 
E: m.f...@leedsmet.ac.uk T: +44 113 812 4730







Re: [PHP] How do I do count the occurrence of each word?

2012-08-19 Thread tamouse mailing lists
On Sun, Aug 19, 2012 at 12:38 AM, Marco Behnke ma...@behnke.biz wrote:
 On 19.08.12 06:59, tamouse mailing lists wrote:
 On Sat, Aug 18, 2012 at 6:44 PM, John Taylor-Johnston
 jt.johns...@usherbrooke.ca wrote:
 I want to parse this text and count the occurrence of each word:

 $text = "http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html";
 #Can I do this?
 $stripping = strip_tags($text); #get rid of html
 $stripping = strtolower($stripping); #put in lowercase

 
 First of all I want to start AFTER the expression "News Releases" and stop
 BEFORE the next occurrence of "-30-"

 #This may occur an undetermined number of times on
 http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html


 
 Second, do I put $stripping into an array to separate each word by each
 space?

 $stripping = implode(" ", $stripping);

 
 Third, how do I count the number of occurrences of each word?

 Sample Output:

 determined = 4
 fire = 7
 patrol = 3
 theft = 6
 witness = 1
 witnessed = 1

 
 <?php
 $text = "http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html";
 #echo strip_tags($text);
 #echo "\n";
 $stripping = strip_tags($text);

 #Get text between "News Releases" and stop before the next occurrence of "-30-"

 #$stripping = str_replace("\r", " ", $stripping);# getting rid of \r
 #$stripping = str_replace("\n", " ", $stripping);# getting rid of \n
 #$stripping = str_replace("  ", " ", $stripping);# getting rid of the occurrences of double spaces

 #$stripping = strtolower($stripping);

 #Where do I go now?
 ?>


 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php

 This is usually a first-year CS programming problem (word frequency
 counts), complicated a little by needing to extract the text first.
 You've started off fine by stripping tags and converting to lower case.
 You'll also want to convert or strip HTML entities, and decide what to
 do with plurals and with words like "you're", "Charlie's" and "it's",
 and whether something like "RFC822" (mixed letters and numbers) counts
 as a word.

 When you've arranged all that, splitting on white space is trivial:

 $words = preg_split('/[[:space:]]+/',$text);

 and then you just run through the words building an associative array
 by incrementing the count of each word as the key to the array:

 foreach ($words as $word) {
 $freq[$word]++;
 }

 Please add an existence check to avoid incrementing array keys that are not set:

 foreach ($words as $word) {
   if (!array_key_exists($word, $freq)) {
     $freq[$word] = 1;   // first occurrence of this word
   } else {
     $freq[$word]++;
   }
 }

Ah, yes, good point -- as written, my code will raise two notices. In
addition, declare the $freq array before the foreach loop to ensure
notice-free operation:

$freq = array();
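
Putting the two points together, a notice-free version of the loop might look like the sketch below. The sample $words array is made up, and isset() is used here in place of array_key_exists(); the two behave the same for this purpose because the counts are never null.

```php
<?php
// Hypothetical input; any list of words will do.
$words = array('fire', 'theft', 'fire', 'patrol', 'fire');

$freq = array();                  // declare the array before the loop
foreach ($words as $word) {
    if (!isset($freq[$word])) {   // first time this word is seen
        $freq[$word] = 0;
    }
    $freq[$word]++;
}

print_r($freq);
```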




 For output, you may want to sort the array:

 ksort($freq);



 --
 Marco Behnke
 Dipl. Informatiker (FH), SAE Audio Engineer Diploma
 Zend Certified Engineer PHP 5.3

 Tel.: 0174 / 9722336
 e-Mail: ma...@behnke.biz

 Softwaretechnik Behnke
 Heinrich-Heine-Str. 7D
 21218 Seevetal

 http://www.behnke.biz






Re: [PHP] How do I do count the occurrence of each word?

2012-08-18 Thread tamouse mailing lists
On Sat, Aug 18, 2012 at 6:44 PM, John Taylor-Johnston
jt.johns...@usherbrooke.ca wrote:
 I want to parse this text and count the occurrence of each word:

 $text = "http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html";
 #Can I do this?
 $stripping = strip_tags($text); #get rid of html
 $stripping = strtolower($stripping); #put in lowercase

 
 First of all I want to start AFTER the expression "News Releases" and stop
 BEFORE the next occurrence of "-30-"

 #This may occur an undetermined number of times on
 http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html


 
 Second, do I put $stripping into an array to separate each word by each
 space?

 $stripping = implode(" ", $stripping);

 
 Third, how do I count the number of occurrences of each word?

 Sample Output:

 determined = 4
 fire = 7
 patrol = 3
 theft = 6
 witness = 1
 witnessed = 1

 
 <?php
 $text = "http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html";
 #echo strip_tags($text);
 #echo "\n";
 $stripping = strip_tags($text);

 #Get text between "News Releases" and stop before the next occurrence of "-30-"

 #$stripping = str_replace("\r", " ", $stripping);# getting rid of \r
 #$stripping = str_replace("\n", " ", $stripping);# getting rid of \n
 #$stripping = str_replace("  ", " ", $stripping);# getting rid of the occurrences of double spaces

 #$stripping = strtolower($stripping);

 #Where do I go now?
 ?>




This is usually a first-year CS programming problem (word frequency
counts), complicated a little by needing to extract the text first.
You've started off fine by stripping tags and converting to lower case.
You'll also want to convert or strip HTML entities, and decide what to
do with plurals and with words like "you're", "Charlie's" and "it's",
and whether something like "RFC822" (mixed letters and numbers) counts
as a word.
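
As a rough illustration of those clean-up decisions, here is one possible sketch. The HTML fragment is invented for the example, and the regular expression encodes just one choice among several: a word is a run of letters and digits that may contain internal apostrophes, so "it's" and "rfc822" each count as a single word.

```php
<?php
// Hypothetical HTML fragment standing in for the fetched page.
$html = "<p>It's the Fire &amp; Theft patrol; you're a witness to RFC822.</p>";

$text = strip_tags($html);                               // drop the markup
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');  // &amp; becomes &
$text = strtolower($text);

// One possible definition of a word: letters and digits, optionally joined
// by internal apostrophes. Punctuation such as ";" and "." is left out.
preg_match_all("/[a-z0-9]+(?:'[a-z0-9]+)*/", $text, $matches);
$words = $matches[0];

print_r($words);
```

With this definition, punctuation never ends up attached to a word, which plain whitespace splitting would not guarantee.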

When you've arranged all that, splitting on white space is trivial:

$words = preg_split('/[[:space:]]+/',$text);

and then you just run through the words building an associative array
by incrementing the count of each word as the key to the array:

foreach ($words as $word) {
$freq[$word]++;
}

For output, you may want to sort the array:

ksort($freq);
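
For the extraction step, a hedged sketch of one way to keep only the text between each "News Releases" marker and the next "-30-". The sample string is invented; the real page would first have to be fetched (for example with file_get_contents()), since strip_tags() applied to the URL string only sees the URL itself. The counting uses array_count_values() rather than the loop above, purely to keep the example short.

```php
<?php
// Hypothetical stand-in for the stripped page text; the real page is not
// reproduced here.
$text = "Header News Releases Fire fire theft -30- ignored "
      . "News Releases Patrol fire -30- footer";

// Non-greedy match between each pair of markers. The "s" modifier lets "."
// span newlines and "i" makes the markers case-insensitive.
preg_match_all('/News Releases(.*?)-30-/si', $text, $matches);

// $matches[1] holds one captured section per marker pair; join and lower-case them.
$body = strtolower(implode(' ', $matches[1]));

$words = preg_split('/[[:space:]]+/', $body, -1, PREG_SPLIT_NO_EMPTY);
$freq  = array_count_values($words);
ksort($freq);

foreach ($freq as $word => $count) {
    echo "$word = $count\n";
}
// Prints:
// fire = 3
// patrol = 1
// theft = 1
```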




Re: [PHP] How do I do count the occurrence of each word?

2012-08-18 Thread Marco Behnke
On 19.08.12 06:59, tamouse mailing lists wrote:
 On Sat, Aug 18, 2012 at 6:44 PM, John Taylor-Johnston
 jt.johns...@usherbrooke.ca wrote:
 I want to parse this text and count the occurrence of each word:

 $text = "http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html";
 #Can I do this?
 $stripping = strip_tags($text); #get rid of html
 $stripping = strtolower($stripping); #put in lowercase

 
 First of all I want to start AFTER the expression "News Releases" and stop
 BEFORE the next occurrence of "-30-"

 #This may occur an undetermined number of times on
 http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html


 
 Second, do I put $stripping into an array to separate each word by each
 space?

 $stripping = implode(" ", $stripping);

 
 Third, how do I count the number of occurrences of each word?

 Sample Output:

 determined = 4
 fire = 7
 patrol = 3
 theft = 6
 witness = 1
 witnessed = 1

 
 <?php
 $text = "http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.html";
 #echo strip_tags($text);
 #echo "\n";
 $stripping = strip_tags($text);

 #Get text between "News Releases" and stop before the next occurrence of "-30-"

 #$stripping = str_replace("\r", " ", $stripping);# getting rid of \r
 #$stripping = str_replace("\n", " ", $stripping);# getting rid of \n
 #$stripping = str_replace("  ", " ", $stripping);# getting rid of the occurrences of double spaces

 #$stripping = strtolower($stripping);

 #Where do I go now?
 ?>



 This is usually a first-year CS programming problem (word frequency
 counts), complicated a little by needing to extract the text first.
 You've started off fine by stripping tags and converting to lower case.
 You'll also want to convert or strip HTML entities, and decide what to
 do with plurals and with words like "you're", "Charlie's" and "it's",
 and whether something like "RFC822" (mixed letters and numbers) counts
 as a word.

 When you've arranged all that, splitting on white space is trivial:

 $words = preg_split('/[[:space:]]+/',$text);

 and then you just run through the words building an associative array
 by incrementing the count of each word as the key to the array:

 foreach ($words as $word) {
 $freq[$word]++;
 }

Please add an existence check to avoid incrementing array keys that are not set:

foreach ($words as $word) {
  if (!array_key_exists($word, $freq)) {
    $freq[$word] = 1;   // first occurrence of this word
  } else {
    $freq[$word]++;
  }
}



 For output, you may want to sort the array:

 ksort($freq);



-- 
Marco Behnke
Dipl. Informatiker (FH), SAE Audio Engineer Diploma
Zend Certified Engineer PHP 5.3

Tel.: 0174 / 9722336
e-Mail: ma...@behnke.biz

Softwaretechnik Behnke
Heinrich-Heine-Str. 7D
21218 Seevetal

http://www.behnke.biz



