php-general Digest 2 Sep 2012 16:31:41 -0000 Issue 7947

Topics (messages 318938 through 318945):

Re: extract Occurrences AFTER ... and before "-30-"
        318938 by: Ashley Sheridan
        318939 by: Matijn Woudt
        318940 by: Matijn Woudt
        318941 by: Frank Arensmeier
        318942 by: John Taylor-Johnston
        318943 by: John Taylor-Johnston
        318944 by: Matijn Woudt
        318945 by: John Taylor-Johnston

Administrivia:

To subscribe to the digest, e-mail:
        php-general-digest-subscr...@lists.php.net

To unsubscribe from the digest, e-mail:
        php-general-digest-unsubscr...@lists.php.net

To post to the list, e-mail:
        php-gene...@lists.php.net


----------------------------------------------------------------------
--- Begin Message ---
On Sun, 2012-09-02 at 00:23 -0400, John Taylor-Johnston wrote:

> See:
> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.php
> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.phps
> 
> In $mystring, I need to extract everything between "|News Releases|" and 
> "-30".
> 
> The thing now is $mystring might contain many instances of 
> "|News Releases|" and "-30".
> 
> How do I deal with this? My code only catches the first instance.
> 
> Thanks for your help so far.
> 
> John
> 
> >> You can use strpos() to find the location of "News Releases" then you
> >> can again use strpos() to find the location of "-- 30 --" but you will
> >> want to feed strpos() an offset for matching "-- 30 --" (specifically
> >> the position found for "News Releases"). This ensures that you only
> >> match on "-- 30 --" when it comes after "News Releases". Once you have
> >> your beginning and end offsets you can use substr() to create a
> >> substring of the interesting excerpt. Once you have the excerpt in hand
> >> you can go back to tamouse's recommendation above.
> > Cheers,
> > Rob.
> 
> 


What code are you using at the moment? It's not very useful to us to
know that your code is half-way there, but then not see the code!

-- 
Thanks,
Ash
http://www.ashleysheridan.co.uk



--- End Message ---
--- Begin Message ---
On Sun, Sep 2, 2012 at 6:23 AM, John Taylor-Johnston
<jt.johns...@usherbrooke.ca> wrote:
> See:
> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.php
> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.phps
>
> In $mystring, I need to extract everything between "|News Releases|" and
> "-30".
>
> The thing now is $mystring might contain many instances of "|News Releases|"
> and "-30".
>
> How do I deal with this? My code only catches the first instance.
>
> Thanks for your help so far.
>
> John
>

You could use substr to retrieve the rest of the string and just start
over (do it in a while loop to catch all).
Though, it's probably not really efficient if you have long strings.
You'd be better off with preg_match. You can do it all with a single
line of code, albeit that regex takes quite some time to figure out if
not experienced.
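
A minimal sketch of the while-loop idea, assuming the markers are the
literal strings "News Releases" and "-30-". The helper name
extract_between() and the sample text are made up for illustration.
Instead of cutting the string down with substr() on every pass, it
passes an offset to strpos(), which avoids copying a long string over
and over:

```php
<?php
// Hypothetical helper (not from the thread): collect every excerpt found
// between a start marker and the next end marker that follows it.
function extract_between($text, $start, $end)
{
    $excerpts = [];
    $offset = 0;
    // Keep searching from $offset until no further start marker is found.
    while (($from = strpos($text, $start, $offset)) !== false) {
        $from += strlen($start);            // skip past the start marker
        $to = strpos($text, $end, $from);   // end marker must come after it
        if ($to === false) {
            break;                          // unterminated block: stop here
        }
        $excerpts[] = trim(substr($text, $from, $to - $from));
        $offset = $to + strlen($end);       // resume after the end marker
    }
    return $excerpts;
}

// Made-up sample data standing in for $mystring.
$mystring = "News Releases\nFirst story\n-30-\nfiller\n"
          . "News Releases\nSecond story\n-30-\n";

print_r(extract_between($mystring, "News Releases", "-30-"));
```

The loop stops at the first block that has no closing marker; whether
that is the right behaviour depends on how clean the real corpus is.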

- Matijn

PS. Please don't top post on this and probably any mailing list.

--- End Message ---
--- Begin Message ---
On Sun, Sep 2, 2012 at 10:10 AM, Ashley Sheridan
<a...@ashleysheridan.co.uk> wrote:
> On Sun, 2012-09-02 at 00:23 -0400, John Taylor-Johnston wrote:
>
>> See:
>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.php
>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.phps
>>
>> In $mystring, I need to extract everything between "|News Releases|" and
>> "-30".
>>
>> The thing now is $mystring might contain many instances of
>> "|News Releases|" and "-30".
>>
>> How do I deal with this? My code only catches the first instance.
>>
>> Thanks for your help so far.
>>
>> John
>>
>> >> You can use strpos() to find the location of "News Releases" then you
>> >> can again use strpos() to find the location of "-- 30 --" but you will
>> >> want to feed strpos() an offset for matching "-- 30 --" (specifically
>> >> the position found for "News Releases"). This ensures that you only
>> >> match on "-- 30 --" when it comes after "News Releases". Once you have
>> >> your beginning and end offsets you can use substr() to create a
>> >> substring of the interesting excerpt. Once you have the excerpt in hand
>> >> you can go back to tamouse's recommendation above.
>> > Cheers,
>> > Rob.
>>
>>
>
>
> What code are you using at the moment? It's not very useful to us to
> know that your code is half-way there, but then not see the code!
>

Ash.. Might want to read the mail again, the code is there... ;)

--- End Message ---
--- Begin Message ---
2 sep 2012 kl. 14.40 skrev Matijn Woudt:

> On Sun, Sep 2, 2012 at 6:23 AM, John Taylor-Johnston
> <jt.johns...@usherbrooke.ca> wrote:
>> See:
>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.php
>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.phps
>> 
>> In $mystring, I need to extract everything between "|News Releases|" and
>> "-30".
>> 
>> The thing now is $mystring might contain many instances of "|News Releases|"
>> and "-30".
>> 
>> How do I deal with this? My code only catches the first instance.
>> 
>> Thanks for your help so far.
>> 
>> John
>> 
> 
> You could use substr to retrieve the rest of the string and just start
> over (do it in a while loop to catch all).
> Though, it's probably not really efficient if you have long strings.
> You'd be better off with preg_match. You can do it all with a single
> line of code, albeit that regex takes quite some time to figure out if
> not experienced.
> 
> - Matijn
> 
> PS. Please don't top post on this and probably any mailing list.
> 

My approach would be to split the whole text into smaller chunks (with e.g.
explode()) and extract the interesting parts with a regular expression. Maybe 
this will give you some ideas:

$chunks = explode("-30-", $mystring);
foreach($chunks as $chunk) {
        preg_match_all("/News Releases\n(.+)/s", $chunk, $matches);
        var_dump($matches[1]);
}

The regex matches all text between "News Releases" and the end of the chunk.

/frank


--- End Message ---
--- Begin Message ---

> On Sun, Sep 2, 2012 at 6:23 AM, John Taylor-Johnston
> <jt.johns...@usherbrooke.ca> wrote:
>> See:
>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.php
>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.phps
>>
>> In $mystring, I need to extract everything between "|News Releases|" and
>> "-30".
>>
>> The thing now is $mystring might contain many instances of "|News Releases|"
>> and "-30".
>>
>> How do I deal with this? My code only catches the first instance.
>>
>> Thanks for your help so far.
>>
>> John
>
> You could use substr to retrieve the rest of the string and just start
> over (do it in a while loop to catch all).
> Though, it's probably not really efficient if you have long strings.
> You'd be better off with preg_match. You can do it all with a single
> line of code, albeit that regex takes quite some time to figure out if
> not experienced.
>
> - Matijn
>
> PS. Please don't top post on this and probably any mailing list.

Matijn, I'm a habitual top quoter. Horrible :)) But bottom quoting is not intuitive. But those are the rules, so I will be a good poster :))

I will have very, very long strings. It will be a corpus of text, of maybe 1-2 megs of text.

I'm not terribly experienced. How would I "while" loop this?

I am reading preg-match and the examples, but I don't really follow.
http://www.php.net/manual/en/function.preg-match.php

I admit, I don't know what "/php/i" means.

<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

--- End Message ---
--- Begin Message ---
Frank Arensmeier wrote:
>>> See:
>>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.php
>>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.phps
>>>
>>> In $mystring, I need to extract everything between "|News Releases|" and
>>> "-30".
>>>
>>> My approach would be to split the whole text into smaller chunks (with e.g. explode()) and extract the interesting parts with a regular expression. Maybe this will give you some ideas:
>>>
>>> $chunks = explode("-30-", $mystring);
>>> foreach($chunks as $chunk) {
>>> preg_match_all("/News Releases\n(.+)/s", $chunk, $matches);
>>> var_dump($matches[1]);
>>> }
>>>
>>> The regex matches all text between "News Releases" and the end of the chunk.
>>>
>>> /frank
>>>
>>>
I could live with that, I think. Here is the output:
http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test2.php

Here are the newbie questions

Why is there more than one array?

array(1)

What are string(190) and string(247)? Why are they named like that?
http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test2.php
Please explain the / (.+)/s in "/News Releases\n(.+)/s"?

My question is:
Is one array not better? (My next step will be to parse the array to find the frequency of each word ... an array.)


array {
[0]=> "Residential Fire Determined to be Accidental in Nature

(SUMMER BROOK, ON) – May 31, 2012 – The Police Department has determined the cause of a residential fire to be accidental in nature.

"
[1]=>
"Residential Fire Determined to be Accidental in Nature

(SUMMER BROOK, ON) – June 3rd, 2012 – The Police Department has arrested two suspects in the case of a residential fire on May 31st. It is now believed the fire was not accidental in nature.

"
}
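
For the word-frequency step mentioned above, here is a rough sketch,
assuming the releases already sit in one flat array. The variable names
and sample strings are made up. Note that str_word_count() is
locale-dependent and may split words containing accented characters, so
it might not suit every corpus:

```php
<?php
// Made-up stand-in for the array of extracted releases.
$releases = [
    "The fire was accidental in nature.",
    "The fire was not accidental in nature.",
];

// Lower-case everything so "The" and "the" count as one word, then split
// into words; format 1 makes str_word_count() return the words themselves.
$words = str_word_count(strtolower(implode(" ", $releases)), 1);

// Map each word to the number of times it occurs, most frequent first.
$frequency = array_count_values($words);
arsort($frequency);

print_r($frequency);
```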
--- End Message ---
--- Begin Message ---
On Sun, Sep 2, 2012 at 4:36 PM, John Taylor-Johnston
<jt.johns...@usherbrooke.ca> wrote:
>
>> On Sun, Sep 2, 2012 at 6:23 AM, John Taylor-Johnston
>> <jt.johns...@usherbrooke.ca> wrote:
>>>
>>> See:
>>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.php
>>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.phps
>>>
>>> In $mystring, I need to extract everything between "|News Releases|" and
>>> "-30".
>>>
>>> The thing now is $mystring might contain many instances of "|News
>>> Releases|"
>>> and "-30".
>>>
>>> How do I deal with this? My code only catches the first instance.
>>>
>>> Thanks for your help so far.
>>>
>>> John
>>>
>> You could use substr to retrieve the rest of the string and just start
>> over (do it in a while loop to catch all).
>> Though, it's probably not really efficient if you have long strings.
>> You'd be better off with preg_match. You can do it all with a single
>> line of code, albeit that regex takes quite some time to figure out if
>> not experienced.
>>
>> - Matijn
>>
>> PS. Please don't top post on this and probably any mailing list.
>
> Matijn, I'm a habitual top quoter. Horrible :)) But bottom quoting is not
> intuitive. But those are the rules, so I will be a good poster :))

I do find it intuitive actually, when reading things back your answer
is after the question, which makes sense. The other way around
doesn't?

>
> I will have very, very long strings. It will be a corpus of text, of maybe
> 1-2 megs of text.
>
> I'm not terribly experienced. How would I "while" loop this?
>
> I am reading preg-match and the examples, but I don't really follow.
> http://www.php.net/manual/en/function.preg-match.php
>
> I admit, I don't know what "/php/i" means.
>

Well, it finds any form of the word php in a text, the i means it can
also be pHp or PHP, etc. It's not that useful in that way. But that
brings me to Frank's example, which is in the right direction.

On Sun, Sep 2, 2012 at 4:33 PM, Frank Arensmeier <farensme...@gmail.com> wrote:
> My approach would be to split the whole text into smaller chunks (with e.g.
> explode()) and extract the interesting parts with a regular expression. Maybe 
> this will give you some ideas:
>
> $chunks = explode("-30-", $mystring);
> foreach($chunks as $chunk) {
>         preg_match_all("/News Releases\n(.+)/s", $chunk, $matches);
>         var_dump($matches[1]);
> }
>
> The regex matches all text between "News Releases" and the end of the chunk.

You shouldn't need to explode the string first; you could do that
with a single preg_match_all. (Sorry, can't remember how anymore, it's
been a while since I last used PCRE.)
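
One possible way to do it in a single call, as a sketch only: it assumes
each release starts with "News Releases" followed by a newline and ends
at the next "-30-". The sample text and the $releases name are made up.
The lazy quantifier (.+?) stops at the first "-30-" it meets instead of
running on to the last one:

```php
<?php
// Made-up sample data standing in for $mystring.
$mystring = "News Releases\nFirst story\n-30-\nfiller\n"
          . "News Releases\nSecond story\n-30-\n";

// (.+?) is lazy: it captures as little as possible up to the next "-30-".
// The s modifier lets the dot match newlines as well.
$count = preg_match_all('/News Releases\n(.+?)-30-/s', $mystring, $matches);

if ($count === false) {
    // preg_match_all() returns false on failure, e.g. when a PCRE limit
    // is exceeded; preg_last_error() reports which error occurred.
    echo "PCRE error code: " . preg_last_error() . "\n";
} else {
    // $matches[1] holds the captured text of every match, in order.
    $releases = array_map('trim', $matches[1]);
    print_r($releases);
}
```

On a corpus of 1-2 MB it is probably worth keeping the error check,
since a failure would otherwise look like "no matches".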

On Sun, Sep 2, 2012 at 4:52 PM, John Taylor-Johnston
<jt.johns...@usherbrooke.ca> wrote:
> I could live with that, I think. Here is the output:
> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test2.php
>
> Here are the newbie questions
>
> Why is there more than one array?

One for each preg_match_all in the loop.
>
> array(1)
>
> What are string(190) and string(247)? Why are they named like that?

string(190), array(1). That's just var_dump. You won't see them if you
used echo or print_r etc. Have a look at the var_dump manual page to
learn more.
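
A small illustration of the difference, using a made-up one-element
array. The number in string(N) is the length of the string in bytes, and
array(N) is the number of elements:

```php
<?php
// Made-up stand-in for $matches[1].
$matches = ["Residential Fire"];

// var_dump() annotates every value with its type and size.
var_dump($matches);

// print_r() shows only keys and values, without the annotations.
print_r($matches);

// The N in string(N) is the same number strlen() reports.
$length = strlen($matches[0]);
echo $length, "\n";
```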

> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test2.php
> Please explain the / (.+)/s in "/News Releases\n(.+)/s"?

The explode splits the text at every "-30-", so each block ends just
before a "-30-" marker. (.+) captures one or more characters, as many
as possible, which here means everything up to the end of the block.
The /s modifier makes the dot in (.+) match newlines as well.
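
To see what the /s modifier changes, here is a short sketch with a
made-up chunk; the variable names are for illustration only:

```php
<?php
// Made-up chunk, as it might look after explode("-30-", ...).
$chunk = "News Releases\nLine one\nLine two\n";

// Without s, the dot does not match a newline, so the capture ends
// at the end of the first line.
preg_match('/News Releases\n(.+)/', $chunk, $without);

// With s, the dot matches newlines too, so the capture runs to the
// end of the chunk.
preg_match('/News Releases\n(.+)/s', $chunk, $with);

var_dump($without[1]);
var_dump($with[1]);
```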
>
> My question is:
> Is one array not better? (My next step will be to parse the array to find
> the frequency of each word ... an array.)
>

Sure, you just need to figure out the PCRE. Find out more about PCRE
syntax on google (it's pretty much the same as for Perl and other
languages) and in the PHP manual.

- Matijn

--- End Message ---
--- Begin Message ---

>>> See:
>>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.php
>>> http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test.phps
>>>
>>> In $mystring, I need to extract everything between "|News Releases|" and
>>> "-30".
>>>
>>> The thing now is $mystring might contain many instances of "|News Releases|"
>>> and "-30".
>>>
>>> How do I deal with this? My code only catches the first instance.
>>>
>>> Thanks for your help so far.
>>>
>>> John

> I do find it intuitive actually, when reading things back your answer is after the question, which makes sense. The other way around doesn't?

I'll never get it. Newest work on top of the pile, instead of digging :))

> On Sun, Sep 2, 2012 at 4:33 PM, Frank Arensmeier <farensme...@gmail.com> wrote:
>> My approach would be to split the whole text into smaller chunks (with e.g.
>> explode()) and extract the interesting parts with a regular expression. Maybe
>> this will give you some ideas:
>>
>> $chunks = explode("-30-", $mystring);
>> foreach($chunks as $chunk) {
>>         preg_match_all("/News Releases\n(.+)/s", $chunk, $matches);
>>         var_dump($matches[1]);
>> }
>>
>> The regex matches all text between "News Releases" and the end of the chunk.
>
> You shouldn't need to explode the string first; you could do that
> with a single preg_match_all. (Sorry, can't remember how anymore, it's
> been a while since I last used PCRE.)

I'll get the PCRE eventually. Never had much time for Perl after PHP started anyhow :)p


I think the split by -30- works nicely.


2) How could I suck it into one nice easy to handle array? http://www.cegepsherbrooke.qc.ca/~languesmodernes/test/test2.php

$mynewarray = array {
  [0]=> "Residential Fire Determined to be Accidental in Nature ..."
  [1]=> "Arrest Made in Residential Fire ..."
}
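
One way to end up with a single array while keeping the split on "-30-",
sketched under the assumption that each chunk holds at most one release.
The sample text is made up; $mynewarray is the name used above:

```php
<?php
// Made-up sample data standing in for $mystring.
$mystring = "News Releases\nFire was accidental\n-30-\n"
          . "News Releases\nFire was not accidental\n-30-\n";

$mynewarray = [];
foreach (explode("-30-", $mystring) as $chunk) {
    // preg_match() returns 1 when the chunk contains a release and 0
    // otherwise, so chunks without "News Releases" are skipped.
    if (preg_match('/News Releases\n(.+)/s', $chunk, $matches)) {
        // Append the captured text instead of dumping it, so all
        // releases accumulate in one flat array.
        $mynewarray[] = trim($matches[1]);
    }
}

print_r($mynewarray);
```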


Thanks,
John

--- End Message ---
