Re: [PHP] getting content excerpts from the database

Phpster Mon, 26 Apr 2010 06:19:38 -0700

On Apr 26, 2010, at 7:54 AM, Ashley Sheridan<[email protected]> wrote:

On Mon, 2010-04-26 at 07:58 -0400, Phpster wrote:



On Apr 26, 2010, at 7:23 AM, Ashley Sheridan
<[email protected]> wrote:

> On Mon, 2010-04-26 at 13:20 +0200, Peter Lind wrote:
>
>> On 26 April 2010 12:52, Ashley Sheridan <[email protected]>
>> wrote:
>>> I've been thinking about this problem for a little while, and the thing
>>> is, I can think of ways of doing it, but they're not very nice, and I
>>> don't think they're going to be fast.
>>>
>>> Basically, I have a load of HTML-formatted content in a database that
>>> gets displayed on the site. It's part of a rudimentary CMS.
>>>
>>> Currently, the titles for each article are displayed on a page, and each
>>> title links to the full article. However, that leaves me with a page
>>> which is essentially a list of links, and that's not ideal for SEO. What
>>> I wanted to do to enhance the page is to have a short excerpt of x
>>> number of words/characters beneath each article title. The idea being
>>> that search engines will find the page as more than a link farm, and
>>> visitors won't have to just rely on the title alone for the content.
>>>
>>> Here's the rub though. As the content is in HTML form, I can't just grab
>>> the first 100 characters and display them as that could leave an open
>>> tag without a closing one, potentially breaking the page. I could use
>>> strip_tags on the 100-character excerpt, but what if the excerpt itself
>>> broke a tag in half (i.e. <acronym title="something"> could become
>>> <acron )
>>>
>>> The only solutions I can see are:
>>>
>>>
>>>     * retrieve the entire article, perform a strip_tags and then take
>>>       the excerpt
>>>     * use a regex inside of mysql to pull out only the text
>>>
>>>
>>> The thing is, neither of these seems particularly pretty, and I am sure
>>> there's a better way, but it's too early in the week for my brain to be
>>> fully functional I think!
>>>
>>> Does anyone have any ideas about what I could do, or do you think I'm
>>> seeing problems where there are none?
>>
>> Use htmltidy or htmlpurifier to clean up things. I.e. grab the amount
>> of content you want, then use one of the tools to repair and clean the
>> html.
>>
>> Regards
>> Peter
>>
>> --
>> <hype>
>> WWW: http://plphp.dk / http://plind.dk
>> LinkedIn: http://www.linkedin.com/in/plind
>> Flickr: http://www.flickr.com/photos/fake51
>> BeWelcome: Fake51
>> Couchsurfing: Fake51
>> </hype>
>>
>
>
> Would that work on content that stopped mid-tag? Assuming the original
> copy is:
>
> <p>This is some sentence, with an <abbr title="Abbreviation">abbr</abbr>
> in the middle of it.</p>
>
> If I was asking for only the first 50 characters, I'd get this:
>
> <p>This is some sentence, with an <abbr title="Abb
>
> Would either htmltidy or htmlpurifier be able to handle that? I don't
> mind whether it tries to repair the tag or remove it completely, as long
> as it does something to it.
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>
>

When looking at the performance side of things, couldn't you add
another column to the table and do this work to tidy / strip tags
during the insert going forward?

Any existing data would need a one-time script to clean / tidy it. You
could run this on a nightly cron (depending on how much data there is)
until the new column is filled with clean data.
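
As a rough sketch of that idea, something like the function below could
run once when an article is saved, with the result stored in the new
column. The name make_excerpt() and the 100-character default are made
up for illustration and are not from anyone's actual code. It strips the
tags from the whole article before cutting, so a tag can never be
chopped in half.

```php
<?php
// Hypothetical helper (name and defaults are assumptions for illustration):
// builds a plain-text excerpt from HTML article content.
function make_excerpt($html, $maxChars = 100)
{
    // Strip tags from the full article first, so no tag is ever cut mid-way.
    $text = strip_tags($html);
    // Decode entities such as &amp; so each counts as a single character.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse the whitespace runs left behind by the removed markup.
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Short articles are returned whole, with no ellipsis.
    if (strlen($text) <= $maxChars) {
        return $text;
    }

    // Look one character past the limit so a word ending exactly at the
    // limit is kept, then back up to the last space to avoid splitting a word.
    $window = substr($text, 0, $maxChars + 1);
    $lastSpace = strrpos($window, ' ');
    if ($lastSpace !== false && $lastSpace > 0) {
        $cut = substr($window, 0, $lastSpace);
    } else {
        // No space at all: fall back to a hard cut at the limit.
        $cut = substr($text, 0, $maxChars);
    }

    return rtrim($cut) . '...';
}
```

The one-time script for existing rows could loop over the table and call
the same function. Because entities are decoded here, the stored excerpt
should go through htmlspecialchars() when it is printed. Note that
strlen() and substr() count bytes, so multibyte content would want the
mb_* equivalents instead.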

Bastien

Sent from my iPod

That's not a bad idea actually, I hadn't thought of it! I'm kicking
myself now, because it's such an obvious solution!


Thanks,
Ash
http://www.ashleysheridan.co.uk


I always prefer simple solutions! It keeps things easy!

Bastien

Sent from my iPod
