[PHP] heavy parsing of text, storing both versions

2004-02-19 Thread Justin French
Hi all,

I'm building a CMS that does heavy parsing of a plain-text HTML
shorthand into XHTML Strict, in a similar way to Textile
(http://www.textism.com/tools/textile/).

The problem is that this conversion might be applied to 2-3 columns of
text plus any number of other fields (my CMS has user-defined data
models), and since users will need to edit this text at a later date, I
need to either:

1. Parse the text on demand into HTML -- the parsing script is too
heavy/slow for this.

2. Store both the plain (shorthand HTML) text and parsed XHTML versions
of each field -- the problem with this being that I'm storing double
the data in the database... combine this with versioning of each
'page', and I'm going to be storing a LOT of data in the DB.

100 articles x 3 versions each x 500 words x 6 chars per word = 900,000 chars;
add a whole bunch of XHTML to this, and it's looking pretty huge.
Double the articles or versions, and it's scary :)

It also means I need two columns for every field (input and parsed),
which makes the MySQL tables a lot more complex, etc.

3. Write a reverse set of functions that converts the XHTML back to
the shorthand on demand for editing -- this seems great, but I don't
like the idea of maintaining two sets of functions for such a beast.
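The storage figures in option 2 can be checked with a few lines of arithmetic. This sketch is in Python purely for illustration; the 30% markup overhead for the rendered XHTML is an assumed figure, not something measured in the thread.

```python
def estimate_chars(articles, versions, words, chars_per_word, markup_overhead_pct=0):
    """Characters needed to store every version of every article once.

    markup_overhead_pct models the extra tags the XHTML rendering adds;
    the value used below is a guess, not a measurement.
    """
    base = articles * versions * words * chars_per_word
    return base * (100 + markup_overhead_pct) // 100


shorthand = estimate_chars(100, 3, 500, 6)                       # the 900,000 from above
xhtml = estimate_chars(100, 3, 500, 6, markup_overhead_pct=30)   # assumed +30% markup
print(f"shorthand only:    {shorthand:,} chars")
print(f"shorthand + XHTML: {shorthand + xhtml:,} chars")
print(f"doubled articles:  {2 * (shorthand + xhtml):,} chars")
```

With mostly single-byte characters, the combined total is on the order of 2 MB, roughly doubling with twice the articles or versions.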

Has anyone got any further ideas?

---
Justin French
http://indent.com.au
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: [PHP] heavy parsing of text, storing both versions

2004-02-19 Thread John W. Holmes
Justin French wrote:
> Hi all,
>
> I'm building a CMS that does heavy parsing of a plain-text HTML
> shorthand into XHTML Strict, in a similar way to Textile
> (http://www.textism.com/tools/textile/).
>
> 1. Parse the text on demand into HTML -- the parsing script is too
> heavy/slow for this.
>
> 2. Store both the plain (shorthand HTML) text and parsed XHTML versions
> of each field -- the problem with this being that I'm storing double
> the data in the database...
>
> 3. Write a reverse set of functions that converts the XHTML back to
> the shorthand on demand for editing -- this seems great, but I don't
> like the idea of maintaining two sets of functions for such a beast.

Well, you pretty much listed all of the options. Personally, I'd
probably go with #2 because hard drive space is cheap. But... if the
process is really that intensive and you're really that concerned about
space, then I'd do #3. It doesn't seem like it'd be that hard to
maintain, since you're just reversing everything, and how often do you
expect it to change? Sorry I can't offer a better option. :)
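John's view that option 3 needn't be hard to maintain is easiest to see if both directions are generated from one rule table, so there is only a single place to edit. The following is a minimal Python sketch under that assumption; the `*bold*` and `_emphasis_` rules are hypothetical and are not Textile's real grammar.

```python
import re

# Hypothetical shorthand rules (NOT Textile's actual syntax): each entry pairs
# a shorthand delimiter with the XHTML element it produces. Both converters
# are driven by this one table, so adding a rule updates both directions.
RULES = [
    ("*", "strong"),
    ("_", "em"),
]


def to_xhtml(text):
    """Forward pass: shorthand -> XHTML."""
    for mark, tag in RULES:
        m = re.escape(mark)
        text = re.sub(rf"{m}(.+?){m}",
                      lambda mo, tag=tag: f"<{tag}>{mo.group(1)}</{tag}>", text)
    return text


def to_shorthand(xhtml):
    """Reverse pass: XHTML -> shorthand, using the same table."""
    for mark, tag in RULES:
        xhtml = re.sub(rf"<{tag}>(.+?)</{tag}>",
                       lambda mo, mark=mark: f"{mark}{mo.group(1)}{mark}", xhtml)
    return xhtml


source = "Some *bold* and _emphasised_ text"
rendered = to_xhtml(source)
print(rendered)                          # Some <strong>bold</strong> and <em>emphasised</em> text
print(to_shorthand(rendered) == source)  # True
```

The round trip only holds while each XHTML element has exactly one shorthand spelling; a real Textile-style parser also has block-level rules and escaping that a table of inline regexes does not cover.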

--
---John Holmes...
Amazon Wishlist: www.amazon.com/o/registry/3BEXC84AB3A5E/

php|architect: The Magazine for PHP Professionals  www.phparch.com


Re: [PHP] heavy parsing of text, storing both versions

2004-02-19 Thread joel boonstra
On Fri, Feb 20, 2004 at 10:35:11AM +1100, Justin French wrote:
> 1. Parse the text on demand into HTML -- the parsing script is too
> heavy/slow for this.
>
> 2. Store both the plain (shorthand HTML) text and parsed XHTML versions
> of each field -- the problem with this being that I'm storing double
> the data in the database... combine this with versioning of each
> 'page', and I'm going to be storing a LOT of data in the DB.
<snip>
> 3. Write a reverse set of functions that converts the XHTML back to
> the shorthand on demand for editing -- this seems great, but I don't
> like the idea of maintaining two sets of functions for such a beast.
>
> Has anyone got any further ideas?

4. Store the plain (shorthand HTML) text and, when users 'save' changes,
generate a static page containing the transformed XHTML version.  You
will have the processing overhead only once (when data is changed);
every other time, visitors get static files.
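Option 4 (store only the shorthand, publish static XHTML on save) might look roughly like this. It is a sketch only: the `DB` dict stands in for the MySQL tables, and `render()` and `save_article()` are hypothetical names, with `render()` a trivial placeholder for the real parser.

```python
import html
import os
import tempfile
from pathlib import Path

# Hypothetical stand-in for the CMS database: article id -> shorthand source.
DB = {}


def render(source):
    """Placeholder for the slow shorthand -> XHTML parser (hypothetical)."""
    paragraphs = [p.strip() for p in source.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def save_article(article_id, source, out_dir):
    """Store only the shorthand, then publish the XHTML as a static file.

    The parser runs once per save; page views just read the static file.
    """
    DB[article_id] = source
    out_path = Path(out_dir) / f"{article_id}.html"
    tmp_path = out_path.with_suffix(".tmp")
    tmp_path.write_text(render(source), encoding="utf-8")
    os.replace(tmp_path, out_path)  # swap the new page in with a single rename
    return out_path


with tempfile.TemporaryDirectory() as d:
    page = save_article("about", "Hello\n\nWorld & friends", d)
    print(page.read_text(encoding="utf-8"))
```

Writing to a temporary file and then calling `os.replace()` means a visitor never sees a half-written page; on POSIX filesystems the rename is atomic.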

It sounds like #3 would be quite difficult.  Going from HTML to XHTML,
you know what the end result will look like.  Going the other way, you
won't know for sure what the users originally entered when they authored
the content.  I'm assuming this isn't a 1-to-1 transformation, so that
these:

  <b>Some bold text</b>
  <B>Some bold text</B>
  <b>Some bold text</B>

will all get turned into:

  <strong>Some bold text</strong>

If you turn the <strong> text back into <b>, then it's not clear which
of the three options you should use.
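joel's objection can be made concrete with a single forward rule. Assuming a case-insensitive regex normaliser (a stand-in, not the OP's actual parser), all three inputs produce the same output:

```python
import re


def normalise_bold(markup):
    """Forward rule: any <b>...</b>, in any letter case, becomes <strong>."""
    return re.sub(r"<b>(.*?)</b>", r"<strong>\1</strong>", markup,
                  flags=re.IGNORECASE)


inputs = [
    "<b>Some bold text</b>",
    "<B>Some bold text</B>",
    "<b>Some bold text</B>",
]
outputs = {normalise_bold(s) for s in inputs}
print(outputs)  # {'<strong>Some bold text</strong>'}
```

Because the forward mapping is many-to-one, a reverse function can only emit one canonical spelling, such as `<b>...</b>`; whichever form the author actually typed is lost.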

Unless I'm misunderstanding...

joel

-- 
[ joel boonstra | gospelcom.net ]


Re: [PHP] heavy parsing of text, storing both versions

2004-02-19 Thread John W. Holmes
joel boonstra wrote:
> On Fri, Feb 20, 2004 at 10:35:11AM +1100, Justin French wrote:
>
>> 1. Parse the text on demand into HTML -- the parsing script is too
>> heavy/slow for this.
>>
>> 2. Store both the plain (shorthand HTML) text and parsed XHTML versions
>> of each field -- the problem with this being that I'm storing double
>> the data in the database... combine this with versioning of each
>> 'page', and I'm going to be storing a LOT of data in the DB.
> <snip>
>
>> 3. Write a reverse set of functions that converts the XHTML back to
>> the shorthand on demand for editing -- this seems great, but I don't
>> like the idea of maintaining two sets of functions for such a beast.
>>
>> Has anyone got any further ideas?
>
> 4. Store the plain (shorthand HTML) text and, when users 'save' changes,
> generate a static page containing the transformed XHTML version.  You
> will have the processing overhead only once (when data is changed);
> every other time, visitors get static files.

Isn't that just an alternate version of #2? You're still duplicating the
data and taking up storage space. Again, I wouldn't really be worried
about this, but that's the issue presented in #2. Sure, static files
would probably be faster, but that doesn't answer the issue of when/how
to do the conversion.

--
---John Holmes...

Re: [PHP] heavy parsing of text, storing both versions

2004-02-19 Thread joel boonstra
On Thu, Feb 19, 2004 at 09:15:35PM -0500, John W. Holmes wrote:
>>> 2. Store both the plain (shorthand HTML) text and parsed XHTML versions
>>> of each field -- the problem with this being that I'm storing double
>>> the data in the database... combine this with versioning of each
>>> 'page', and I'm going to be storing a LOT of data in the DB.
<snip>
>> 4. Store the plain (shorthand HTML) text and, when users 'save' changes,
>> generate a static page containing the transformed XHTML version.  You
>> will have the processing overhead only once (when data is changed);
>> every other time, visitors get static files.
>
> Isn't that just an alternate version of #2? You're still duplicating the
> data and taking up storage space. Again, I wouldn't really be worried
> about this, but that's the issue presented in #2. Sure, static files
> would probably be faster, but that doesn't answer the issue of when/how
> to do the conversion.

Option #2 involved an identical database structure, or multiple
database fields, or some sort of redundant data storage that mirrors the
HTML database structure.  To me, the bigger problem seemed to be the
complexity introduced into the database, not the extra storage space
required.

This solves the script-runtime problem (the parser only runs once per
change) and lets the database remain as originally planned.

The point is that the XHTML version is only needed for display on the
finished web page, and the simple HTML version is only needed for
editing in the administrative interface.  Publishing static XHTML files
eliminates the need for database queries on each page request (after
all, the content isn't going to change with each request, is it?), and
keeping the HTML in the database lets the admin interface be as
interactive and dynamic as necessary.
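A possible refinement of option 4, not proposed in the thread itself: skip the rebuild when an editor saves without changing the source. This Python sketch assumes a hypothetical sidecar `<id>.sha256` file next to each static page; `render()` and `publish_if_changed()` are illustrative names only, and `render()` is a trivial placeholder for the real parser.

```python
import hashlib
import html
import tempfile
from pathlib import Path


def render(source):
    """Placeholder for the slow shorthand -> XHTML parser (hypothetical)."""
    return f"<p>{html.escape(source)}</p>"


def publish_if_changed(article_id, source, out_dir):
    """Rebuild the static page only if the shorthand source has changed.

    A sidecar '<id>.sha256' file records the hash of the source the current
    page was built from. Returns True if the page was (re)generated.
    """
    out_dir = Path(out_dir)
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    stamp = out_dir / f"{article_id}.sha256"
    if stamp.exists() and stamp.read_text(encoding="ascii") == digest:
        return False  # unchanged: skip the parser entirely
    (out_dir / f"{article_id}.html").write_text(render(source), encoding="utf-8")
    stamp.write_text(digest, encoding="ascii")
    return True


with tempfile.TemporaryDirectory() as d:
    print(publish_if_changed("about", "Hello", d))   # True  (first publish)
    print(publish_if_changed("about", "Hello", d))   # False (no change)
    print(publish_if_changed("about", "Hello!", d))  # True  (edited)
```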

Just my $.02, though -- I'm not the one who'll end up maintaining
this, so the best answer is whichever one works best for the OP.

joel
