On Nov 10, 2009, at 3:36 PM, Alex Tweedly wrote:


I'm using a simple file-based CMS in on-rev (along the lines of Andre's CMS/blog system).

I'm looking for a simple way to allow users to include simple html within the files which will be included as part of the web pages. I need to protect against any accidental damage to the rest of the page (e.g. if they add <div>s, or mis-matched or incorrect <table>s, etc.), and some users will not be familiar with html. Really all I need is <br>, <p>, simple <li>, and URL/mail addresses.

I guess I could build an html parser that disallowed any 'unsafe' html - but that doesn't sound trivial (and it assumes I can think of all the cases where html could cause a problem).

Or I could come up with some format of my own to do it.

Or I could build a (small subset of?) reStructuredText to make it easier to use, and then translate this to html for the output.

Has anyone built this already, in a form they could share?
Or anything equivalent (and hopefully simpler :-)

(btw - http://docutils.sourceforge.net/rst.html )


In parsing html, sanity is an optional and infrequent result.
If the journey is to begin, then I would recommend the following as a starting point.

Allowing user html has several challenges, and I would start with these few basics:

   put fld "incomingBlock" into userHtml
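   -- line endings and tabs only count as white space in html, so...
   -- (replace with a space, not empty, so words on adjacent lines stay apart)
   replace numtochar(13) with space in userHtml
   replace cr with space in userHtml
   replace tab with space in userHtml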
   --reset line content by...
   replace "<" with (cr & "<") in userHtml
   replace ">" with (">" & cr) in userHtml
   --now each line either begins with  "<", "</", or a string

   --polish the line endings
   set the itemDel to ">"  --each tag line should contain one item

   repeat for each line LNN in userHtml
      get word 1 to -1 of LNN --trim white space
      if char 1 of IT is not "<" then
         --string of data between tags
         --disregard special chars like umlauts
         -- or convert them using entities such as &amp;
         put IT & cr after cleanerHtml
      else
         if char -1 of IT is not ">" then
            -- oops --opening "<" not closed
            breakpoint --fix
         else
            if (the number of items in IT) > 1 then
               -- oops --too many  ">"
               --this is a tag line, not a data line
               --something not opened properly
               breakpoint --fix
            end if
            put space before char -1 of IT
            if char -3 of IT is "/" then put " /" into char -3 to -2 of IT
            -- now endings are either " >"  or " />"
            --thus last word is closing tag
            -- valid    <BR>  <BR >  <BR something>  <BR char string >  <BR />
            -- invalid  < BR>  < BR >
            if word 1 of IT is "<" then put word 1 of IT & word 2 of IT into word 1 to 2 of IT
            -- now "<tag anything >" or "<tag anything />" should be true
         end if
         --since this is a tag line, change quotes to apostrophes
         --in html, both are considered valid quote chars
         -- and now the data lines can contain user quote chars without
         --  interfering with rev commands.. thus
         replace quote with "'" in IT -- apostrophe works the same
         put IT & cr after cleanerHtml
      end if
   end repeat
   filter cleanerHtml without empty

------------------------------------
--now there are 5 types of lines remaining
--1 data        Click here for more videos
--2 mono tags     <BR >  <p >  <hr >
--3 mono with Atrb   <img src='refrStr' height='30' width='244' />

--4 bookends no Atrb <title> (then a data line) and later a line of </title >

--5 bookend with Atrb   <table cellpadding='2' bgcolor='green' >
--       (then more open and close tags    TR  TH  TD  )
--        (then a data line)  (then more open and closing tags)
--     and later a line of </table >

--also   <a href='#anchorOnThisPage' >Click to go down further</a >

--also <a href='httpAnotherPage.com/path/page.html' >Click to go over here</a >
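
The line types listed above can be told apart mechanically once the cleaning pass has run. Here is a minimal sketch, untested like the rest of this reply. The handler names lineKind and tagName are made up for illustration, and it assumes every tag line has already been normalised to the "<tag anything >" or "<tag anything />" form:

```livecode
-- Hypothetical helper: classify one line of cleanerHtml.
-- Returns "data", "close", "selfclosed" or "open".
function lineKind pLine
   if char 1 of pLine is not "<" then return "data"
   if char 2 of pLine is "/" then return "close"              -- </title >
   if char -2 to -1 of pLine is "/>" then return "selfclosed" -- <img ... />
   return "open"                           -- <BR >  <p >  <table ... >
end function

-- Hypothetical helper: lower case tag name of a tag line,
-- e.g. "</Table >" gives "table"
function tagName pLine
   put word 1 of pLine into tName
   replace "<" with empty in tName
   replace "/" with empty in tName
   replace ">" with empty in tName
   return toLower(tName)
end function
```

Note that a mono tag written without the slash (<BR >) and the opening half of a bookend pair (<table >) both come back as "open". Separating types 2 and 4 needs a list of tag names that never take a closing tag.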

And now the fun begins: nested tags that need to be balanced, especially clickable lists.
Here is a block that functions as a simple horizontal menu:
<div id='specialCase' class='formatBold' >
<ul id='specialBehaviour' >
<li style='display:inline' ><a href='httpJumpToPumpkins' class='underline4' >Carving pumpkins</a ></li >
<li style='display:inline' ><a href='httpJumpToCats' class='underline4' >Find a black cat</a ></li >
<li style='display:inline' ><a href='httpJumpToBrooms' class='underline4' >Scary witch's brooms</a ></li >
</ul >
</div >
--display inline is to make a list into one horizontal menu
-- id and class allow CSS to work, so you would want to filter this out to keep it simple
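
One way to check that a nested block like this is balanced is to keep a stack of open tag names while walking the cleaned lines. This is only a sketch, not tested. The handler name firstNestingError is made up, the list of tags that are allowed to stay unclosed is my assumption, and it expects input in the one-tag-per-line form produced by the cleaning loop:

```livecode
-- Hypothetical sketch: returns empty when every opening tag is closed
-- in the right order, otherwise a note about the first problem found.
function firstNestingError pCleanHtml
   put "br,hr,img,p" into tUnpaired -- assumed: tags not required to close
   put empty into tStack            -- open tag names, innermost is item 1
   repeat for each line tLine in pCleanHtml
      if char 1 of tLine is not "<" then next repeat     -- data line
      if char -2 to -1 of tLine is "/>" then next repeat -- <img ... />
      put word 1 of tLine into tName
      replace "<" with empty in tName
      replace "/" with empty in tName
      replace ">" with empty in tName
      put toLower(tName) into tName
      if tName is among the items of tUnpaired then next repeat
      if char 2 of tLine is "/" then
         -- a closing tag must match the innermost open tag
         if (tStack is empty) or (item 1 of tStack is not tName) then
            return "unexpected </" & tName & ">"
         end if
         delete item 1 of tStack
      else
         put tName & comma before tStack
      end if
   end repeat
   if tStack is not empty then return "unclosed <" & item 1 of tStack & ">"
   return empty
end function
```

The menu block above, split to one tag per line, should come back empty. Feed it a <ul > that closes before its <li > does and it should report "unexpected </ul>".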

Tables and lists are the most common multi-nested forms that make parsing difficult. Disallowing <table>, along with attributes such as id='string', class='string' and style='color:red', will help simplify the core html you allow.
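
A whitelist is probably the easiest way to do that: keep only the tags you name and rebuild each one without its attributes. This is a rough, untested sketch. The handler name keepSimpleTags and the list of allowed tags are my assumptions, and it expects the one-tag-per-line, apostrophe-quoted form produced by the cleaning loop:

```livecode
-- Hypothetical sketch: drop every tag that is not on the whitelist
-- and strip all attributes except href on links.
function keepSimpleTags pCleanHtml
   put "br,p,ul,ol,li,a" into tAllowed -- assumed whitelist, adjust to taste
   put empty into tOut
   repeat for each line tLine in pCleanHtml
      if char 1 of tLine is not "<" then
         put tLine & cr after tOut -- data line, keep as is
         next repeat
      end if
      put word 1 of tLine into tName
      replace "<" with empty in tName
      replace "/" with empty in tName
      replace ">" with empty in tName
      put toLower(tName) into tName
      -- table, div, img and the rest are simply dropped
      if tName is not among the items of tAllowed then next repeat
      if char 2 of tLine is "/" then
         put "</" & tName & ">" & cr after tOut
      else if tName is "a" then
         -- keep only a complete href='...' word (no spaces inside the URL)
         put empty into tHref
         repeat for each word tWord in tLine
            if (tWord begins with "href='") and (char -1 of tWord is "'") then
               put space & tWord into tHref
            end if
         end repeat
         put "<a" & tHref & ">" & cr after tOut
      else
         put "<" & tName & ">" & cr after tOut
      end if
   end repeat
   return tOut
end function
```

This only removes the tag lines; the data lines between a dropped <table > and </table > still come through as plain text. The href values themselves are not checked here, so you would still want to look at what they point to before trusting them.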

Hope this helps you get started.
Note I am making this a quick reply and none of the above code has been tested, so there could be some errors. I am just typing off the top of my head.

Jim Ault
Las Vegas



