> I was plucking a site which had been formatted in ms word
> which kept giving the error in the subject. In order to
> get the pluck to work I had to download the site to my
> machine and run it through this sed script:
> s/<![^>]*>//
> (Actually I had to do it twice because I'm not the world's
> leading expert in sed)

Try putting a g at the end [0], i.e. s/<![^>]*>//g. That makes the
search and replace global, so it applies to every match on a line
rather than only the first.

> The problem is msword embeds loads of pseudo directives in
> the form of (something like) <![if ...]>....<![endif]>
> and Plucker is choking on these.

Hmm... I've tried searching the web to determine whether those are
legal or not. As best I can tell, they're not. The only things allowed
after a "<![" are IGNORE, INCLUDE, and CDATA. So that would make it an
MSWord problem, and one of the uglier ones I've seen. *shudder*

> Is this an MSword problem or a Plucker problem? Either way I
> suspect it will have to be fixed in Plucker 8^(.

I suspect that it won't be fixed in Plucker, and you'll have to run a
post-spidering command to fix the HTML before you parse it. You might
want to look into Demoroniser [1], a tool which will make MS-HTML into
something more like HTML.

Later,
Blake.

[0] Disclaimer: I'm not the world's leading expert in sed either.
[1] http://www.fourmilab.ch/webtools/demoroniser/
_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
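
P.S. Here is a rough sketch of what the g flag changes, in case it
helps. The sample line is made up for illustration (the "supportLists"
condition is only my guess at the sort of thing Word emits, it is not
taken from your site), and the variable names are hypothetical too.

```shell
# Hypothetical sample line imitating the MS Word markup described above.
sample='<p><![if !supportLists]>1.<![endif]> First item</p>'

# Without g: sed removes only the first match on each line.
once=$(printf '%s\n' "$sample" | sed 's/<![^>]*>//')

# With g: sed removes every match on each line.
all=$(printf '%s\n' "$sample" | sed 's/<![^>]*>//g')

printf '%s\n' "$once"   # <p>1.<![endif]> First item</p>
printf '%s\n' "$all"    # <p>1. First item</p>
```

That would explain why running the script twice did the job: each pass
took out one directive per line. One caveat: the pattern matches
anything that starts with "<!" up to the next ">", so it will also eat
a <!DOCTYPE ...> line and ordinary <!-- ... --> comments. That is
probably harmless for plucking, but worth knowing.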
