Can you just save the HTML as an attachment? -Mike
On Nov 16, 2010, at 7:46 PM, Matthew Woodward <[email protected]>
wrote:
I have a SQL Server database I'm converting to CouchDB. I wrote a
script
that pulls data form SQL Server, denormalizes into the JSON
structure I want
to have in CouchDB, and then makes an HTTP call to insert each record.
The issue I'm running into is some of the fields in SQL Server have
HTML in
them, and I'm not sure the best way to handle this. It's easy enough
to
brute-force escape all the HTML tags, quotes, etc. but then the data
won't
display as HTML when I want to display it on a web page.
The more I thought about it, I think the only character that will
really
cause issues with the JSON (other than high-ASCII MS Word junk,
which I'm
stripping out/converting) is ". Here's an example:
myJSONString = '{"foo":"bar", "baz":"<a href="http://foo.com">Here's a
hyperlink</a>"}';
So if I don't do any sort of escaping the first " in the href will
cause the
JSON to be invalid. But whereas I can change " to " in regular
text
(e.g. "Here's a quote" can become "Here's a quote" and
still be
fine), that won't really work with hyperlinks since <a href="
http://foo.com"> won't work properly I don't believe, or at any
rate
isn't all that desirable.
If I escape the " with \ (using the example above, this becomes <a
href=\"
http://foo.com\">) that lets me get the document into Couch
successfully,
but I have a feeling that isn't going to work if I just want to
display that
field as HTML on a web page (though I have yet to experiment with
that to
see what happens when it's rendered).
Anyway, I've been going around in circles about how best to handle
this and
since I'm new to Couch, I figured I'd ask to see what the conventional
wisdom/"best practice" is for something like this. I'm happy to go a
different route completely (write JSON to a file first and load
using curl,
whatever works ...) if my current approach isn't the best way of
going about
this.
Thanks!
--
Matthew Woodward
[email protected]
http://blog.mattwoodward.com
identi.ca / Twitter: @mpwoodward
Please do not send me proprietary file formats such as Word,
PowerPoint,
etc. as attachments.
http://www.gnu.org/philosophy/no-word-attachments.html