Re: [O] Orgmode → ODT: Certain chars break export

2015-02-14 Thread Vaidheeswaran
On Saturday 14 February 2015 02:20 PM, Vaidheeswaran wrote: Specifically, in the pdftotext case above, I believe the best action would be to M-x flush-lines that match ^L so that page headers are stripped. I was writing from memory. I should have said this instead: The best action would be

Re: [O] Orgmode → ODT: Certain chars break export

2015-02-14 Thread Vaidheeswaran
On Friday 13 February 2015 04:15 PM, Tory S. Anderson wrote: While we're on the topic of ODT export problems: I was in the process of converting PDF to Text to Org to ODT/DocX and discovered that certain characters seem to break exported odt documents, which fail with a line and col number.

Re: [O] Orgmode → ODT: Certain chars break export

2015-02-13 Thread Rasmus
torys.ander...@gmail.com (Tory S. Anderson) writes: While we're on the topic of ODT export problems: I was in the process of converting PDF to Text to Org to ODT/DocX and discovered that certain characters seem to break exported odt documents, which fail with a line and col number. So far the

Re: [O] Orgmode → ODT: Certain chars break export

2015-02-13 Thread Tory S. Anderson
From a user perspective just stripping the characters seems best to me, but finding out what the characters are seems obnoxious. Neither a quick search nor skimming the ODT doc specification[1][2] seems to give any insight into a set of illegal characters. Does elisp have anything similar to Java's

Re: [O] Orgmode → ODT: Certain chars break export

2015-02-13 Thread Rasmus
torys.ander...@gmail.com (Tory S. Anderson) writes: From a user perspective just stripping the characters seems best to me, but finding out what the characters are seems obnoxious. But maybe there is a valid way to represent such characters in XML? At the very least entities must be replaced
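To illustrate the question of whether such characters can be represented at all, here is a minimal Python sketch (not the Org exporter's code, and not elisp). It assumes the offending character is the form feed (^L, U+000C) mentioned elsewhere in the thread; the helper name `is_well_formed` is made up for this example. It suggests that markup characters can be escaped as entities, while a control character such as form feed is rejected by an XML 1.0 parser both raw and as a numeric character reference.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Markup characters (&, <, >) have a valid representation as entities.
escaped = escape("Q&A <draft>")
ok_markup = is_well_formed("<p>" + escaped + "</p>")

# Assumption: the breaking character is form feed (^L, U+000C).
# It is not a legal XML 1.0 character, raw or as a character reference.
ok_form_feed_raw = is_well_formed("<p>\x0c</p>")
ok_form_feed_ref = is_well_formed("<p>&#12;</p>")
```

If this holds for the characters in question, escaping alone is not enough and they would have to be removed or replaced before export.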

Re: [O] Orgmode → ODT: Certain chars break export

2015-02-13 Thread Tory S. Anderson
There is a helpful wiki page now that you found XML; it even mentions my specific character.[1] The main source seems to be at the w3.org spec.[2] Rasmus ras...@gmx.us writes: torys.ander...@gmail.com (Tory S. Anderson) writes: From a user perspective just stripping the characters seems best
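As a rough sketch of the "just strip the characters" approach, the following Python snippet removes everything outside the character ranges that the XML 1.0 specification allows. It is only an illustration, not part of Org or its ODT exporter; the function name `strip_invalid_xml_chars` is invented here, and it assumes the text is cleaned before it is handed to the exporter.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that XML 1.0 does not allow (hypothetical helper)."""
    return _INVALID_XML_CHARS.sub("", text)


# Example: a form feed (^L) left behind by a PDF-to-text conversion.
cleaned = strip_invalid_xml_chars("Chapter 1\x0cChapter 2")
```

Tabs, newlines and ordinary non-ASCII text pass through unchanged; only control characters and other code points outside the allowed ranges are dropped.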

Re: [O] Orgmode → ODT: Certain chars break export

2015-02-13 Thread Rasmus
torys.ander...@gmail.com (Tory S. Anderson) writes: There is a helpful wiki page now that you found XML; it even mentions my specific character.[1] The main source seems to be at the w3.org spec.[2] I don't understand unicode well enough to propose a solution. For now you could use a