[libreoffice-users] Removing Index Markers from Writer: a How-To

2014-01-28 Thread CVAlkan
In previous posts, I described how Writer adds extra index markers when
updating an Alphabetical Index. One side effect of this behavior is that,
even if an item is later removed from the concordance file, the marker
remains in the text, and therefore in the index.

So, here's how to remove all of the index markers from a Writer document so
you can start with a clean slate. To do this, you will need to be running
LibreOffice on some flavor of Linux/Unix, or at least on a system that has a
command line with sed, or a text editor that supports regular-expression
search and replace.

1: Make a backup of your Writer document. You know the consequences if
something goes amiss.
2: Open the document in Writer, choose File | Save As, and pick the file type
OpenDocument Text (Flat XML) (.fodt).
   This creates an uncompressed XML version of the document.
   On my system (Ubuntu), I was unable to decompress the odt version (the OS
   complained that it was malformed), but using Writer's native capability is
   always a better idea anyway.
3: Close the document and exit Writer.
4: Open a command line shell, preferably in the directory containing the
fodt file.
5: Run the following command (all on one line):
   sed 's/<text:alphabetical-index-mark text:string-value="\([A-Za-z]*\)"\/>//g' Old_File_Name_and_Path.fodt > New_File_Name_and_Path.fodt
   Depending on the file size and processor speed, this may take a bit.
   If this gives errors, you're on your own.
6: Close the command line shell.
7: Open the new cleansed fodt file with Writer.
8: The file should look the same but without any alphabetical index markers.
   (Your index formatting is still there, though.)
9: Go to where your alphabetical index is located, right-click on it and
select Update Index/Table.
A: All of the index entries should disappear; if any remain, find them on
the referenced pages and delete them manually. Apparently, some of the index
markers are embedded in others and aren't found by the sed command above.
   I didn't try to figure out how or why that happened. I had several hundred
   markers, of which only five weren't removed.
B: Now, go back to the index and select Edit Index/Table, then File | Open.
C: Select the original concordance file (assuming you have it set up how you
want it), and let Writer go do its thing.
D: You now have a clean document with no duplicate index entries.
E: LOOK AT IT CAREFULLY, of course, before replacing your original. The
document I tried this on was over four hundred pages with lots of tables,
graphics and so forth, and I found no problems, but it's up to you to
determine if everything is ok.
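If you'd rather check the result before reopening Writer, here is a rough shell sketch of steps 5 and 8. It assumes the markers appear in the .fodt as the ODF format defines them: point marks like <text:alphabetical-index-mark text:string-value="..."/>, plus range marks written as a <text:alphabetical-index-mark-start .../> and <text:alphabetical-index-mark-end .../> pair around the marked text. The step 5 pattern only matches point marks whose entry is made entirely of letters and whose only attribute is text:string-value, so entries containing spaces or digits, entries with keys, and range marks would all slip through; that is only a guess at where leftovers like the five above come from. The sketch also assumes each tag sits on one line and that no attribute value contains a literal '>'. The script and function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions: each index-mark tag sits on a single line of the
# .fodt, and no attribute value contains a literal '>' character.

# Hypothetical helper: copy stdin to stdout with alphabetical index marks removed.
#   point marks:  <text:alphabetical-index-mark .../>  (any attributes, any entry text)
#   range marks:  <text:alphabetical-index-mark-start .../> and <...-end .../>
# For range marks only the two tags are deleted; the marked text between them stays.
# The index itself (<text:alphabetical-index ...>) is not touched.
strip_index_marks() {
    sed -E 's#<text:alphabetical-index-mark(-start|-end)?( [^>]*)?/>##g'
}

# Hypothetical helper: print every index-mark tag still present in a file,
# one per line. Returns non-zero (grep's status) when none are left.
remaining_index_marks() {
    grep -o '<text:alphabetical-index-mark[^>]*>' "$1"
}

# Usage: sh strip_marks.sh Old_File_Name_and_Path.fodt New_File_Name_and_Path.fodt
if [ "$#" -eq 2 ]; then
    strip_index_marks < "$1" > "$2"
    echo "Index-mark tags left in $2:"
    remaining_index_marks "$2" || echo "(none)"
fi
```

Anything it lists under "left in" is what step A would otherwise have you hunt for by hand on the referenced pages.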

I hope this helps any others who might be using alphabetic indexes.




--
View this message in context: 
http://nabble.documentfoundation.org/Removing-Index-Markers-from-Writer-a-How-To-tp4094327.html
Sent from the Users mailing list archive at Nabble.com.

-- 
To unsubscribe e-mail to: users+unsubscr...@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted



Re: [libreoffice-users] Removing Index Markers from Writer: a How-To

2014-01-28 Thread Peter West
I've asked a question on this list about soffice, and this topic is the 
reason I was asking.


Let's say we can use the command line to 1) convert an .odt to an .fodt
file, and 2) convert it back again.


If so, I can write a script that:
   uses soffice to do 1) as above;
   runs perl -e to perform the text substitutions;
   uses soffice to do 2) as above, re-establishing the modified .odt file.
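Assuming soffice behaves as its --headless, --convert-to and --outdir options are documented to, the round trip could look like the sketch below, with sed standing in for the perl step. clean_odt is a made-up name, and the substitution is a widened version of the how-to's pattern that also accepts any entry text, extra attributes and -start/-end range marks (still assuming one tag per line and no literal '>' inside attribute values). The cleaned file goes to a separate output directory and the script refuses to overwrite an existing file there, because converting back next to the original would most likely replace it. Close Writer first: a LibreOffice instance that is already running can make headless conversions silently do nothing, which is why the script checks that each output file actually appeared.

```shell
#!/bin/sh
# Sketch of the round trip: .odt -> .fodt -> strip index marks -> .odt
# Assumptions: soffice is on the PATH, LibreOffice is not already running,
# and each index-mark tag sits on one line of the .fodt.

# Hypothetical helper: remove point marks and -start/-end range-mark tags,
# keeping the marked text itself.
strip_index_marks() {
    sed -E 's#<text:alphabetical-index-mark(-start|-end)?( [^>]*)?/>##g'
}

# Hypothetical wrapper. Usage: clean_odt path/to/doc.odt output_dir
clean_odt() {
    in=$1
    outdir=$2
    name=$(basename "$in" .odt)
    # Never overwrite an existing file (in particular, the original).
    if [ -e "$outdir/$name.odt" ]; then
        echo "clean_odt: $outdir/$name.odt already exists, not overwriting" >&2
        return 1
    fi
    mkdir -p "$outdir" || return 1
    work=$(mktemp -d) || return 1
    mkdir "$work/clean" || return 1
    # 1) .odt -> flat XML .fodt
    soffice --headless --convert-to fodt --outdir "$work" "$in" || return 1
    [ -f "$work/$name.fodt" ] || { echo "clean_odt: conversion to .fodt failed" >&2; return 1; }
    # The text substitutions (sed in place of perl -e)
    strip_index_marks < "$work/$name.fodt" > "$work/clean/$name.fodt" || return 1
    # 2) cleaned .fodt -> .odt in the output directory
    soffice --headless --convert-to odt --outdir "$outdir" "$work/clean/$name.fodt" || return 1
    [ -f "$outdir/$name.odt" ] || { echo "clean_odt: conversion back to .odt failed" >&2; return 1; }
    # Temporary files are kept on failure (useful for a look), removed on success.
    rm -rf "$work"
}

if [ "$#" -eq 2 ]; then
    clean_odt "$1" "$2"
fi
```

For example, sh clean_odt.sh Book.odt cleaned leaves Book.odt alone and writes cleaned/Book.odt, which you would then open in Writer to update the index and reload the concordance file.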

Peter West
Other seed fell among thorns, and the thorns grew up and choked it...
