Re: [libreoffice-users] Can LO build a TOC from a PDF file?

2017-07-09 Thread gordon cooper
There is a round-about way of doing this using Nuance's PDF Converter, but
I have not used it since I abandoned Windows® several years ago. With the
PDF Converter, one can make a Word file which could be read by LO, then
use LO's Insert ToC tool and export the result back to PDF.

Gordon

Tauranga N.Z.


On 10/07/17 05:20, Gilles wrote:

Hello,

This PDF file

has no Table of Contents, and I was wondering if LO could grab all the
headers and build a TOC.

Thank you.



--
View this message in context: 
http://nabble.documentfoundation.org/Can-LO-build-a-TOC-from-a-PDF-file-tp4217910.html
Sent from the Users mailing list archive at Nabble.com.




--
To unsubscribe e-mail to: users+unsubscr...@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted


Re: [libreoffice-users] Can LO build a TOC from a PDF file?

2017-07-09 Thread Cley Faye
2017-07-09 23:58 GMT+02:00 Jean-Francois Nifenecker <
jean-francois.nifenec...@laposte.net>:

> Hello Gilles,
>
> On 09/07/2017 at 19:20, Gilles wrote:
>
>> Hello,
>>
>> This PDF file
>> > e=LEGITEXT06074228=pdf>
>> has no Table of Contents, and I was wondering if LO could grab all the
>> headers and build a TOC.
>>
>
> In order to create a PDF with a TOC/index you'll have to set heading
> styles to the appropriate paragraphs.
>
> Opening a PDF with LibO won't go anywhere as the tool for that is Draw
> which can't set styles for a text processor.
>
> I can't see a way to do that quickly, I'm afraid: a copy/paste from the
> PDF document to Writer is possible but you'll have to fix a lot of things
> (e.g. useless carriage returns) and apply heading styles by hand. On a 400+
> page document this is a big PITA.
>
> Hopefully someone else will come up with brighter ideas.
>
>
>
You want brighter ideas? Say no more!

So... hmm... I'm afraid there won't be many fully automated tools that can
build a TOC for you. A PDF basically contains a lot of individual elements
that are arranged to look like something coherent.
For the document you linked, it would theoretically be possible to write a
tool that splits out every page, grabs the raw text, uses a regex to find the
actual titles, builds a TOC, and injects it into the PDF. This would assume:
- Text extraction works correctly (it's not always the case with PDF)
- Titles always follow the same format

But on this kind of document, you could definitely get some acceptable
results. I experimented a bit. The output is here:
http://www.cjoint.com/c/GGjw0OtPkGc
And for the curious, the "script" I used is here:
https://pastebin.com/icQSZxQr

As you'll see, it is VERY specific to this document, but it is possible to
do something.
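
To make the idea above a bit more concrete, here is a minimal sketch of the
"regex over per-page text" step only. It is not the pastebin script. It
assumes the text of each page has already been extracted into a list of
strings, and the title format it looks for ("Chapter N" / "Section N") is
purely hypothetical; build_toc, format_toc and TITLE_RE are names made up
for this illustration.

```python
import re

# ASSUMPTION: titles are lines starting with "Chapter <n>" or "Section <n>".
# A real document needs its own pattern; this one is only an illustration.
TITLE_RE = re.compile(r"^(?:Chapter|Section)\s+\d+\b.*$", re.MULTILINE)


def build_toc(pages, title_re=TITLE_RE):
    """Return a list of (title, page_number) pairs.

    `pages` is a list of strings, one per page, in document order.
    Page numbers are 1-based.
    """
    toc = []
    for page_number, text in enumerate(pages, start=1):
        # Collect every line on this page that looks like a title.
        for match in title_re.finditer(text):
            toc.append((match.group(0).strip(), page_number))
    return toc


def format_toc(toc):
    """Render the TOC as plain text, one 'title ... page' entry per line."""
    return "\n".join(f"{title} ... {page}" for title, page in toc)


if __name__ == "__main__":
    sample_pages = [
        "Chapter 1: Intro\nsome text",
        "more text\nSection 2 Details\nbody",
        "no titles here",
    ]
    print(format_toc(build_toc(sample_pages)))
```

Getting the per-page text out of the PDF and writing the entries back as
bookmarks both need a PDF library and are not shown here. Both assumptions
listed above still apply: if text extraction is unreliable or the titles are
not consistent, the regex will miss entries or pick up false ones.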



Re: [libreoffice-users] Can LO build a TOC from a PDF file?

2017-07-09 Thread Jean-Francois Nifenecker

Hello Gilles,

On 09/07/2017 at 19:20, Gilles wrote:

Hello,

This PDF file

has no Table of Contents, and I was wondering if LO could grab all the
headers and build a TOC.


In order to create a PDF with a TOC/index you'll have to set heading 
styles to the appropriate paragraphs.


Opening a PDF with LibO won't go anywhere as the tool for that is Draw 
which can't set styles for a text processor.


I can't see a way to do that quickly, I'm afraid: a copy/paste from the
PDF document to Writer is possible but you'll have to fix a lot of
things (e.g. useless carriage returns) and apply heading styles by hand.
On a 400+ page document this is a big PITA.


Hopefully someone else will come up with brighter ideas.


Kind regards,
--
Jean-Francois Nifenecker, Bordeaux





[libreoffice-users] Can LO build a TOC from a PDF file?

2017-07-09 Thread Gilles
Hello,

This PDF file

  
has no Table of Contents, and I was wondering if LO could grab all the
headers and build a TOC.

Thank you.


