Thanks much for those pointers.

Rob


On Fri, Dec 6, 2024 at 11:23 PM Tilman Hausherr <thaush...@t-online.de> wrote:

> Here's something that comes close:
>
> https://stackoverflow.com/questions/38931422/pdfbox-2-0-2-calling-of-pagedrawer-processpage-method-caught-exceptions
>
> https://stackoverflow.com/questions/55166990/pdfbox-line-rectangle-extraction
>
> It collects lines for later use. You need to alter that code so it also
> collects non-rectangular paths.
>
> Tilman
>
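As a rough illustration of that suggestion: one way to collect paths is a subclass of PDFBox's PDFGraphicsStreamEngine, whose callbacks (moveTo, lineTo, curveTo, appendRectangle, closePath, strokePath, fillPath, fillAndStrokePath, endPath) are invoked as a page is processed. The minimal sketch below, assuming java.awt.geom.GeneralPath as the storage type, keeps every painted path instead of only lines. To stay runnable without PDFBox it is a standalone class with the same method names; in a real subclass these would be overrides, alongside the remaining abstract methods (drawImage, clip, shadingFill, getCurrentPoint). The CollectedPath record and its id scheme are hypothetical.

```java
import java.awt.geom.GeneralPath;
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record: one painted path, identified by its position in paint order.
record CollectedPath(int id, GeneralPath shape, boolean stroked, boolean filled) {}

// Standalone stand-in for a PDFGraphicsStreamEngine subclass: same callback
// names, but driven by hand here so the sketch runs without PDFBox.
class PathCollector {
    private GeneralPath current = new GeneralPath();
    private final List<CollectedPath> paths = new ArrayList<>();

    void moveTo(float x, float y) { current.moveTo(x, y); }
    void lineTo(float x, float y) { current.lineTo(x, y); }
    void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) {
        current.curveTo(x1, y1, x2, y2, x3, y3);
    }

    // Rectangles ("re") become ordinary closed subpaths, so they need no special case.
    void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) {
        current.moveTo(p0.getX(), p0.getY());
        current.lineTo(p1.getX(), p1.getY());
        current.lineTo(p2.getX(), p2.getY());
        current.lineTo(p3.getX(), p3.getY());
        current.closePath();
    }

    void closePath() { current.closePath(); }

    // Painting callbacks finish the current path: keep it instead of drawing it.
    void strokePath() { finish(true, false); }
    void fillPath(int windingRule) { current.setWindingRule(windingRule); finish(false, true); }
    void fillAndStrokePath(int windingRule) { current.setWindingRule(windingRule); finish(true, true); }

    // "n": the path is discarded without painting (e.g. it was only used for clipping).
    void endPath() { current = new GeneralPath(); }

    private void finish(boolean stroked, boolean filled) {
        paths.add(new CollectedPath(paths.size(), current, stroked, filled));
        current = new GeneralPath();
    }

    List<CollectedPath> getPaths() { return paths; }
}
```

Because each collected path keeps its paint-order id, it can later be matched back to the user's selection.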
> On 07.12.2024 07:58, Rob McDonald wrote:
> > I am considering starting a new project and I'm looking at using PDFBox to
> > do it. I would appreciate any thoughts on the appropriateness of PDFBox
> > vs. other PDF libraries (Java or C++).
> >
> > My program will be similar in many ways to a vector drawing program
> > (Inkscape, Illustrator, etc.). I want to be able to parse a page of a PDF
> > document and work with the entities in memory. I am mostly interested in
> > vector graphics paths; I don't really care about the text or raster
> > images.
> >
> > A user will need to be able to click on a given path to select it. They
> > should then be able to manipulate that path: perhaps suppress it from
> > display, change its stroke width or color, re-order it, etc. Each path
> > needs to be uniquely identifiable so it can be manipulated individually.
> > The program needs to be interactive; there is not enough information
> > available a priori to process a file or a page in a batch manner.
> >
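For the click-to-select step, one possible approach (a sketch, not something PDFBox provides) is to hit-test the click against each stored path with java.awt.geom: filled paths via Shape.contains, and every path via the outline produced by BasicStroke.createStrokedShape, widened to a minimum click tolerance. The PagePath record, the tolerance value, and the pickPath name are hypothetical, and the sketch assumes the click has already been converted from screen coordinates to page coordinates.

```java
import java.awt.BasicStroke;
import java.awt.Shape;
import java.util.List;

// Hypothetical representation of a selectable path: geometry plus how it was painted.
record PagePath(int id, Shape shape, boolean filled, float lineWidth) {}

final class PathPicker {
    // Returns the id of the topmost path under (x, y), or -1 if nothing is hit.
    // The stroke used for hit testing is at least `tolerance` wide in total,
    // so thin lines can still be clicked.
    static int pickPath(List<PagePath> paths, double x, double y, float tolerance) {
        // Later paths are painted over earlier ones, so search from the last one back.
        for (int i = paths.size() - 1; i >= 0; i--) {
            PagePath p = paths.get(i);
            if (p.filled() && p.shape().contains(x, y)) {
                return p.id();
            }
            // Every path also counts as hit near its outline (within half the width).
            float width = Math.max(p.lineWidth(), tolerance);
            Shape outline = new BasicStroke(width).createStrokedShape(p.shape());
            if (outline.contains(x, y)) {
                return p.id();
            }
        }
        return -1;
    }
}
```

Searching back to front means that where paths overlap, the one the user actually sees on top is selected.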
> >
> > From what I can tell, PDFBox mostly treats a PDF file as a stream. It
> > reads a file incrementally, processing as it goes. Each page is processed
> > operator by operator, without storing anything in memory beyond the
> > current operator and its operands. In this way, memory usage is kept very
> > low, even for documents with many pages or very complex pages.
> >
> > From what I can tell, the existing operator data structures are set up to
> > take action (process(), i.e. draw, print, or convert), but are not set up
> > for storage, i.e. keeping the data around to do something with later.
> >
> >
> > I can imagine constructing data structures to store each operator with
> > its operands (this would need a concrete class for every possible
> > operator). Then, a separate parser would be needed to go through the page
> > and store the stream of operators in a collection of some sort (vector,
> > array, list, etc.).
> >
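PDFBox's PDFStreamParser already turns a content stream into a flat list of tokens (COSBase operands followed by an Operator), so a single generic "operator plus operands" record may be enough instead of a concrete class per operator. The sketch below shows that shape using only the JDK. The Op record is hypothetical, and the toy tokenizer only understands numeric operands, not the names, strings, arrays, dictionaries, comments, or inline images a real content stream can contain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic record: one operator name plus its (numeric) operands,
// instead of a concrete class per PDF operator.
record Op(String name, List<Double> operands) {}

final class ToyContentParser {
    // PDF numbers: optional sign, digits with an optional fraction, no exponent.
    private static final String NUMBER = "[+-]?(\\d+\\.?\\d*|\\.\\d+)";

    // Toy tokenizer: splits on whitespace and assumes every operand is a number.
    static List<Op> parse(String content) {
        List<Op> ops = new ArrayList<>();
        List<Double> pending = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens for empty input
            }
            if (token.matches(NUMBER)) {
                pending.add(Double.parseDouble(token));
            } else {
                // Anything else is taken to be an operator; it consumes the operands before it.
                ops.add(new Op(token, List.copyOf(pending)));
                pending.clear();
            }
        }
        return ops;
    }
}
```

For example, `1 0 0 RG 10 10 m 100 10 l S` yields four Op records: RG, m, l, and S.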
> > Then, another pass could consolidate / interpret groups of operators into
> > paths, i.e. a path starts with a MoveTo, consists of a series of LineTo and
> > CurveTo operators (optionally with a ClosePath), and is terminated by a
> > Stroke, Fill, End, or other path-painting operator.
> >
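The grouping pass could look roughly like this, assuming the operators are already stored as generic Op records (repeated here so the snippet stands alone). Each path runs from its first construction operator to the painting operator that ends it; the clipping operators W and W* are kept with the path because they sit between the two. The index range in the operator list serves as a simple, hypothetical identity for selection and editing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical generic operator record (operator name plus numeric operands).
record Op(String name, List<Double> operands) {}

// One path: its operators plus where they sit in the page's operator list.
record PathGroup(int firstIndex, int lastIndex, List<Op> ops) {
    Op paintOp() { return ops.get(ops.size() - 1); }
}

final class PathGrouper {
    // Path construction operators, plus clipping operators that precede the painting one.
    private static final Set<String> BUILD =
            Set.of("m", "l", "c", "v", "y", "h", "re", "W", "W*");
    // Path-painting operators; each ends the current path ("n" ends it without painting).
    private static final Set<String> PAINT =
            Set.of("S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n");

    static List<PathGroup> group(List<Op> ops) {
        List<PathGroup> paths = new ArrayList<>();
        int start = -1; // index of the current path's first operator, or -1 if none
        for (int i = 0; i < ops.size(); i++) {
            String name = ops.get(i).name();
            if (BUILD.contains(name)) {
                if (start < 0) {
                    start = i;
                }
            } else if (PAINT.contains(name) && start >= 0) {
                paths.add(new PathGroup(start, i, List.copyOf(ops.subList(start, i + 1))));
                start = -1;
            }
            // In a valid content stream, other operators (color, line width, text,
            // q/Q, ...) only occur outside paths, so they belong to no group.
        }
        return paths;
    }
}
```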
> >
> > I will want to be able to visualize the manipulated page, so I'll either
> > need to write my own renderer that works from my page data structure, or
> > I will need to re-serialize my data structure back into a PDF content
> > stream and then feed the modified page to the main renderer I'm using.
> >
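For the re-serialization route, here is a minimal sketch of writing edited operators back out as content-stream text, assuming a generic Op record (repeated so the snippet stands alone) and hypothetical PathEdit/ContentWriter names. It suppresses a path by ending it with `n` (which paints nothing) instead of its original painting operator, and changes a stroke width by wrapping the path in `q <width> w ... Q`. It does not handle paths that also clip (W or W*), since the Q would undo the clip. Within PDFBox itself, ContentStreamWriter can write a token list back out as new page content.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

// Hypothetical generic operator record (operator name plus numeric operands).
record Op(String name, List<Double> operands) {}

// Hypothetical edit for one path, located by the indices of its first operator
// and of its painting operator. lineWidth == null means "leave the width alone".
record PathEdit(int firstIndex, int lastIndex, boolean suppressed, Double lineWidth) {}

final class ContentWriter {
    static String write(List<Op> ops, Map<Integer, PathEdit> editsByFirstIndex) {
        StringBuilder out = new StringBuilder();
        PathEdit active = null;
        for (int i = 0; i < ops.size(); i++) {
            PathEdit edit = editsByFirstIndex.get(i);
            if (edit != null) {
                active = edit;
                // Save the graphics state and set the new width before the path starts.
                if (edit.lineWidth() != null) {
                    out.append("q ").append(num(edit.lineWidth())).append(" w\n");
                }
            }
            Op op = ops.get(i);
            boolean isPaintOp = active != null && i == active.lastIndex();
            // A suppressed path is still constructed but ended with "n", which paints nothing.
            String name = (isPaintOp && active.suppressed()) ? "n" : op.name();
            for (double operand : op.operands()) {
                out.append(num(operand)).append(' ');
            }
            out.append(name).append('\n');
            if (isPaintOp) {
                if (active.lineWidth() != null) {
                    out.append("Q\n"); // restore the previous line width
                }
                active = null;
            }
        }
        return out.toString();
    }

    // PDF numbers cannot use exponent notation, so print plain decimals ("10", "2.5").
    private static String num(double d) {
        return BigDecimal.valueOf(d).stripTrailingZeros().toPlainString();
    }
}
```

Because the original operator list is never modified, undoing an edit is just a matter of removing its PathEdit and writing the page again.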
> >
> > Does this kind of capability already exist in PDFBox, perhaps in one of
> > the examples? Or possibly in a third-party open source project that uses
> > PDFBox?
> >
> > Does this seem like the right approach with PDFBox? Am I missing an
> > obviously better way?
> >
> > Does anyone know of an alternate library that would be more suitable for
> > these use cases and abstractions?
> >
> >
> > Thanks in advance for any help. Thanks also for all the work that has gone
> > into PDFBox so far.
> >
> > Rob
> >
>
>