Thanks a lot for your reply Andreas. Yes, if all I had to do was move
references around then my work would already be complete and submitted for
review. However, the catch is that the Page objects also have Parent
references, which also need to be updated when they are moved from one page
tree node to another. But since the pages have already been written out, this
cannot be done. So the pages effectively become immovable (or else the parent
references will no longer match the kids references, as they will be out of
date - which is why acroread could not open the pages).
Delaying writing the page objects would mean the parent references can be
updated correctly, and the problem would be solved. But that has a potential
memory-usage cost.
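As a toy illustration (all names here are hypothetical, not FOP's actual classes), the invariant at stake is that every page's /Parent must point at the tree node whose /Kids array lists it. Once a page object has been flushed, its /Parent is frozen, so re-balancing the tree around it breaks the invariant:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: every page's /Parent must point at the node
// whose /Kids lists it. Moving a page under another node without rewriting
// its /Parent (impossible once the page object has been written out) breaks
// this, which is what trips up acroread.
public class ParentKidsInvariant {

    static class Page { String parentRef; }

    static class TreeNode {
        final String ref;
        final List<Page> kids = new ArrayList<>();
        TreeNode(String ref) { this.ref = ref; }
    }

    // Check that every kid's /Parent matches this node's reference.
    static boolean consistent(TreeNode node) {
        for (Page p : node.kids) {
            if (!node.ref.equals(p.parentRef)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TreeNode a = new TreeNode("5 0 R");
        TreeNode b = new TreeNode("6 0 R");
        Page page = new Page();
        page.parentRef = a.ref; // set when the page was written out
        a.kids.add(page);
        System.out.println(consistent(a) && consistent(b)); // true

        // Re-balance: move the page under node b, but its /Parent is frozen.
        a.kids.remove(page);
        b.kids.add(page);
        System.out.println(consistent(b)); // false - stale /Parent
    }
}
```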
Today I will continue with my attempt to link every page to a node of its own
(stored in a flat list), then re-order the nodes according to the page index of
the page inside each one, and build the balanced page tree up from those nodes.
That's the plan anyway... (Time permitting, I'll also be interested in looking
more closely at what happened when the two page sequences ended up with
mixed-up pages...)
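For what it's worth, the plan above might be sketched roughly like this (a minimal, self-contained sketch; the names are hypothetical, not FOP's actual API): group the index-sorted page nodes into parents of at most some fan-out, and repeat until a single root remains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (names are not FOP's): build a balanced /Pages tree
// bottom-up from a flat list of per-page nodes, already sorted by page index.
public class PageTreeSketch {

    static final int MAX_KIDS = 10; // assumed fan-out per tree node

    // Minimal stand-in for a page-tree node holding kid references.
    static class Node {
        final List<Node> kids = new ArrayList<>();
        int leafCount; // number of pages under this node (/Count in PDF)
    }

    static Node leaf() {
        Node n = new Node();
        n.leafCount = 1;
        return n;
    }

    // Repeatedly group runs of MAX_KIDS nodes under a new parent until a
    // single root remains.
    static Node buildBalancedTree(List<Node> level) {
        while (level.size() > 1) {
            List<Node> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += MAX_KIDS) {
                Node parent = new Node();
                for (Node kid : level.subList(i, Math.min(i + MAX_KIDS, level.size()))) {
                    parent.kids.add(kid);
                    parent.leafCount += kid.leafCount;
                }
                parents.add(parent);
            }
            level = parents;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        List<Node> pages = new ArrayList<>();
        for (int i = 0; i < 202; i++) pages.add(leaf()); // 2x 101 pages
        Node root = buildBalancedTree(pages);
        System.out.println(root.leafCount);   // 202
        System.out.println(root.kids.size()); // 3 (21 mid-level nodes -> 3 top nodes)
    }
}
```

The crucial (and unsolved) part is that each leaf's /Parent reference is only known after this grouping, which is exactly why the already-flushed page objects are a problem.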
Thanks!
-Mike
On 08/06/11 20:14, Andreas L. Delmelle wrote:
On 08 Jun 2011, at 17:15, Michael Rubin wrote:
Hi Mike
> Hello there. Thought I'd post an update. Admittedly I feel like I've found a
> bit of a catch-22 situation. I successfully completed my code to generate the
> balanced page tree on the fly, and it works fine with a single page sequence.
> However, this morning I discovered that this code does not appear to work for
> multiple page sequences in a flow. (With 2x 101-page sequences, I got pages
> 1-9, 102, 10-101, then 103-end, in that order...) I guess this is where pages
> can arrive in a different order, and why the current indexing / nulls system
> is there.
Ouch! I had not considered that to be the purpose. Without looking closer, I
would say something like: page 10 contains a forward reference to page 102, and
all pages in between are only flushed after that reference has been resolved
(?)
> (And shows that I am still learning the ropes as I go along...)
Yep, and also shows that I am not intimately familiar with *all* of the
codebase myself. ;-)
> So I re-examined trying to generate the page tree after the pages have been
> added into one big flat list. I can do this by calling a method in
> PDFDocument.outputTrailer() to balance the page tree before all the remaining
> objects are written out. This way pages can be attached to nodes, and the
> tree hierarchy built up to the root node. On paper this is a more elegant,
> more efficient and easier solution than doing it on the fly. But I ran into
> the same problem again - the page objects are already written out.
OK, here may be a gap in my understanding of it so far, but...
Do you really _need_ the PDFPage object for some reason, or does its PDF
reference suffice to build the page tree?
From what I know of PDF, that page tree would only contain the references to
the actual page objects, no? As long as the PDFPages object is not written to
the stream, you should be able to shuffle and play with the references all you
want. All you need to keep track of, is to retain the natural order (= the
page's index), as the object numbers will not necessarily reflect that.
Unless I am mistaken about this, I do not see a compelling reason *not* to
write the PDFPage object to the stream as soon as it's finished. We keep a
mapping of reference-to-index alive in the 'main' (temporary?) PDFPages object.
Note that notifyKidRegistered() only stores the reference; the natural index is
translated into the position of the reference in the list. If you want to
re-shape that into a structured tree/map, then by all means...
Perhaps there is still a catch --sounds too simple somehow... :-/
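A minimal sketch of that suggestion (class and method names are hypothetical, not FOP's actual API): the registry keeps only each page's indirect reference at its natural index, so the page dictionary itself can be flushed the moment it is finished, even when pages are finished out of natural order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not FOP's actual classes): the page tree only needs
// each page's indirect reference ("n 0 R"), so the page dictionaries
// themselves can be flushed to the output as soon as they are finished.
public class ReferenceOnlyTree {

    static class Registry {
        final List<String> kidRefs = new ArrayList<>(); // position = natural page index
        int nextObjNum = 1;

        // Flush the page immediately; keep only its reference at the right index.
        String registerPage(int pageIndex, StringBuilder out) {
            String ref = (nextObjNum++) + " 0 R";
            out.append("...page object ").append(ref).append(" flushed...\n");
            while (kidRefs.size() <= pageIndex) kidRefs.add(null);
            kidRefs.set(pageIndex, ref);
            return ref;
        }

        // Emitted at the end: only references, in natural page order.
        String kidsArray() {
            return "/Kids [" + String.join(" ", kidRefs) + "]";
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        Registry r = new Registry();
        // Pages may be finished out of natural order (e.g. page 1 before page 0):
        r.registerPage(1, out);
        r.registerPage(0, out);
        r.registerPage(2, out);
        System.out.println(r.kidsArray()); // /Kids [2 0 R 1 0 R 3 0 R]
    }
}
```

The catch Mike describes would still apply, though: each flushed page dictionary already carries a /Parent entry that this scheme cannot rewrite afterwards.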
<snip />
> My current questions are:
> -Why are the page objects flushed straight away? (Memory constraints?)
Very likely to save memory indeed. More with the intention of just flushing as
soon as possible, to support fully streaming processing if the document
structure allows it. Theoretically, in a document consisting of single-page
fo:page-sequences, without any cross-references, you should see relatively low
memory usage even for documents with a very large number of pages, precisely
because the pages are all written to the output immediately, long before the
root page tree, which only retains their object references.
> -Is it safe and wise to delay flushing the page objects until the end?
Safe? No issue here.
Wise? That would obviously depend on the context.
In documents with 1000s of pages, I can imagine we do not want to keep all of
those pages in memory any longer than strictly necessary... I wouldn't mind too
much if it were an option that users could switch on/off. However, if the
process is hard-coded as the *only* way FOP will render PDFs, such that it
would affect *all* users, I am not so sure it is wise to do this.
<snip />