Re: PDFName.getName() returns escaped name?!

2012-04-09 Thread Craig Ringer
On 04/04/12 18:02, Vincent Hennebert wrote:
 Hi Craig,

 Thanks for your extensive study!

 On 03/04/12 10:31, Craig Ringer wrote:
 On 03/04/12 01:16, Vincent Hennebert wrote:

 From a quick look that sounds about right. Are you developing with a 1.7 
 JDK?
 You would have to make your code 1.5-compatible. Also, it would be good if
 you could back your optimizations with profiling data. If code
 safety/readability have to be compromised there has to be a good reason
 for it.
 Yep, I'm on 1.7, but was building for a 1.5 target and source level.
 Careful, because that doesn’t ensure you that your code will run on
 a 1.5 JVM. You may still be using elements of the standard library that
 appeared only in 1.6+.
Thanks - I forgot about that. Sigh. The glacial pace at which people
update JREs and JDKs drives me quite nuts sometimes - but at least fop
has moved to 1.5!

Time to do some JDK archaeology...

--
Craig Ringer


Re: PDFName.getName() returns escaped name?!

2012-04-03 Thread Craig Ringer
On 03/04/12 01:16, Vincent Hennebert wrote:

 From a quick look that sounds about right. Are you developing with a 1.7 JDK?
 You would have to make your code 1.5-compatible. Also, it would be good if
 you could back your optimizations with profiling data. If code
 safety/readability have to be compromised there has to be a good reason
 for it.

Yep, I'm on 1.7, but was building for a 1.5 target and source level.

The posted PDFName is really an example of what I'm thinking of; it's
not something that can be seriously merged without a *lot* of changes
across the rest of the tree to convert String name handling to PDFName,
to check for leading /s being added, to check for use of toString() in
name output to PDF data, etc. It's more a starting point for discussion
and to see what kind of interest there is in working on the pdf lib.

It looks like folks here are pretty receptive to significantly reworking
the PDF lib if there's someone willing to stump up the time. That might
well be me if it turns out porting the whole renderer to PDFBox instead
turns out to be lots harder than reworking the fop pdf lib. I suspect not.


 That said, please do feel free to give it a try if that route appeals to
 you. We would certainly consider a switch if it looks more promising in
 the long term.

I'm wondering how practical it'd be to progressively adopt PDFBox,
actually, rather than doing an abrupt and total switch.

The PDF primitives in PDFBox (COSName, COSNumber, COSDictionary, etc)
are modeled extremely similarlines as those in FOP's PDF library, and
while they won't be drop-in replacements they behave very similarly,
just with different method names and in some cases different datatype
assumptions (PDFName instead of String dictionary keys for example). The
only truly big difference looks to be in the handling of indirect
objects, where FOP uses one class that may be direct or indirect, and
PDFBox uses a dedicated `COSObject' class that wraps other objects for
indirect objects.

If it proves possible to do a largely 1:1 substitution of PDFBox
primitive PDF classes for FOP ones that'd *greatly* simplify the job of
moving to using the PDFBox `PD' model for document output, and would let
the job be done in a couple of distinct and separate chunks. PDFFont,
PDFXObject, etc would build on top of the COS types like COSDictionary,
COSBase and COSObject, rather than on top of PDFObject, PDFDictionary, etc.

It's not a perfect 1:1 mapping; notable imperfect matches where one or
more classes map to one or more different classes include:

  

  FOP PDFBOX
  ---
  java.lang.String,   COSName (for PDF names)
  FOP PDFText java.lang.String (for Unicode text)
  byte[] or a binbuf/binstr class (for binary data)
  ---
  PDFNumber   COSNumber
  COSInteger
  COSFloat
  ---
  ???
  (j.l.Boolean?)  COSBoolean
  ---
  PDFObject   COSBase  (direct)
  COSObject  (wraps a COSBase, indirect)
  
  PDFDocument COSDocument
  [+a new class with fop-only functionality]
  PDDocument
  


Of those, the first already needs a cleanup as noted previously; fop
uses String for at least two distinct and incompatible things (Unicode
text and raw binary data) that must be separated anyway. The second
doesn't look to be a big deal as COSNumber has factory methods to take
care of it, it's pretty transparent. I'm not too worried about boolean
handling either. The bigger ones are the different model for indirect
objects, and the non-trivial work to move PDFDocument to extend/wrap
COSDocument. I haven't looked into this in enough depth to have a
meaningful assessment of how hard either of those would be yet.
 
OTOH, some of the important ones map pretty directly, allowing for the
differences in handling of indirect and direct objects higher up the
inheritance tree:

 FOP PDFBOX
 --
 PDFDictionary   COSDictionary
 PDFArrayCOSArray
 PDFStream   COSStream
 PDFNull COSNull


 If I get time (big if, at the moment) I'll see if I can have a play and
determine what kind of work is involved in doing this.

--
Craig Ringer


Re: Assigning unique resource names

2012-04-02 Thread Craig Ringer
It's all good. I haven't finished the important bit, integrating it into
fop PDF images, yet.
On Apr 3, 2012 2:25 AM, mehdi houshmand med1...@gmail.com wrote:

 Hi Craig,

 My sincerest apologies for not getting round to looking at what you've
 done here. I'll try and take a look in the next few days and give it a
 think, see if there's anything we can do to help.

 Apologies once again,

 Mehdi

 On 28 March 2012 07:39, Craig Ringer ring...@ringerc.id.au wrote:

 Hi

 I've nearly finished work on getting fop-pdf-image to overlay PDFs by
 appending their content streams and merging their resource dictionaries,
 rather than by creating XObject Forms. The problem I have left will be
 more intrusive into the fop codebase than what I've had to do so far, so
 I thought I'd check in before I start working on it.

 The reason I'm adapting fop-pdf-images to support merging PDF images
 into the main PDF content instead of using XObject Forms is that the use
 of lots of PDF XObject Forms seems to cause RIPs and clients to perform
 poorly or run out of memory. The way I propose to do it, fop-pdf-images
 will use an XObject form if the preloader sees a pdf image re-used more
 than a configurable number of times (one by default), and otherwise
 merge it into the main pdf.

 Most of that is done, but there's a problem with ensuring unique
 resource names.

 XObject Form resource dictionaries are their own namespace, so no
 resource name (font, ExtGState, etc) in an XObject Form may conflict
 with a name in the parent page's resource dictionary. If XObject Forms
 are no longer used by fop-pdf-image, that namespace separation goes
 away. I have to merge the image page(s)'s resource dictionaries into
 the resource dictionary of the page they're being overlaid over. In the
 case of fop, that's the global resource dictionary because fop doesn't
 currently write per-page resource dictionaries. There's nothing wrong
 with this beyond potentially making the resource dictionary a bit fat,
 but it means I need a way to guarantee that a name will not conflict
 with any other name assigned by fop.

 For GState dictionary objects that's easy; fop just uses GS+object
 number as the name, so if I follow the same scheme when copying
 resources I'm guaranteed to get a unique name since object numbers are
 unique.

 Unfortunately, fop doesn't do anything so consistent for fonts or most
 other resources, and that's made it nearly impossible for me to
 guarantee that I can use a name without a later part of the XSL-FO
 causing fop to create an object that tries to use the same name. Solving
 this will require some changes to the way fop writes the PDF resources
 dictionary.

 I propose that the PDFResources class should take responsibilty for
 allocating resource names and ensuring they're consistent. Instead of
 asking each resource what its name is, the PDFResources class should
 *assign* it a name. Those names can be minimal and compact - eg Fnn
 for fonts, GSnn for graphics states, etc. nn would be a counter
 maintained by PDFResources. That's the convention followed by most other
 PDF producing software and would make it simple and reliable to inject
 objects not created by fop into the resources dictionary without risk of
 conflicts.

 That'll be important if people want to be able to write extensions that
 add new, custom PDF content; it's not just useful for fop-pdf-images.

 This API change would only affect extensions, services and clients that
 work directly with org.apache.fop.pdf.   and
 org.apache.fop.render.pdf.   classes, and only some of those. Clients
 that use the main fop APIs would be completely unaffected, as would
 clients that use the area tree / IR code, image loader code, or pretty
 much anything except the guts of pdf handling.

 I'll post a proposed patch soon, along with patches for some other
 changes that enable what I'm doing but may be useful for others. A patch
 with the fop-pdf-images merge feature support will follow once I've
 finished it enough that I can do test-runs.

 --
 Craig Ringer





Re: PDFName.getName() returns escaped name?!

2012-03-29 Thread Craig Ringer

On 03/30/2012 05:09 AM, J.Pietschmann wrote:

Am 29.03.2012 01:24, schrieb Craig Ringer:

I'd also like to have getEncodedName() return a byte[] not a
String, since an encoded PDF name isn't actually text data.

Sounds like a reasonable idea.


BTW, is there any reason Fop's PDF library uses java.lang.String when
working with sequences of PDF data bytes?

I'd chalk this up to historical reasons, as usual. Fell free to
provide a patch which cleans this up.

J.Pietschmann


Here's how I'd like to rewrite PDFName; untested code as an example of 
what I'm getting at. This is just a standalone file; a patch that 
incorporates it into the main sources will be a lot more work that I'm 
holding off on until I know folks here agree with the approach.


In any case, after reading more of the PDF library I'm rethinking the 
wisdom of trying to make this change. The change its self is correct, 
but it'll be really hard to safely integrate into the rest of the PDF 
library because of the difficulty of auditing every site to ensure 
nothing breaks. Java likes to call `toString' automatically in places, 
meaning that anywhere that doesn't use the proper PDFWritable output 
methods PDFName inherits will break by producing bad PDF data that might 
be quite hard to spot. I'd start by making PDFName.toString() throw (for 
testing), but that'd only catch issues in code that test paths actually hit.


Given the number of these kinds of issues in fop's pdf library I'm more 
and more inclined to wonder if it should just be replaced with PDFBox. 
It's *full* of text encoding issues, it crams 8-bit binary data into the 
lower 8 bits of Unicode strings, etc. Most of the classes that extend 
basics like PDFDictionary act like the base class isn't public API and 
break if anyone else changes the dictionary in ways they don't expect, 
too; they should have-a PDFDictionary not be-a PDFDictionary really.


PDFBox is far from perfect, but it has a clean separation between the 
model classes (PD) and the basic PDF data types (COSxxx); it has a 
clean PDFName, PDFString, etc; it has a good PDF parser already, etc. 
Maybe it'd be easier for me to whip up a port of FOP's PDF output code 
to PDFBox? I suspect I'm insane to mention the possibility of doing that 
without evaluating the amount of work involved first, so I'm not 
promising anything, but by the looks it might be easier than doing the 
cleanups I'd like to do in fop.


Thoughts?

--
Craig Ringer
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the License); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an AS IS BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/* $Id$ */

package org.apache.fop.pdf;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.*;
import org.apache.commons.io.output.CountingOutputStream;

/**
 * Class representing a PDF name object.
 * 
 */
public class PDFName extends PDFObject {


private static final MapByteString, PDFName commonNames;

private final ByteString unescapedName;
private ByteString escapedName;

/**
 * Creates a new PDF name object from a Unicode java string,
 * encoding the name as UTF-8.
 * 
 * @param name the name value
 */
public PDFName(String name) {
super();
this.unescapedName = new ByteString(name.getBytes(java.nio.charset.StandardCharsets.UTF_8));
}

/**
 * Creates a new PDF name object from a sequence of bytes
 * in no particular encoding.
 * 
 * By PDF convention you should use utf-8 when encoding names
 * (as is done by the String-based PDFName constructor), but this
 * is NOT required by the spec.
 */
private PDFName(ByteString name) {
super();
this.unescapedName = name;
}

/**
 * Create a PDFName with a pre-escaped name supplied. This is mostly useful
 * when defining names from data parsed from PDF data, or when allocating
 * pre-cached names.
 * 
 * @param unescapedName Name with PDF name escapes decoded
 * @param escapedName Name encoded with PDF escapes
 */
private PDFName(ByteString unescapedName, ByteString escapedName) {
this.unescapedName

Assigning unique resource names

2012-03-28 Thread Craig Ringer
Hi

I've nearly finished work on getting fop-pdf-image to overlay PDFs by
appending their content streams and merging their resource dictionaries,
rather than by creating XObject Forms. The problem I have left will be
more intrusive into the fop codebase than what I've had to do so far, so
I thought I'd check in before I start working on it.

The reason I'm adapting fop-pdf-images to support merging PDF images
into the main PDF content instead of using XObject Forms is that the use
of lots of PDF XObject Forms seems to cause RIPs and clients to perform
poorly or run out of memory. The way I propose to do it, fop-pdf-images
will use an XObject form if the preloader sees a pdf image re-used more
than a configurable number of times (one by default), and otherwise
merge it into the main pdf.

Most of that is done, but there's a problem with ensuring unique
resource names.

XObject Form resource dictionaries are their own namespace, so no
resource name (font, ExtGState, etc) in an XObject Form may conflict
with a name in the parent page's resource dictionary. If XObject Forms
are no longer used by fop-pdf-image, that namespace separation goes
away. I have to merge the image page(s)'s resource dictionaries into
the resource dictionary of the page they're being overlaid over. In the
case of fop, that's the global resource dictionary because fop doesn't
currently write per-page resource dictionaries. There's nothing wrong
with this beyond potentially making the resource dictionary a bit fat,
but it means I need a way to guarantee that a name will not conflict
with any other name assigned by fop.

For GState dictionary objects that's easy; fop just uses GS+object
number as the name, so if I follow the same scheme when copying
resources I'm guaranteed to get a unique name since object numbers are
unique.

Unfortunately, fop doesn't do anything so consistent for fonts or most
other resources, and that's made it nearly impossible for me to
guarantee that I can use a name without a later part of the XSL-FO
causing fop to create an object that tries to use the same name. Solving
this will require some changes to the way fop writes the PDF resources
dictionary.

I propose that the PDFResources class should take responsibilty for
allocating resource names and ensuring they're consistent. Instead of
asking each resource what its name is, the PDFResources class should
*assign* it a name. Those names can be minimal and compact - eg Fnn
for fonts, GSnn for graphics states, etc. nn would be a counter
maintained by PDFResources. That's the convention followed by most other
PDF producing software and would make it simple and reliable to inject
objects not created by fop into the resources dictionary without risk of
conflicts.

That'll be important if people want to be able to write extensions that
add new, custom PDF content; it's not just useful for fop-pdf-images.

This API change would only affect extensions, services and clients that
work directly with org.apache.fop.pdf.   and
org.apache.fop.render.pdf.   classes, and only some of those. Clients
that use the main fop APIs would be completely unaffected, as would
clients that use the area tree / IR code, image loader code, or pretty
much anything except the guts of pdf handling.

I'll post a proposed patch soon, along with patches for some other
changes that enable what I'm doing but may be useful for others. A patch
with the fop-pdf-images merge feature support will follow once I've
finished it enough that I can do test-runs.

--
Craig Ringer



Re: Merging Temp_PDF_ObjectStreams branch to trunk

2012-03-22 Thread Craig Ringer

On 03/22/2012 03:54 PM, Pascal Sancho wrote:

Hi,
+0
I see no objection if PDF-image plugin is updated in a short delay
(I mean: before next release).

This might be a good opportunity to merge fop-pdf-image .

--
Craig Ringer


Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-08 Thread Craig Ringer

On 03/08/2012 01:25 PM, mehdi houshmand wrote:

Haha, well the shortest answer I can give is kinda.

SVG uses Batik, which in turn uses the AWT font classes. Long story
short, you have to install the font on the system as well as having it
in the fop.xconf. There are plenty of discussions on this on the
mailing lists for you to peruse at your leisure.


Thanks. For my purposes, that's a no, since I need to support any 
embedded font whether or not I have access to a complete copy.


BTW, I've modified PDFBox's Overlay.java application to support 
translation, scaling and rotation of the overlay. Once I've cleaned up 
the resource renaming code I'll be able to plug the approach into 
fop-pdf-image - hopefully with few hassles. fop-pdf-image already 
contains most of the pdfbox-to-fop adapter code required.


This won't help with the duplicate fonts (and thus won't help with file 
size) but it might help with the RIP crashes. Here's hoping.


See the PDFBox-dev mailing list for the Overlay.java patch.

--
Craig Ringer



Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-07 Thread Craig Ringer
On 07/03/12 16:35, mehdi houshmand wrote:
 * Insert the concatenated content streams from the source PDF into the
 output content stream. They must be surrounded by appropriate graphics
 state save and restore operators and any necessary scale/position
 operations to place the content where you want it.
 
 HA HA!! Incorrect! If you look into the nooks and crannies of the PDF
 spec, you'll see that it's possible to use content stream arrays for
 the /Page content stream.

Sure - that's why I said the content stream(s) had to be concatenated
before insertion, because the input might be an array of content streams.

I was thinking that to get reliable results when overlaying you'd have
to wrap the whole series of drawing operations from the input in state
saving/restoring operations, etc, thus having to concatenate the streams
before wrapping. In retrospect, that's not true; one can just as well
wrap each copied content stream in state save/restore and scale/position
operations.

It might even be possible to get away without a graphics state
save/restore, but I don't think so. IIRC multiple content streams are
treated by the reader as if they were one concatenated stream, so you
still have to save/restore gstate to ensure the inserted stream doesn't
mess up anything after it. I'll have to check this in the PDF ref, though.

 I'll leave exploring that to you, but
 basically it makes overlaying pages much much simpler. In related
 news, PDFBox does just that!! What we did (and it's super hack, but it
 worked) is if there we pages with both PDF-image content and FOP
 generated content, we'd get FOP to generate the content without the
 PDF-image and just overlay the pages. Best of both worlds!! (Though
 the purist in me is very much aggrieved)

Urk, that's horrible! Effective, though, I expect. Presumably you still
have to translate scale and rotate then clip the content stream you're
overlaying, though.

 [snip]

 The more you describe your problem, the more it sounds like you need
 to do exactly what we did, but just to be sure, I thought I'd explain
 how we got there. Assumptions are a dangerous thing and I've probably
 made some about your issue too.

Given what you've described I'm inclined to agree that the cause of the
issues is the same. I suspect we're facing the same problem or very
similar problems, in which case my RIP crash issues may not be font
related after all.

I still want to fix the font issues because, rip crash causing or not,
the font subset duplication produces massively bloated PDFs that are
totally unsuitable for online distribution. It's kind of disheartening
to learn that the RIP crash issues are probably something else entirely,
since I thought I at least had to solve only one problem.

As for doing exactly what you did: I'd certainly be very interested in
seeing your PDFBox code for loading the fop-generated PDF, finding the
placeholders, and overlaying the PDF graphics over them. In particular
I'd like to see how you handled scaling/translation/rotation/clipping
when drawing the copied streams, and how you handled state saving and
restoration.

I can see overlaying over placeholders in post-processing as a really
useful interim solution, though eventually I'd like to enhance
fop-pdf-image to do that overlaying directly.

The really frustrating thing is that sometimes using an XObject will be
exactly the right thing to do, because the PDF being embedded actually
appears multiple times in the document. The solution to this links
neatly into the font de-duplication issue: fop image plugins need a way
to store per-render-run information, in this case so they can determine
how often an image occurs in a document during the preload run and make
an appropriate decision about how to embed it. I'm not sure it's even
necessary to have an image plugin api change for this; plugins should be
able to store enough information in a WeakHashMapFOUserAgent,... to
figure it out, so I should be able to make fop-image-plugin use form
XObjects only for pdf images referenced multiple times.

--
Craig Ringer



Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-07 Thread Craig Ringer
On 08/03/12 04:12, Vincent Hennebert wrote:
 Just my 2 cents on a particular detail...
 
 On 07/03/12 07:51, Craig Ringer wrote:
 On 06/03/12 18:49, mehdi houshmand wrote:
 snip/
 So with that in mind, what exactly are you trying to do? Why are you
 using FOP to merge PDFs?
 I'm using FOP to produce documents containing a mixture of automatically
 typeset formatted text and graphics. Many of the graphics are PDF
 documents, and need to be PDF documents because they contain vector
 artwork and text that would lose quality and grow massively in size if
 embedded in rasterised form.
 
 Is SVG an option for you? That might save you a lot of trouble. Or if
 not readily available, that might still be less work.

Alas, SVG isn't an option. We have a large body of work already in PDF
(and EPS) format that we can't easily convert to SVG.

Until I checked just now I didn't know that SVG even supported embedded
fonts. Does fop actually support that and include embedded SVG fonts in
output PDF?

--
Craig Ringer



Re: Fwd: Google Summer of Code

2012-03-06 Thread Craig Ringer
 of raster, text and bitmap data that should 
be included in the output document as efficiently as possible and 
without loss of fidelity. IOW, exactly what fop-pdf-image is for.


--
Craig Ringer


Re: Fwd: Google Summer of Code

2012-03-06 Thread Craig Ringer

On 03/06/2012 07:29 PM, Chris Bowditch wrote:

On 06/03/2012 11:08, mehdi houshmand wrote:

Hi Mehdi,


Font de-duping is intrinsically a post-process action, you need the
full document, with all fonts, before you can do any font de-duping.
PostScript does this very thing (to a much lesser extent) with the
optimize-resources  tag, as a post-process action.
At least that is transparent to the user, but re-parsing the input is 
a sub-optimal solution as it incurs a performance penalty so we should 
investigate if there are alternatives first. I can't recall why the 
Postscript Paintewr/Renderer was architected in that way but thats a 
separate topic.


At a guess, because PostScript is much less capable of non-linear 
references and access than PDF is. It's more expensive and slower to 
forward-reference resources because PostScript has to parse and execute 
all the rest of the document to find the resource it wants, while PDF 
just seeks to the object at the byte offset referenced in the xref table 
and reads only the object it requires.




The requirements are perfectly clear: Given a set of input PDFs, 
XSL-FO, create a single merged PDF with a consistent and unduplicated 
set of fonts. Why would there be slight kerning differences if the 
assumption that the font name is unique holds true.
Assuming the font name is unique is dangerous, since it's provably true 
that in the wild there are numerous subtly (and sometimes grossly) 
different fonts with the same name.


The font dictionary contains glyph metrics information that along with 
the font name, slant, weight etc can be used to match the font rather 
more closely. For extra caution, checksums of subset glyphs can be done 
to make sure they're *identical*, but honestly that's unnecessary if the 
metrics match.
If that assumption is wrong then I agree with what you say. Ultimately 
that should be down to the user though, they know their fonts, so they 
can decide whether to merge them or not via a setting in the 
fop.xconf. Your argument is not sufficient to say this approach should 
never be used. It brings a lot of benefit to users who know their font 
names are unique.
It should be safe to do automatically and transparently by default, 
because only partially overlapping subsets of identical fonts should 
ever be merged. Anything else is a substitution not merging duplicate 
subsets, and has entirely different considerations because of the 
possibility of visible changes caused by non-matching metrics etc.


--
Craig Ringer


Re: Fwd: fop-pdf-image and fonts; as requested

2012-03-06 Thread Craig Ringer
documents, and need to be PDF documents because they contain vector
artwork and text that would lose quality and grow massively in size if
embedded in rasterised form.

I'm *NOT* trying to use fop to concatenate PDF pages, to impose PDFs, or
any of that. It'd make very little sense to do that.

 Do you need FOP to do this work?
I either need fop, TeX, or need to write my own document layout system.
The latter would be insane - why implement text justification and flow
algorithms, etc, when it's already well established in fop?

 Have you
 tried merging PDFs with PDFBox and seeing how that affects the RIP?
I haven't, and it's worth a try. It'd produce a document containing many
hundreds of small irregular shaped pages, as each input PDF is quite
small. It'd certainly help confirm whether the issue was XObject form
use, or whether it was font duplication.

--
Craig Ringer


Re: Google Summer of Code

2012-03-05 Thread Craig Ringer

On 03/05/2012 09:35 PM, mehdi houshmand wrote:

Because of the overwhelming popularity of this idea, I've created a
link on the Wiki
(http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for
the GSoC proposals.


Things that come to mind for me:

- PDFBox backend (probably ideal for GSoC, nice and self contained, 
great for someone who knows PDFBox and wants to learn fop's codebase);


- CID fonts in PostScript (good for someone who knows PS and fonts, not 
necessarily XSL-FO so much);


- Using automatic +- kerning, +- tracking *and* +- horizontal type 
scaling adjustment to better auto-fit text, involving support for 
font-stretch property. This touches on layout so it may not be practical 
for a 1st fop project, but may not be too bad since fop already adjusts 
tracking when justifying text. The key interest points would be 
*negative* tracking, kerning and (if nothing else works) glyph-scaling 
for tighter type-fitting where it's not desirable to break to a new line 
due to widow/orphan policy or because it'd create large holes. This is 
particularly important when long unbreakable words must fit a fixed 
width space.


- PDF/X-1a with CMYK;

- Anything in the proposed XSL-FO 2.0 feature list (though most of it 
won't be realistic for GSoC projects);


- Merge fop-pdf-image and implement smart merging of font, profile, and 
image resources. I'm working on this one at the moment, but slowly and 
only amid other projects.


--
Craig Ringer


Re: Google Summer of Code

2012-03-05 Thread Craig Ringer

On 03/05/2012 09:35 PM, mehdi houshmand wrote:

Because of the overwhelming popularity of this idea, I've created a
link on the Wiki
(http://wiki.apache.org/xmlgraphics-fop/GoogleSummerOfCode2012) for
the GSoC proposals.



You note font library extraction as a possibility there. I'd like to 
note another possible motivation for extracting the font library: to 
then potentially permit it to be merged with or replaced by pdfbox's 
fontbox, reducing duplicate work.


--
Craig Ringer


Re: Implementing PDF Object Streams

2012-02-27 Thread Craig Ringer

On 27/02/2012 8:08 PM, Vincent Hennebert wrote:

We would like to implement PDF Object Streams as defined in the PDF 1.5
Reference. In short, the structure tree would be stored inside a stream
to allow for compression in the same way as the page content.
What's the status of object stream support in PDFBox? Is it possible the 
feature is bettern implemented by adopting a PDFBox based backend?


There's been long term planning talk of moving over to PDFBox as the 
underlying PDF support library. It'd massively simplify work with 
PDF-in-PDF embedding, reduce maintenance work, etc. Is it worth doing 
major enhancement work on fop's pdf library if it may go away in future?


I'm struggling with getting fop and pdfbox to play well together at the 
moment as I work on enhancing fop-pdf-image to merge duplicate font 
subsets. The use of two different pdf libraries makes fop-pdf-image much 
more complex and makes working with fonts a lot harder. I'm sure it's 
not the only area where a pdfbox-based backend might be good.


--
Craig Ringer


Re: Document and page callbacks for image handlers

2011-12-21 Thread Craig Ringer

On 21/12/2011 5:07 PM, Chris Bowditch wrote:

FOP can't currently fully embed a font in PDF, so even if you had the 
source font available the code changes required could be extensive. 
For us, this approach isn't an option because we don't have the source 
font to register in fop.xconf and embed. Therefore I am interested in 
knowing what you've come up with in terms of merging subsets together 
to create 1 super subset. That in my view is the most difficult 
challenge in this problem. Resolving the problems with the cross 
references and the point at which IDs are assigned should be solvable 
with a little code refactoring. I'm sure one of the guys will speak up 
if that's not the case.


As yet I haven't begun to tackle the actual merging of Type 1 or 
TrueType subsets into a single font. I've done the accumulation and 
merging of the widths arrays, but not the fonts themselves. I plan to 
make new minimum subsets from local fonts if they're available, and will 
try merging of actual embedded font files only if I can't get that to 
work or if I have time. I don't know font data structures well enough to 
want to try merging subset embedded font files if I can possibly avoid it.


I've just finished writing and testing the code to accumulate 
information on each font as its encountered in a source PDF and merge it 
into a collection of font information keyed by 
(FontName,SubType,Encoding). I compare the metrics to ensure that the 
fonts are really compatible and if they are I merge the widths arrays 
and startchar/endchar to produce information. At the end of the run I 
can now produce a font dictionary and font descriptor for the minimum 
subset required to satisfy the requirements of each of the embedded 
documents using that font.


I can report on font usage, glyph usage within each font, and potential 
size savings, but I don't yet have it actually replacing the fonts. 
That's what I'll be working on today. First I'll be trying to use fop's 
font embedding mechanism to do it, which will require adding some 
callbacks to fop's pdf output to run code just before the resource 
dictionary is written out so I can inform fop of the required glyphs. 
I'll be delaying the writing of all the xobject resource dictionaries 
until after the fop resource dictionary is written so I know the fop 
font oids and can embed them in the xobject resource dictionaries. With 
luck I'm hoping I'll be able to write the minimum subset but I haven't 
looked into fop's font embedding code in enough detail to be sure 
exactly what I can do or how, so I'll be going delving shortly.


If this approach works the next step will be to allocate font object IDs 
early so I don't need to waste memory on delaying xobject resource 
dictionary writes and so I can avoid writing keys for fonts fop its self 
never uses to fop's resource dictionary.


Yesterday I attempted to unembed base-14 fonts during import of PDF 
content, so I'd recognise fonts like Helvetica in type1 and replace them 
with a font dictionary for a base14 font reference rather than the embed 
dictionary. Acrobat choked on the result for reasons I'm not entirely 
sure of as it looked OK structurally. I'm not sure quite what was wrong, 
but hope to have more luck with re-embedding rather than replacement 
with a base-14 font.


On a side note, I also need to enhance the font info collection code so 
it keys on more of the font metrics. Currently the first font with a 
given (FontName,SubType,Encoding) tuple is registered for that key, and 
if subsequent fonts with the same key but incompatible metrics are 
encountered they're copied over verbatim exactly as is currently the 
case. Expanding the key to cover the font bbox, ascent and descent etc 
will help solve that and won't be hard, I'm just leaving it until I have 
a proof of concept font re-embed working.


--
Craig Ringer


Re: Difference between RenderingContext and RendererContext

2011-12-21 Thread Craig Ringer
Thanks Vincent, much appreciated.

It turns out I can't use the render context for what I'd hoped to anyway.
I'm having to use a weakhashmap keyed on the FOUserAgent to associate
image-handler private data with a rendering run.

I'm putting together a proposal and patch to add interceptors around
IFDocumentHandler calls, with a service loader based mechanism for fop
image handlers, document extensions etc to register handlers. This will -
if accepted - let plugins and exts do extra work at various document
production phases and provide a cleaner way for plugins and exts to keep
per-render data. It will also provide an easy way to support checking for
cancellation of rendering runs fairly frequently.

The main problem is that I don't see how I can do it without repeating a
line of interceptor call code at the start and end of every
IFDocumentHandler method implementation, breaking source compatibility for
external subclasses of AbstractIFDocumentHandler, or using a tool like
JavaAssist. I'm inclined to break source compare for
AbstractIFDocumentHandler subclasses and just not propose inclusion in
1.0.x stable but I'd like your view on this.

Note that I can't use the user-oriented  event system - it doesn't give any
access to renderer/handler state and probably shouldn't, plus it doesn't
offer low level enough events and again probably shouldn't.

If it were targeted at fop 1.1.x only would I possibly be able to use Java
5 in a patch? It'd make this one much cleaner, particularly being able to
use enums .
On Dec 20, 2011 11:34 PM, Vincent Hennebert vhenneb...@gmail.com wrote:

 Hi Craig,

 On 16/12/11 13:29, Craig Ringer wrote:
  Hi all
 
  While reading over the pdf-image extension and fop code, I'm having a
bit of
  an interesting time figuring out the difference between a few things
and was
  hoping for a very brief pointer.
 
  I'm not quite sure what differentiates
org.apache.fop.render.RendererContext
  from org.apache.fop.render.RendererContext . The JavaDoc comments don't
really
  differentiate them and they look quite similar.

 This is hard to tell. I could trace RendererContext as far back as 2003,
 while RenderingContext was added in 2009 with the new XML Intermediate
 Format. I don’t know why RenderContext was not deleted or retrofitted
 back then.

 Given that RenderingContext is more recent, is an interface and has many
 more implementations than RenderContext has sub-classes (actually only
 one in the AFP code), I’d say it’s a safe bet to go with
 RenderingContext.


  This isn't helped by the fact that the pdf-image extension provides two
image
  handlers - one implements PDFImageHandler and takes a RendererContext,
while
  the other implements ImageHandler and takes a RenderingContext. They
seem to
  do much the same thing.
 
  Is this a case of old-backwards-compat-code meets new-code? If so, which
  should I target for future work?
 
  *headscratch*
 
  --
  Craig Ringer

 HTH,
 Vincent


Re: Document and page callbacks for image handlers

2011-12-18 Thread Craig Ringer


- A clean way to associate data that's private to the image processing 
plugin with a particular rendering run so I can access it across 
multiple invocations of the plugin; and


For anyone else who needs this later: There doesn't appear to be any 
especially nice way to do this with FOP's current image handler API, as 
there's no general-purpose map on the user agent for image handlers to 
stash their data in and nothing like that is passed as a param to the 
image handler calls. The hints mechanism can pass data from a preloader 
to a loader for the same image, but it can't be used to pass data 
between image loaders.


What I've landed up doing is keying a WeakHashMap off the FOUserAgent 
for the rendering run, as obtained via the RenderingContext passed to 
ImageHandler.handleImage(...). So long as lookups and insertions on the 
WeakHashMap are synchronized this is safe and will release the image 
handler's per-render information when the FOUserAgent is discarded at 
the end of the rendering run.


I'm now able to accumulate font usage information from the PDFs I 
examine as I embed them and build a list of which fonts are used. I can 
combine width arrays and first/last char listings to determine which 
glyphs are required if the font is to be embedded as a subset.




- How to append some additional PDF objects after the last page is 
emitted but before the PDF document trailer and final xref table(s) 
are written out.


For anyone else looking at this now or later:

It's possible to allocate a PDFObject and request that it be written out 
at the end of the document. PDFDocument.outputTrailer(...) writes 
objects added to the trailer list. Those objects were allocated via the 
factory where they were given an object ID, but were then passed to 
addTrailerObject(...) to request that they be written out at the end of 
document production. If I ever start producing my own combined font 
subsets from the original subset fonts in the input PDFs, this is 
probably how I'd insert the combined font subset object.


If I'm restricting font combining to fonts where fop has an original 
font file and using fop's font subsystem the above would require too 
much duplication and make it hard to avoid embedding fonts twice (once 
for form xobjects, once for main content). Instead I need to mark a font 
as used in fop's FontInfo for the rendering run so fop writes it out, 
and I need to obtain the font object's PDF object ID so I can write 
forward references to it in the XObject forms' resource dictionaries.


The problem here is that fop doesn't assign fonts an object ID until 
very late in writing. The first reference to font objects is from the 
resource dictionary, and fop only writes one of those - it is shared 
between all pages and written out just before the trailer. Since fonts 
are written out with the resources dictionary and don't usually need 
object IDs until the resources dictionary has to reference them there's 
no way to get their object IDs earlier in PDF production. This changes 
when we need to write private resource dictionaries for embedded form 
xobjects.


I'm looking at forcing early embedding of fonts with direct 
makeFont(...) calls. This'll work so long as I'm happy embedding whole 
fonts, but will prevent fop from subsetting the font for its own use and 
prevent me from subsetting it for xobject forms.


Alternately, I could defer the writing of the xobject form resource 
dictionaries till the end of the document so I didn't need to know the 
font object IDs early - but I'd still need a way to write them *after* 
the main fop resource dictionary. If I wanted to subset then I'd also 
need a hook for just before fonts were written out by fop to adjust the 
glyph width tables. I don't see any way around this without some kind of 
PDF renderer listener for image handlers etc to use.


I'll try to put together a proof of concept that embeds whole fonts if 
the font is found in a pdf form xobject, de-duplicating references so 
all pdf form xobjects that use that font reference the same one. Fop 
will use the same font since it knows about it and has stored it in the 
used fonts map, so the only problem is that the whole font is embedded 
rather than a subset.


Anyone working on the same thing, please feel free to drop me a note.

--
Craig Ringer


Difference between RenderingContext and RendererContext

2011-12-16 Thread Craig Ringer

Hi all

While reading over the pdf-image extension and fop code, I'm having a 
bit of an interesting time figuring out the difference between a few 
things and was hoping for a very brief pointer.


I'm not quite sure what differentiates 
org.apache.fop.render.RendererContext from 
org.apache.fop.render.RendererContext . The JavaDoc comments don't 
really differentiate them and they look quite similar.


This isn't helped by the fact that the pdf-image extension provides two 
image handlers - one implements PDFImageHandler and takes a 
RendererContext, while the other implements ImageHandler and takes a 
RenderingContext. They seem to do much the same thing.


Is this a case of old-backwards-compat-code meets new-code? If so, which 
should I target for future work?


*headscratch*

--
Craig Ringer


Document and page callbacks for image handlers

2011-12-16 Thread Craig Ringer

Hi

I'm interested in implementing merging of duplicate subset fonts in 
Jeremias's fop-pdf-image extension. I'm working with fop 1.0, and i'm 
having trouble with two things:


- A clean way to associate data that's private to the image processing 
plugin with a particular rendering run so I can access it across 
multiple invocations of the plugin; and


- How to append some additional PDF objects after the last page is 
emitted but before the PDF document trailer and final xref table(s) are 
written out.


I think I've figured out how to associate private image plugin data with 
the document being rendered (using the RenderContext as a key to a 
static WeakHashMap in the plugin) but it seems like a pretty ugly 
approach, so I'm hoping there's a better way I'm missing.


More problematic is that I need to emit additional PDF objects after the 
last page - or at least the last image - has been rendered. I can't see 
any obivous callbacks to fop image plugins to notify them when a new 
document is started, a new page started, a page finished or a new 
document finished. Fop seems to have an event system but it seems to be 
oriented toward trapping error conditions and problems rather than doing 
extra work at certain processing phases. In particular, PDFEventProducer 
doesn't seem to be useful for this.


Any advice on how to hook document completion after the last page is 
written but before the pdf trailer and other closing pdf structure are 
written, so I can write out some indirect objects I've only written 
indirect references to so far?


Sorry if these are somewhat stupid questions. I'm very new to fop's 
codebase and I'm still getting my head around it.


--
Craig Ringer


Document and page callbacks for image handlers

2011-12-16 Thread Craig Ringer

Hi

I'm interested in implementing merging of duplicate subset fonts in 
Jeremias's fop-pdf-image extension. I'm working with fop 1.0, and i'm 
having trouble with two things:


- A clean way to associate data that's private to the image processing 
plugin with a particular rendering run so I can access it across 
multiple invocations of the plugin; and


- How to append some additional PDF objects after the last page is 
emitted but before the PDF document trailer and final xref table(s) are 
written out.


I think I've figured out how to associate private image plugin data with 
the document being rendered (using the RenderContext as a key to a 
static WeakHashMap in the plugin) but it seems like a pretty ugly 
approach, so I'm hoping there's a better way I'm missing.


More problematic is that I need to emit additional PDF objects after the 
last page - or at least the last image - has been rendered. I can't see 
any obivous callbacks to fop image plugins to notify them when a new 
document is started, a new page started, a page finished or a new 
document finished. Fop seems to have an event system but it seems to be 
oriented toward trapping error conditions and problems rather than doing 
extra work at certain processing phases. In particular, PDFEventProducer 
doesn't seem to be useful for this.


Any advice on how to hook document completion after the last page is 
written but before the pdf trailer and other closing pdf structure are 
written, so I can write out some indirect objects I've only written 
indirect references to so far?


Sorry if these are somewhat stupid questions. I'm very new to fop's 
codebase and I'm still getting my head around it.


--
Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/


Difference between RenderingContext and RendererContext

2011-12-16 Thread Craig Ringer

Hi all

While reading over the pdf-image extension and fop code, I'm having a 
bit of an interesting time figuring out the difference between a few 
things and was hoping for a very brief pointer.


I'm not quite sure what differentiates 
org.apache.fop.render.RendererContext from 
org.apache.fop.render.RendererContext . The JavaDoc comments don't 
really differentiate them and they look quite similar.


This isn't helped by the fact that the pdf-image extension provides two 
image handlers - one implements PDFImageHandler and takes a 
RendererContext, while the other implements ImageHandler and takes a 
RenderingContext. They seem to do much the same thing.


Is this a case of old-backwards-compat-code meets new-code? If so, which 
should I target for future work?


*headscratch*

--
Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/


Re: OpenType font library [was: Re: How much work is needed for FOP to support OpenType fonts?]

2011-01-19 Thread Craig Ringer
On 19/01/11 16:35, Simon Pepping wrote:
 I take this discussion to express my worries that FOP needs to create
 its own support for fonts, among which Open Type Fonts. FOP's core
 task is the layout and printing of FO files. If FOP could rely on good
 font libraries, that would make our code base so much smaller and our
 development tasks so much easier.
 
 If I am not mistaken, Firefox does a fairly good job at representing
 Indic scripts. Do they use a generally available library?

There are several libraries for complex scripts, including the
commonly-used libbidi and pango libraries. All the widely used ones that
I know of are C- or C++ libraries.

While Java can use C and C++ libraries, a Java Native Interface (JNI)
layer must be written. Further, JNI code and the libraries it uses must
be compiled separately for each supported platform and architecture,
making packaging and deployment of the Java code a ***PAIN*** unless all
the native code parts are shipped as part of the JDK/JRE.

Even allowing for the issues with complex/bidi libraries, Apache FOP
must also handle the OpenType font format its self, including support
for font subsetting and embedding. That's way outside the scope of
complex script and bidi libraries. While library code exists to help
with OpenType handling, I'm not aware of any even C or C++ libraries
that provide useful, fairly abstracted facilities for subsetting and
embedding without tying them in to related PDF libraries.

While Java its self can use OpenType fonts, it doesn't expose the
details of its OpenType parser etc to Java applications. In any case, it
may use the platform's font support rather than bundling its own. Java
apps need to provide their own OpenType format handling if they want to
do more than just use the fonts with Graphics2D and the other Java
rendering APIs, because there's no way to get to the guts of the fonts
loaded by the JVM.


Ideally, Apache FOP could be built on top of:

- A low-level Java font format and parsing library that can identify
fonts, enumerate tables and glyphs, detect features, etc.

- A low-level Java PDF library that handles the PDF document structure,
xref and indirect object management, PDF data structure representation,
direct-to-disk streaming of big images into PDF object streams, etc etc.

- A Java library that uses both of the above to provide features for PDF
embedding of OpenType and TrueType fonts.


Unfortunately, AFAIK *NONE* of them exist, or at least are used, at
present. Fop seems to have its own PDF output code and own font handling
code. I don't see any obviously advantagous 3rd party replacements for
the font handling code, and most of the 3rd party PDF engines (like
iText) appear to be a bit limited when you want to insert your own
low-level PDF content stream data, objects, etc to implement features
not supported directly by the PDF library you're using.

I was looking into this a little myself while checking to see how hard
it'd be to implement /DeviceCMYK and /ICCBased colour in FOP. Because of
the way FOP stores and manages colour internally, the answer appears to
be very at present, especially if you want to support PDF/X
requirements and handle CMYK passthrough.

-- 
System  Network Administrator
POST Newspapers


Re: color issues [was: OpenType font library]

2011-01-19 Thread Craig Ringer
On 19/01/11 19:13, Jeremias Maerki wrote:
 Craig, you might want to try out the color branches for which I've just
 started to vote to merge it into trunk. The color branch adds Named
 Color (separation, spot color) support and CIE Lab support.
 
 However, ICC and device CMYK colors should already work in FOP Trunk/1.0.
 Could you maybe elaborate on your problems there?

Hmm. I seem to have missed the cmyk() function in fop, and less
excusably the rgb-icc() function in XSL-FO proper. Maybe it was CMYK and
ICC tagged images I was having issues with?

I'll look into it again, it's been ages since I was looking at that, and
I was on fop 0.95 when investigating that stuff so it may have changed
since then anyway.

-- 
System  Network Administrator
POST Newspapers


Re: offo in maven [was: Re: DO NOT REPLY [Bug 49881] [PATCH] add maven build support]

2010-09-09 Thread Craig Ringer

On 9/09/2010 3:00 PM, Simon Pepping wrote:

I found offo in maven central:
http://repo1.maven.org/maven2/net/sf/offo/fop-hyph/1.2/. I did not put
it there.


Hmm. That makes me officially blind.

Thanks :-)

Ah well, it served as a useful example of the methods.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


Re: DO NOT REPLY [Bug 49881] New: [PATCH] add maven build support

2010-09-07 Thread Craig Ringer

On 7/09/2010 1:52 PM, Jeremias Maerki wrote:

Well, Ivy has one fundamental problem in common with Maven that many regard
as a great feature: the repository. Numerous times, I couldn't get a Maven
build to complete successfully because some artifact was temporarily or
permanently unavailable.


First: I'd like to note that none of the following is meant to sound 
like some kind of ra ra ra you should use maven and only maven, maven 
is the truth and the light. It's just a tool, and like all tools has 
things it's good for and things it's not so good for.


That said, I've never had issues with remote repositories - I routinely 
use sonatype nexus (jboss) repos, Central, java.net, and a couple of 
private repositories.


I guess it helps that once files are fetched by maven and cached in the 
local repository, that's it. Unless you change a dependency's version or 
use snapshot versions, there's no more network access.


There's always the option of doing the same thing you currently do with 
ant - bundle copies of the dependencies in shipping versions or maintain 
a separate 3rd pty dependencies repo under version control. I guess I 
don't really see the difference.


Here I keep a common repo under version control, but that's mostly to 
save download time on big files, and is exactly the same thing I do for 
non-Maven resources like JDK snapshots. It would insulate me a bit from 
transient failures in remote repositories, though.


(I do wish that Maven would print a warning and use the last-downloaded 
-SNAPSHOT version if it didn't have network access and snapshot updates 
were enabled, though. It's the only area where connectivity requirements 
do cause me issues.)


 And how many times did a Maven/Ivy build

download half the Internet just to build a small project?


Generally only if it's misconfigured, or that small project uses 
plugins/libraries with a lot of dependencies. In the latter case, you're 
going to need to get them one way or the other.


 My Eclipse's Maven and Ivy plug-ins are

long uninstalled because of the trouble they caused.


Aaah. I don't use Eclipse - and given the nature of my experences with 
it when I've tried using it for something, I wouldn't be surprised by 
problems.


I use NetBeans for most work, and the command line where convenient.

I don't suppose you were relying on any SNAPSHOT version plugins or 
libraries? Because if you were and you had snapshot updates enabled (the 
default - unfortunately IMO) then I can certainly see it seeming like it 
wants to download the internet whenever you run a build.



Another problem of an external repository is the lack of license
management. ASF projects have clear requirements what kinds of
dependencies are allowed. If you can't control transitive dependencies
based on a license policy you're bound to run into a problem there.


Now that can be a problem. Again, though, I'm not sure how different it 
is to a 3rd party library you use bundling libraries of unknown 
licensing as dependencies. Either way, you have to check.


release Maven artifacts won't change dependencies without a version 
change, and you have to do that kind of checking whenever you update 
anything, maven-based or not.



I can check out (or extract) FOP and build at least a basic version
locally with no outside connection. I like that and would like it to
stay that way.


The same is true with Maven. It doesn't have to try to download the 
Internet, nor does it need 'net access for builds. I routinely do (re) 
builds on my laptop while disconnected.


I have the required artifacts in my local ~/.m2 repository already, and 
that's all I need. If I was using an Ant project I'd have to have 
obtained the required dependencies to put on the classpath somehow; same 
deal. Whether I populate my ~/.m2 from Internet repositories, or check 
out a private pre-populated maven repo from version control, I still 
have to obtain it somehow.


That said, I do find that the way it doesn't tend to include most of the 
core plugins in the initial Maven download - and therefore fetches them 
when you first do a build - to be annoying.


--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


Re: DO NOT REPLY [Bug 49881] New: [PATCH] add maven build support

2010-09-07 Thread Craig Ringer

On 7/09/2010 4:40 PM, Jeremias Maerki wrote:


I guess we're in a religious dispute here, like PC vs. Mac. So we
can't expect to reach a consensus.


Well, certainly a discussion of preference. I know it gets religious for 
some Java folks, but myself I don't mind too much so long as nobody 
tries to force their choice on me. I can use Maven without having to 
care what others use or force it on them. I'm only weighing in on this 
discussion to say that I'd like the option for maven builds if it 
doesn't get in anyone else's way, and address some possible 
misunderstandings about maven.


I like dealing with maven in projects because for me it is a known 
quantity and imposes some consistency on projects that I personally 
like. OTOH, I manage ok if a project doesn't use maven, at least so long 
as I don't have to wrangle the guts of its build system.



Anyway, I won't to stand in the way
if something is added to FOP that can help some users. [snip] just because Maven
can't include a simple JAR that is not in a repository.


Not strictly true. One option is to use scopesystem/scope with an 
explicit path to the jar.


Maven doesn't have a wild-card include everything under lib/ though, 
and using system scope to fudge in local depencies is a bit of a hack.


http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#System_Dependencies


Usually what you'd do if you have a jar you want to use - but no repo or 
pom for it - is drop the jar you want to use into your local ~/m2/ (or 
wherever you keep your local repository, ie download cache) then declare 
a dependency on it in your pom.xml. This is within a repository but 
it's only your local repo, it doesn't involve any network access or 
anything except putting a file in a particular place. Maven will look 
for the dependency in a location defined by the repo layout. So if I 
declared


dependency
  groupIdlocal/groupId
  artifactIdsomejar/artifactId
  version2.2/version
/dependency

... then it'd look for local/somejar-2.2.jar within my local repository. 
If I put the jar where it should be found, no problem.


I don't personally find that to be any worse than dropping everything in 
lib/ ... and I find it makes it a LOT easier in the long run to let mvn 
take care of the mess of secondary (transitive) dependencies involved in 
using things like Hibernate.


(OK, so maven does whine annoyingly about not being able to find the 
pom.xml for the artifact, which bugs me - but it works fine nonetheless).


 But I consider
 Maven viral as we're seeing here. Due to its inflexibility, projects
 are almost forced to adopt it to keep everyone happy,

I can't speak for the obsolete Maven 1.x, but that's not true of 2.x . 
To keep everyone happy it *does* help to publish artifacts to a maven 
repository (be it Central or somewhere else) but there's no need to get 
Maven anywhere near your builds if you don't want to, and there's no 
need for the people maintaining the project and doing the development 
work to have anything to do with pushing project releases to maven central.


If you *do* want to create and push maven artifacts yourself but don't 
want to use Maven in builds, a Maven artifact can be created with the 
cp command and a text editor, or with an Ant task to spit out a 
suitable generated pom.xml . No biggie.


You can use Maven builds with jars not created or managed with Maven, 
you can use Ant to produce Maven artifacts, and you can use Ant to 
consume Maven-produced artifacts. It doesn't really force anything on 
you at all.


--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


Re: DO NOT REPLY [Bug 49881] New: [PATCH] add maven build support

2010-09-07 Thread Craig Ringer

On 7/09/2010 7:22 PM, Benson Margulies wrote:

I've never seen a message to one of the mailing lists
complaining that connectivity issues were making people miserable. Why? You
need connectivity to update from svn. Then you need connectivity to run a
build.


... and to get any libraries or other dependencies if you don't already 
have them locally. Just like with Maven.


BTW, I suspect many people who have trouble with Maven's apparent net 
access requirements don't know about mvn dependency:go-offline and 
mvn -o for offline operation that doesn't try to check snapshot repos 
etc. mvn -o is kind of hard to miss, but people seem to anyway; the 
go-offline goal is rather less obvious but really handy.


Meh. I'd like to see maven support in fop, but I'm not working with 
fop's code much at all so it's hardly something I can claim any say in. 
Maybe I should bash together an ant task to spit out Maven artifacts 
after a build, though, to make it easier to use fop's existing build 
tools to integrate with maven.


--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


Re: Font Glyph?

2010-07-26 Thread Craig Ringer
On 27/07/10 08:03, Glenn Adams wrote:
 Let's see if I have any luck obtaining the last resort font for direct
 inclusion in FOP.

The URW free fonts may have a suitable license. They're distributed with
GhostScript among other things. As they cover the base set of PostScript
fonts, they could be ideal.

The BitStream Vera family might also be useful, though they don't have
the same metrics and appearance as common base fonts.

--
Craig Ringer


Re: Intermediate Format (IF) with placeholder / marker

2010-07-13 Thread Craig Ringer
On 13/07/10 19:01, Adam Kovacs wrote:
 Hi there,
 
  I would need an XML element from any namespace which I can use in the XSL 
 and is not interpreted by FOP and goes into the Intermediate Format (IF) 
 output as it is.
 I want to use this element as placeholder in the FOP_IF, and in a second step 
 I can modify the FOP Intermediate File for my needs.

I've been doing this using the area tree (AT) output format, using named
blocks as suggested already.

It seems to me like the IF is much less useful for modifying before
output, because it doesn't seem to preserve identifiers, and omits quite
a bit of information you might want. In my case I found that there was
no XML level way to tell what blocks were contained in which columns of
a multi-column flow; I'd have to do it with geometry using the IF. In
the AT output, it's obvious from the XML nesting - each column is a
flow within a span for the page's content.

So... consider using the area tree output instead. It might be a better
fit for your needs.

--
Craig Ringer


Potential contract project

2010-06-25 Thread Craig Ringer
Hi folks

I'm posting to -dev because I've hit a wall with FOP that I'm interested
in paying someone to knock down.

For details on the issue, see the thread Distributing vertical space in
a column while repeating column headings in -users, starting with
message-id 4c1f97b7.9040...@postnewspapers.com.au . Archive thread
begins here:

http://mail-archives.apache.org/mod_mbox/xmlgraphics-fop-users/201006.mbox/%3c4c1f97b7.9040...@postnewspapers.com.au%3e


There are several possible approaches to solve the issue I'm having, but
the simplest looks like implementing vertical justification of space
within one-column tables, by respecting and applying elastic
space-before on blocks within table cells in one-column tables.

An alternative option would be to add conditional output to blocks, so
that a certain block could be suppressed (or only shown) if it was the
first object after a break. But I suspect that'd be a lot more
complicated, and would require extensions to the standard, whereas
vertical justification of elastic space within tables would not. OTOH,
it'd be really handy.

In the end, any method that'll let me repeat headings for sections of
flowed listings when those sections are broken across columns on a
multi-column page, AND vertically distribute space within those columns,
would do. Right now, with fop I can do one or the other but not both.


If you feel you know the layout engine well enough to tackle this, think
it's something reasonable to do, and can do it well enough that your
changes can be committed into fop trunk, I'd be interested to hear from
you with an estimate of how much work it'd take you over how long, what
approach you'd take, and how much you'd want to charge for the work. Be
realistic about time/effort, not optimistic.

The rights to any changes would be licensed under the same terms as fop
its self, with copyright held by the author, and would want to seek
inclusion of the changes into fop's trunk. In fact, that'd have to be
condition of satisfactory completion, since I *really* don't want to be
in the job of maintaining an external patch against complex core bits of
fop!

Anyone interested?

-- 
Craig Ringer
System and Network Administrator
POST Newspapers