DO NOT REPLY [Bug 53005] New: indexterm inside a footnote from docbook

2012-03-29 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=53005

 Bug #: 53005
   Summary: indexterm inside a footnote from docbook
   Product: Fop
   Version: 1.0
  Platform: PC
OS/Version: Mac OS X 10.4
Status: NEW
  Severity: normal
  Priority: P2
 Component: fo tree
AssignedTo: fop-dev@xmlgraphics.apache.org
ReportedBy: hashas...@gmail.com
Classification: Unclassified


I was getting this error:

javax.xml.transform.TransformerException:
org.apache.fop.fo.ValidationException: fo:inline is not a valid child of
fo:block!  (See position 1870:716)

I talked with the folks on the DocBook XSLT mailing list, and they helped me
find out the following:


Based on the .fo file that Alberto sent to me, this appears to be a bug in FOP
1.0.  I can reproduce it by putting an indexterm inside an inline element
inside a footnote. In general,  an indexterm generates an fo:wrapper element to
hold the indexterm id marker.  When this fo:wrapper is inside an fo:inline, it
appears to confuse FOP, but only when inside a footnote. Removing the
fo:wrapper removes the error.  All other locations with that construction do
not generate an error.  Two other XSL-FO processors did not produce an error.
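
The nesting described above can be sketched as follows. This is a hypothetical minimal fragment, not the .fo file from the report: the element content, the `idx1` id, and the class and method names are assumptions made for illustration. The code only parses the fragment with the JDK's XML parser and lists the ancestors of the fo:wrapper; it does not run FOP and does not reproduce the error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class FootnoteIndextermSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Hypothetical fragment: an fo:wrapper (the indexterm id marker)
    // inside an fo:inline inside a footnote body.
    static final String FRAGMENT =
          "<fo:footnote xmlns:fo=\"" + FO_NS + "\">"
        + "<fo:inline>1</fo:inline>"
        + "<fo:footnote-body><fo:block>"
        + "<fo:inline font-style=\"italic\">term<fo:wrapper id=\"idx1\"/></fo:inline>"
        + "</fo:block></fo:footnote-body>"
        + "</fo:footnote>";

    // Returns the local names of the wrapper's ancestors, innermost first.
    static List<String> ancestorsOfWrapper(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node wrapper = doc.getElementsByTagNameNS(FO_NS, "wrapper").item(0);
            List<String> path = new ArrayList<>();
            for (Node p = wrapper.getParentNode();
                    p != null && p.getNodeType() == Node.ELEMENT_NODE;
                    p = p.getParentNode()) {
                path.add(p.getLocalName());
            }
            return path;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ancestorsOfWrapper(FRAGMENT));
    }
}
```

For this fragment the ancestor chain is inline, block, footnote-body, footnote, which is the construction reported to trigger the validation error only when it occurs inside a footnote.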


Somebody added:


FYI, until recently indexterms were not allowed inside footnotes at all:
  http://www.docbook.org/tdg5/en/html/footnote.html
They are allowed with v5.1:
  http://www.docbook.org/tdg51/en/html/footnote.html


Thank you,
Alberto



Re: PDFName.getName() returns escaped name?!

2012-03-29 Thread J.Pietschmann
On 29.03.2012 01:24, Craig Ringer wrote:
 I'd also like to have getEncodedName() return a byte[] not a
 String, since an encoded PDF name isn't actually text data.

Sounds like a reasonable idea.

 BTW, is there any reason Fop's PDF library uses java.lang.String when
 working with sequences of PDF data bytes?

I'd chalk this up to historical reasons, as usual. Feel free to
provide a patch which cleans this up.

J.Pietschmann


Re: PDFName.getName() returns escaped name?!

2012-03-29 Thread Craig Ringer

On 03/30/2012 05:09 AM, J.Pietschmann wrote:

 On 29.03.2012 01:24, Craig Ringer wrote:

  I'd also like to have getEncodedName() return a byte[] not a
  String, since an encoded PDF name isn't actually text data.

 Sounds like a reasonable idea.

  BTW, is there any reason Fop's PDF library uses java.lang.String when
  working with sequences of PDF data bytes?

 I'd chalk this up to historical reasons, as usual. Feel free to
 provide a patch which cleans this up.

 J.Pietschmann


Here's how I'd like to rewrite PDFName; untested code as an example of 
what I'm getting at. This is just a standalone file; a patch that 
incorporates it into the main sources will be a lot more work that I'm 
holding off on until I know folks here agree with the approach.
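
As a rough illustration of why an encoded name is bytes rather than text, here is a minimal sketch of PDF name escaping that works on byte[] throughout. It is not FOP's code and not the attached rewrite: the class name `PdfNameEscapeSketch` and the method `escape` are hypothetical. It assumes the usual PDF rule that any byte outside the printable range 0x21 to 0x7E, any delimiter character, and '#' itself is written as '#' followed by two hex digits.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class PdfNameEscapeSketch {
    // PDF delimiter characters; these must be escaped inside a name.
    private static final String DELIMITERS = "()<>[]{}/%";
    private static final byte[] HEX =
            "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);

    // Returns the serialized form of a name: a leading '/' followed by
    // the name bytes, with non-regular bytes written as #xx.
    static byte[] escape(byte[] unescaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(unescaped.length + 1);
        out.write('/');
        for (byte b : unescaped) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean regular = c >= 0x21 && c <= 0x7E
                    && c != '#' && DELIMITERS.indexOf(c) < 0;
            if (regular) {
                out.write(c);
            } else {
                out.write('#');
                out.write(HEX[c >> 4]);
                out.write(HEX[c & 0x0F]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "A B#1".getBytes(StandardCharsets.UTF_8);
        // The escaped form is pure ASCII, so decoding it for display is safe.
        System.out.println(new String(escape(raw), StandardCharsets.US_ASCII));
    }
}
```

With the input "A B#1" the space becomes #20 and the '#' becomes #23, giving /A#20B#231. The unescaped input may contain arbitrary bytes, which is why both sides of the conversion are byte arrays here.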


In any case, after reading more of the PDF library I'm rethinking the 
wisdom of trying to make this change. The change itself is correct, 
but it'll be really hard to integrate safely into the rest of the PDF 
library because of the difficulty of auditing every call site to ensure 
nothing breaks. Java likes to call toString() automatically in places, 
meaning that any code that doesn't use the proper PDFWritable output 
methods PDFName inherits will break by producing bad PDF data that might 
be quite hard to spot. I'd start by making PDFName.toString() throw (for 
testing), but that'd only catch issues in code that test paths actually hit.
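
The implicit toString() problem and the throwing-toString() idea can be sketched like this. The `Name` class below is a hypothetical stand-in, not FOP's PDFName, and `getEncoded` is an assumed accessor name; the point is only that string concatenation calls toString() without any visible method call at the use site.

```java
final class ThrowingToStringSketch {

    // Hypothetical stand-in for a name object whose serialized form is bytes.
    static final class Name {
        private final byte[] encoded;

        Name(byte[] encoded) {
            this.encoded = encoded.clone();
        }

        byte[] getEncoded() {
            return encoded.clone();
        }

        // Fails loudly so that accidental text conversion shows up in tests.
        @Override
        public String toString() {
            throw new UnsupportedOperationException(
                    "use getEncoded(); an encoded name is not text");
        }
    }

    // Returns true if concatenating the name into a String triggers toString().
    static boolean concatenationThrows(Name name) {
        try {
            String ignored = "/Type " + name; // implicit toString() call
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        Name name = new Name(new byte[] {'/', 'N'});
        System.out.println(concatenationThrows(name));
    }
}
```

As noted above, this only flags call sites that actually execute, so the approach is only as good as the test coverage behind it.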


Given the number of these kinds of issues in FOP's PDF library I'm more 
and more inclined to wonder if it should just be replaced with PDFBox. 
It's *full* of text encoding issues, it crams 8-bit binary data into the 
lower 8 bits of Unicode strings, etc. Most of the classes that extend 
basics like PDFDictionary act like the base class isn't public API and 
break if anyone else changes the dictionary in ways they don't expect, 
too; they should have-a PDFDictionary not be-a PDFDictionary really.
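
To illustrate the hazard of keeping binary data in the lower 8 bits of a String, here is a small sketch. It is not taken from FOP; the class and method names are hypothetical. It stores bytes one per char by decoding as ISO-8859-1 and then writes the String back out with a chosen charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BytesInStringSketch {

    // Stores each byte in one char (ISO-8859-1 decoding), then encodes
    // the resulting String with the charset used at write time.
    static byte[] roundTrip(byte[] data, Charset writeCharset) {
        String smuggled = new String(data, StandardCharsets.ISO_8859_1);
        return smuggled.getBytes(writeCharset);
    }

    public static void main(String[] args) {
        // One ASCII byte followed by four bytes with the high bit set.
        byte[] binary = {(byte) 0x25, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        // Survives only if every writer uses ISO-8859-1.
        System.out.println(Arrays.equals(binary,
                roundTrip(binary, StandardCharsets.ISO_8859_1)));

        // Any writer that encodes as UTF-8 silently changes the data.
        System.out.println(roundTrip(binary, StandardCharsets.UTF_8).length);
    }
}
```

The five input bytes come back unchanged through ISO-8859-1 but become nine bytes through UTF-8, because each of the four high bytes is encoded as two bytes. Nothing in the type system prevents the second case, which is the argument for carrying such data as byte[].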


PDFBox is far from perfect, but it has a clean separation between the 
model classes (PDxxx) and the basic PDF data types (COSxxx); it has 
clean name and string types (COSName, COSString); and it already has a 
good PDF parser. Maybe it'd be easier for me to whip up a port of FOP's 
PDF output code to PDFBox? I suspect I'm insane to mention the 
possibility of doing that without evaluating the amount of work involved 
first, so I'm not promising anything, but by the looks of it, it might 
be easier than doing the cleanups I'd like to do in FOP.


Thoughts?

--
Craig Ringer
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/* $Id$ */

package org.apache.fop.pdf;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.*;
import org.apache.commons.io.output.CountingOutputStream;

/**
 * Class representing a PDF name object.
 * 
 */
public class PDFName extends PDFObject {

    private static final Map<ByteString, PDFName> commonNames;

    private final ByteString unescapedName;
    private ByteString escapedName;

    /**
     * Creates a new PDF name object from a Unicode java string,
     * encoding the name as UTF-8.
     * 
     * @param name the name value
     */
    public PDFName(String name) {
        super();
        this.unescapedName = new ByteString(name.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }

    /**
     * Creates a new PDF name object from a sequence of bytes
     * in no particular encoding.
     * 
     * By PDF convention you should use utf-8 when encoding names
     * (as is done by the String-based PDFName constructor), but this
     * is NOT required by the spec.
     */
    private PDFName(ByteString name) {
        super();
        this.unescapedName = name;
    }

    /**
     * Create a PDFName with a pre-escaped name supplied. This is mostly useful
     * when defining names from data parsed from PDF data, or when allocating
     * pre-cached names.
     * 
     * @param unescapedName Name with PDF name escapes decoded
     * @param escapedName Name encoded with PDF escapes
     */
    private PDFName(ByteString unescapedName, ByteString escapedName) {
        this.unescapedName = 

Re: on changing fop documentation sources to markdown

2012-03-29 Thread Glenn Adams
On Thu, Mar 29, 2012 at 10:22 PM, Glenn Adams gl...@skynav.com wrote:

 I also feel it is very important to continue using FOP documentation to
 create *some* output format. I am not prepared to give up our dog food,
 as that provides one more set of tests on FOP, that would otherwise be
 missing. Given the sparseness of FOP test coverage, the more content we
 formally run FOP on, the better.


s/FOP documentation/FOP processing/


Re: Working on getting the web site converted to Apache's ASF-CMS...

2012-03-29 Thread The Web Maestro
On Tue, Mar 20, 2012 at 3:59 AM, Vincent Hennebert vhenneb...@gmail.com wrote:

 A change that I personally welcome. Quite frankly the PDF versions of
 our web pages look terrible and don’t do honour to FOP. And I’ve never
 been comfortable enough with Forrest to change the way they look. And
 I must say, I don’t have the energy to learn enough of Forrest to be
 able to do it.

 I’m happy with the all-CMS and Markdown approach, especially if it can
 give our website a 21st century look.

 Vincent


Apologies for making you create a new /dev/null^H^H^H^H^H mail rule to
re-direct buildbot msgs...

Well, we don't quite have the 21st century look I was hoping for yet, but
I've got a few pages on the new staging site:

http://xmlgraphics.staging.apache.org/

We're still very preliminary. I haven't done much with the look & feel yet,
but we're on our way... It took me a bit to get Forrest outputting
'Markdown' format (insert story about Mac OS X, downgrading Java, and
multiple versions of Forrest).

I need to figure out how to make a sidebar/navigation bar (I'm hoping for a
common one!). I also need to get sub project pages up.

Of course, I/we have yet to determine exactly how we'll go about handling
the Documentation.

If you'd like to start messing around, you can start here:

https://cms.apache.org/#bookmark

and this'll help:

http://www.apache.org/dev/cmsref.html

Cheers!

Web Maestro Clay