Re: Patching with GIT/SVN (Re: Making MinOptMax Immutable)

2009-11-04 Thread Alexander Kiel
Hi Max,

 One more thing I noticed: Alex, you are introducing deprecated methods
 and then using them. Please either
 - don't mark the methods as deprecated, if they are valid helper methods or
 - don't use the deprecated methods, but rather use whatever you intend to
 replace it with.

I marked the methods as deprecated because they are deprecated in the
sense that I don't want them in MinOptMax. They are used in one code
region which is not covered by tests, so I don't know if this region
is used at all. I will discuss this with Vincent. Perhaps he knows
something about the code region which uses my deprecated methods.

Best Regards
Alex


signature.asc
Description: This is a digitally signed message part


Re: Patching with GIT/SVN (Re: Making MinOptMax Immutable)

2009-10-29 Thread Alexander Kiel
Hi Max,

you are right. It's always better to have small patches focused on one
thing. I didn't manage to keep my MinOptMax patch focused only on the
refactoring that makes MinOptMax immutable.

In the last half hour I walked through all the diffs, file by file.
I must say that, except for TextLayoutManager, it is possible to
understand all changes.

There are two other things done:

 - changing the signature of
   InlineLevelLayoutManager#getWordChars from
   void getWordChars(StringBuffer sbChars, Position pos) to
   String getWordChars(Position pos)

 - moving the adjustment enum constants from BlockLevelLayoutManager
   into its own class.

All other things are renamings (okay, mostly unrelated to MinOptMax) and
reformattings. The problem with the reformatting is that I mechanically
type Ctrl + Alt + L in IntelliJ after each badly written piece of code.
I even tried to reformat only selected lines, but one careless Ctrl +
Alt + L is sufficient :| My code style options in IntelliJ conform to
the FOP coding style, so mostly the reformatting corrects things that
historically did not conform to the coding style.

Now, I could revert all the unrelated refactorings from the patch. But
I fear that this would be a lot of work.

So I have one suggestion: Max, maybe we could use Skype and walk
through the code together. If we both see the same diff and I can answer
your questions, I think it would be faster than me removing all the
unrelated stuff. If we both come to the conclusion that it would
be better to remove some aspect entirely, I would do this of course. A
nice side effect of this Skype session would be that we become more
familiar with one another.

If I think about my OpenType patch or topics like refactoring the font
subsystem and advanced OpenType layout features in text processing, some
Skype sessions would be very useful.

This weekend I'm out in Brandenburg without internet. So if Skype is
an option, I'm happy to talk on any evening from Monday to Thursday.


Best Regards
Alex 

On Thu, 2009-10-29 at 14:45 +0100, Max Berger wrote:
 Hi Alex,
 Hi *,
 
 if you do not yet have FOP developer access, and you are working on a
 larger set of problems, please do not submit one large patch - current
 committers will not have the time to go through every single change.
 Instead, it is much nicer to have a series of small patches.
 
 One option is to use git. There is a current git clone of the FOP source
 tree available [1][2]. It also provides help to untangle tangled working
 copies [3]. Git lets you produce patches between different individual
 changesets [4], and detects if the patches were applied by someone else.
 
 References:
 [1] http://wiki.apache.org/general/GitAtApache
 [2] git://git.apache.org/fop.git
 [3] http://tomayko.com/writings/the-thing-about-git
 [4]
 http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#sharing-your-changes
 
 hth
 
 Max
 
 Alexander Kiel wrote:
  Hi,
  
  I issued a patch for MinOptMax:
  https://issues.apache.org/bugzilla/show_bug.cgi?id=48071
  
  Please read my first comment there and consider my patch :-)
  
  Best Regards
  Alex
  





Re: Making MinOptMax Immutable

2009-10-27 Thread Alexander Kiel
Hi,

I issued a patch for MinOptMax:
https://issues.apache.org/bugzilla/show_bug.cgi?id=48071

Please read my first comment there and consider my patch :-)

Best Regards
Alex






Class comment of org.apache.fop.fo.properties.SpaceProperty misleading

2009-10-25 Thread Alexander Kiel
Hi,

The class comment of org.apache.fop.fo.properties.SpaceProperty says:

Base class used for handling properties of the fo:space-before and
fo:space-after variety. It is extended by
org.apache.fop.fo.properties.GenericSpace, which is extended by many
other properties.

But it isn't extended by GenericSpace and so isn't a base class. The
class GenericSpace doesn't exist. But there are some references to
GenericSpace in FOP.

Can someone who knows about this correct the class comment?

Thanks
Alex

-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Update qdox-1.6.3.jar to qdox-1.10.jar

2009-10-25 Thread Alexander Kiel
Hi,

can we update QDox from 1.6.3 to 1.10? 

I have an issue with QDox 1.6.3: It can't parse my Fixed16 class (which
I have attached to this mail). The problem is in line 35:

private static final float DENOMINATOR = (float) (1 << 14);

It can't parse the shift operator.

QDox 1.10 works fine.
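
For reference, a minimal sketch of what the offending constant computes. The class and method names below are hypothetical and only illustrate the 2.14 fixed-point scale; they are not part of the attached Fixed16 class, and the conversion deliberately omits the range check that Fixed16 performs.

```java
/** Hypothetical demo of the 2.14 fixed-point scale factor. */
class Fixed214Demo {

    // 1 << 14 shifts the bit 14 places to the left, giving 2^14 = 16384.
    static final float DENOMINATOR = (float) (1 << 14);

    /** Scales a float into the 2.14 representation (no overflow check here). */
    static short toFixed(float value) {
        return (short) Math.round(value * DENOMINATOR);
    }

    /** Converts a 2.14 value back into a float. */
    static float toFloat(short fixed) {
        return fixed / DENOMINATOR;
    }
}
```

With 14 fractional bits, the value 1.0 is stored as 16384 and the stored value 8192 reads back as 0.5.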

Best Regards
Alex

-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *  http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/* $Id$ */

package org.apache.fop.fonts.opentype.common.io;

import java.text.NumberFormat;
import java.util.Locale;

/**
 * A 16-bit signed fixed-point number (2.14).
 */
public final class Fixed16 {

    /**
     * This constant represents the fixed-point number zero.
     */
    public static final Fixed16 ZERO = new Fixed16((short) 0);

    private static final float DENOMINATOR = (float) (1 << 14);

    private static final NumberFormat FORMAT = NumberFormat.getNumberInstance(Locale.US);

    static {
        FORMAT.setMinimumFractionDigits(1);
        FORMAT.setMaximumFractionDigits(6);
    }

    private final short value;

    /**
     * Creates a new fixed-point number.
     *
     * @param value the value of the fixed-point number as short
     * @return a fixed-point number
     */
    public static Fixed16 fromShort(short value) {
        return new Fixed16(value);
    }

    /**
     * Creates a new fixed-point number.
     *
     * @param value the value of the fixed-point number as float
     * @return a fixed-point number
     */
    public static Fixed16 fromFloat(float value) {
        int fixedValue = Math.round(value * DENOMINATOR);
        if (fixedValue > Short.MAX_VALUE) {
            throw new ArithmeticException("overflow; value: " + value);
        }
        if (fixedValue < Short.MIN_VALUE) {
            throw new ArithmeticException("underflow; value: " + value);
        }
        return new Fixed16((short) fixedValue);
    }

    private Fixed16(short value) {
        this.value = value;
    }

    /**
     * Returns the internal short representation of this fixed-point number.
     *
     * @return the internal short representation of this fixed-point number.
     */
    public short toShort() {
        return value;
    }

    /**
     * Returns the value of this fixed-point number as float.
     *
     * @return the value of this fixed-point number as float.
     */
    public float toFloat() {
        return value / DENOMINATOR;
    }

    /**
     * {@inheritDoc}
     */
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }

        Fixed16 otherFixed = (Fixed16) obj;

        return this.value == otherFixed.value;
    }

    /**
     * {@inheritDoc}
     */
    public int hashCode() {
        return value;
    }

    /**
     * {@inheritDoc}
     */
    public String toString() {
        return FORMAT.format(toFloat());
    }
}




Problem in PageBreakingAlgorithm Constructor

2009-10-25 Thread Alexander Kiel
Hi,

the constructor of the class PageBreakingAlgorithm looks like this:

    public PageBreakingAlgorithm(LayoutManager topLevelLM,
                                 PageProvider pageProvider,
                                 PageBreakingLayoutListener layoutListener,
                                 int alignment, int alignmentLast,
                                 MinOptMax footnoteSeparatorLength,
                                 boolean partOverflowRecovery, boolean autoHeight,
                                 boolean favorSinglePart) {
        super(alignment, alignmentLast, true, partOverflowRecovery, 0);
        this.topLevelLM = topLevelLM;
        this.pageProvider = pageProvider;
        this.layoutListener = layoutListener;
        best = new BestPageRecords();
        this.footnoteSeparatorLength = (MinOptMax) footnoteSeparatorLength.clone();
        // add some stretch, to avoid a restart for every page containing footnotes
        if (footnoteSeparatorLength.min == footnoteSeparatorLength.max) {
            footnoteSeparatorLength.max += 1;
        }
        this.autoHeight = autoHeight;
        this.favorSinglePart = favorSinglePart;
    }

The problem is the line:

footnoteSeparatorLength.max += 1;

I think it should read rather:

this.footnoteSeparatorLength.max += 1;

Clients calling the constructor won't be happy about this: the statement
modifies the object they passed in instead of the constructor's own copy.

I discovered this statement while refactoring the MinOptMax class into
an immutable one. I think this refactoring project deserves a separate
mail. But this example shows how valuable an immutable MinOptMax would
be.
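
The effect can be reproduced in isolation. The following is a minimal sketch with hypothetical stand-in classes (MutableTriple and SeparatorHolder are not FOP classes); it only mirrors the clone-then-mutate pattern of the constructor above.

```java
/** Hypothetical stand-in for a mutable triple with public fields. */
class MutableTriple {
    public int min;
    public int opt;
    public int max;

    MutableTriple(int min, int opt, int max) {
        this.min = min;
        this.opt = opt;
        this.max = max;
    }

    public Object clone() {
        return new MutableTriple(min, opt, max);
    }
}

/** Hypothetical class that repeats the pattern from the constructor above. */
class SeparatorHolder {
    final MutableTriple footnoteSeparatorLength;

    SeparatorHolder(MutableTriple footnoteSeparatorLength) {
        this.footnoteSeparatorLength = (MutableTriple) footnoteSeparatorLength.clone();
        if (footnoteSeparatorLength.min == footnoteSeparatorLength.max) {
            // The unqualified name is the parameter, i.e. the caller's object.
            footnoteSeparatorLength.max += 1;
        }
    }
}
```

After `new SeparatorHolder(length)` with min equal to max, the caller's `length.max` has grown by one, while the holder's private copy still has its old value.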

Can someone familiar with this part of FOP write a test which fails
against the current behavior? I could then use this test to verify that
my immutable MinOptMax works with this part.


Thanks
Alex

-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Making MinOptMax Immutable

2009-10-25 Thread Alexander Kiel
Hi,

the class MinOptMax has some 800 usages in FOP. It holds a triple of
values (min, opt, max) of length quantities. 

It's heavily used during local computations and for passing values
around. Its fields are public (whereas the class comment says they are
only package visible). The public fields (and many methods) make
MinOptMax mutable. This mutability is used in the computations purely
for performance reasons. But it is a big bug attractor wherever
instances are passed around.

I don't think anyone would doubt that an immutable MinOptMax would
help FOP.

This refactoring wouldn't be rocket science if all usages of MinOptMax
were covered by tests. I just started and found many uncovered
sections. I'm very new here, so I simply can't write such tests. So I
ask you to write such tests if possible, or to remove the uncovered
code sections.

As for performance: I would opt for just refactoring everything to an
immutable MinOptMax and only introducing a MinOptMaxBuffer if really
needed.

With an immutable MinOptMax we can easily remove all TODOs inside
MinOptMax, such as the integrity checks (min <= opt <= max), and we can
remove the clone method, because it wouldn't be needed anymore.
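
A minimal sketch of what such an immutable class could look like. The class name, the factory method and the plus/plusMax helpers are assumptions made for illustration, not the contents of the actual patch; the integrity condition is assumed to be min <= opt <= max.

```java
/** Hypothetical sketch of an immutable min/opt/max triple; not FOP's actual class. */
final class ImmutableMinOptMax {

    private final int min;
    private final int opt;
    private final int max;

    private ImmutableMinOptMax(int min, int opt, int max) {
        // The integrity check runs once, at construction time.
        if (min > opt || opt > max) {
            throw new IllegalArgumentException(
                    "min <= opt <= max violated: " + min + ", " + opt + ", " + max);
        }
        this.min = min;
        this.opt = opt;
        this.max = max;
    }

    static ImmutableMinOptMax getInstance(int min, int opt, int max) {
        return new ImmutableMinOptMax(min, opt, max);
    }

    int getMin() {
        return min;
    }

    int getOpt() {
        return opt;
    }

    int getMax() {
        return max;
    }

    /** Returns a new instance; the receiver and the argument stay unchanged. */
    ImmutableMinOptMax plus(ImmutableMinOptMax other) {
        return new ImmutableMinOptMax(min + other.min, opt + other.opt, max + other.max);
    }

    /** Returns a new instance whose max is increased by the given amount. */
    ImmutableMinOptMax plusMax(int maxPlus) {
        return new ImmutableMinOptMax(min, opt, max + maxPlus);
    }
}
```

Because no method changes an existing instance, instances can be shared freely and a clone method has nothing left to protect against.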

I just started the refactoring. All I need are unit tests.

Best Regards
Alex

-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Re: NPE when using non-base14 font via IF XML

2009-10-07 Thread Alexander Kiel
Hi Vincent,

 To reproduce: put the config file at the root of a FOP local copy, then
 run the following:
 fop -c config.xconf test.fo -if if.xml
 fop -c config.xconf -ifin if.xml test.pdf

I would like to run your example this way, but there is no fop.sh. Is
there such a thing for the Linux guys or should I write one?

Best Regards
Alex





Re: NPE when using non-base14 font via IF XML

2009-10-07 Thread Alexander Kiel
Hi Vincent,

 The script is called just fop. Look at the root of the project, it’s
 actually a shell script.
 http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/fop?view=log

oh I was just blind. 

Thanks.

Alex





Re: NPE when using non-base14 font via IF XML

2009-10-07 Thread Alexander Kiel
Hi Jeremias,

 Alexander, that hints to a buggy XSLT processor. Please replace the
 Xalan coming with the JVM with the one bundled with FOP.
 http://xml.apache.org/xalan-j/faq.html#faq-N100EF

Thanks for this hint. I just don't use Java 1.4 very often. With Java 1.6
it works.

Now I see the same NullPointerException. This null fontName comes from
line 264 in PDFPainter:

String fontKey = getFontInfo().getInternalFontKey(triplet);

The Javadoc of getInternalFontKey() says it can return null, but there
is no null check afterwards. I have no idea, though, why the triplet is
unknown.
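
A minimal sketch of the kind of guard that is missing. FontKeyLookup, its String-based triplets and requireFontKey are hypothetical stand-ins, not FOP's FontInfo API; the sketch only shows turning a documented null return into an explicit, descriptive failure.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup whose getter may return null, like the documented contract. */
class FontKeyLookup {

    private final Map<String, String> keysByTriplet = new HashMap<String, String>();

    void register(String triplet, String fontKey) {
        keysByTriplet.put(triplet, fontKey);
    }

    /** Returns the internal font key, or null if the triplet is unknown. */
    String getInternalFontKey(String triplet) {
        return keysByTriplet.get(triplet);
    }

    /** Fails early with a descriptive message instead of passing null on. */
    String requireFontKey(String triplet) {
        String fontKey = getInternalFontKey(triplet);
        if (fontKey == null) {
            throw new IllegalStateException("Font not available: " + triplet);
        }
        return fontKey;
    }
}
```

This does not explain why the triplet is unknown, but it would report the unknown triplet by name instead of a bare "fontName must not be null".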

Best Regards
Alex

 
 On 07.10.2009 14:42:52 Alexander Kiel wrote:
  Hi Vincent,
  
  I get a completely different error. If I run
  
 fop -c config.xconf test.fo -if if.xml
  
  There is no output - so it seems to run fine.
  
  If I run
  
  fop -c config.xconf -ifin if.xml test.pdf
  
  afterwards, I get:
  
  [Fatal Error] if.xml:4:12: The prefix x for element x:xmpmeta is not
  bound.
  Oct 7, 2009 2:37:23 PM org.apache.fop.cli.Main startFOP
  SEVERE: Exception
  javax.xml.transform.TransformerException: org.xml.sax.SAXParseException:
  The prefix x for element x:xmpmeta is not bound.
  at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:239)
  at org.apache.fop.cli.IFInputHandler.renderTo(IFInputHandler.java:77)
  at org.apache.fop.cli.Main.startFOP(Main.java:174)
  at org.apache.fop.cli.Main.main(Main.java:205)
  
  -
  
  javax.xml.transform.TransformerException: org.xml.sax.SAXParseException:
  The prefix x for element x:xmpmeta is not bound.
  at
  org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:469)
  at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:236)
  at org.apache.fop.cli.IFInputHandler.renderTo(IFInputHandler.java:77)
  at org.apache.fop.cli.Main.startFOP(Main.java:174)
  at org.apache.fop.cli.Main.main(Main.java:205)
  Caused by: org.xml.sax.SAXParseException: The prefix x for element
  x:xmpmeta is not bound.
  at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
  at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown
  Source)
  at
  org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:452)
  ... 4 more
  -
  org.xml.sax.SAXParseException: The prefix x for element x:xmpmeta is
  not bound.
  at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
  at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown
  Source)
  at
  org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:452)
  at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:236)
  at org.apache.fop.cli.IFInputHandler.renderTo(IFInputHandler.java:77)
  at org.apache.fop.cli.Main.startFOP(Main.java:174)
  at org.apache.fop.cli.Main.main(Main.java:205)
  
  I attached the if.xml. There is indeed no namespace declared for the x
  prefix.
  
  I'm inside an up-to-date trunk. FOP was built with Java 1.4.
  
  Best Regards
  Alex
  
  -  
  e-mail: alexanderk...@gmx.net
  web:www.alexanderkiel.net
  
  
  On Wed, 2009-10-07 at 12:16 +0100, Vincent Hennebert wrote:
   Hi,
   
   If I render the attached FO file into IF XML with the attached
   configuration file, then render the xml file into PDF, then I get the
   following error:
   SEVERE: Exception
   java.lang.NullPointerException: fontName must not be null
 at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:239)
 at org.apache.fop.cli.IFInputHandler.renderTo(IFInputHandler.java:77)
 at org.apache.fop.cli.Main.startFOP(Main.java:174)
 at org.apache.fop.cli.Main.main(Main.java:205)
   Caused by: java.lang.NullPointerException: fontName must not be null
 at org.apache.fop.render.pdf.PDFPainter.getTypeface(PDFPainter.java:246)
 at org.apache.fop.render.pdf.PDFPainter.drawText(PDFPainter.java:269)
 at
   org.apache.fop.render.intermediate.IFParser$Handler$TextHandler.endElement(IFParser.java:487)
 at
   org.apache.fop.render.intermediate.IFParser$Handler.endElement(IFParser.java:277)
 at
   org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1101)
 at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown 
   Source)
 at org.apache.xerces.xinclude.XIncludeHandler.endElement(Unknown Source)
 at 
   org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown 
   Source)
 at
   org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
   Source)
 at 
   org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
   Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source

Re: Checkstyle RedundantThrowsCheck

2009-10-02 Thread Alexander Kiel
Hi Max,

 DISCLAIMER: These comments are to be seen as purely academic, and may
 be complete overkill. For practical purposes, your code is just fine.

No, it's OK, I like code reviews.

 - value is a very generic name, and could be reconsidered. What does the
 value actually specify? Looking at the detail, it is the int
 representation of the tag in little-endian. So I'd propose
 intRepresentation instead.

You are right, value is a bit too generic. The representation is actually
big-endian: the first byte of the array is the highest byte. So I should
really put this information in a comment. I'm not even sure why I chose
such a compact and difficult-to-understand representation. A String
would be better.

 - in your constructor, you use value to build up the intRepresentation.
 In this case, I'd use something like intValue

Here I would say that repeating the type in the variable name is not
needed. So the question would be: why repeat int in intRepresentation?
Then one could say that the field should really be named
compactBigEndianIntegerRepresentation. But then this whole concept of
a compact big-endian integer representation of a String of length four
with a reduced ASCII charset should really go into its own class.
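
A minimal sketch of such a big-endian packing, assuming the tag arrives as four bytes with the first byte in the highest-order position. TagPacking and its method names are hypothetical and not taken from the attached Tag.java.

```java
/** Hypothetical helper for the compact big-endian int representation of a tag. */
class TagPacking {

    /** Packs four bytes into an int; bytes[0] ends up in the highest byte. */
    static int pack(byte[] bytes) {
        if (bytes.length != 4) {
            throw new IllegalArgumentException("need exactly four bytes");
        }
        return ((bytes[0] & 0xFF) << 24)
                | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8)
                | (bytes[3] & 0xFF);
    }

    /** Unpacks the int into its four characters, highest byte first. */
    static String unpack(int value) {
        char[] chars = new char[4];
        for (int i = 0; i < 4; i++) {
            chars[i] = (char) ((value >>> (24 - 8 * i)) & 0xFF);
        }
        return new String(chars);
    }
}
```

For the ASCII bytes of "ttcf" (0x74, 0x74, 0x63, 0x66) the packed value is 0x74746366.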

 - you have a static method  valueOf(String) and a constructor (byte[]).
 Why two different ways of initializing the class?

The valueOf(String) method is the only public way to create a Tag. It's
used all over the font subsystem to create tags where needed. The
package-private constructor is only used in OpenTypeDataInputImpl. So
here I have the code that reads the data representation in the OpenType
file (a byte array of length four) in the wrong place. It really needs
to be moved into OpenTypeDataInputImpl.

 - The constructor should be made private. If you really need to access
 the (byte[]) from within the package, you may provide a static public
 method for access.

Yes this byte[] constructor is a bit odd.

 - This class could be optimized using the flyweight pattern (e.g.
 caching the created objects)

Yes, you are right, it could. But I would really like to have
ConcurrentHashMap for such a task. So maybe I should wait until we switch
to Java 1.5, or can you recommend the ConcurrentHashMap from the backport
JAR?
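
A minimal sketch of the flyweight idea with java.util.concurrent (Java 1.5 or later). CachedTag is a hypothetical class, not the attached Tag; it only shows how putIfAbsent makes every caller end up with the same instance per name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical flyweight: at most one instance per tag name. */
final class CachedTag {

    private static final ConcurrentMap<String, CachedTag> CACHE =
            new ConcurrentHashMap<String, CachedTag>();

    private final String name;

    private CachedTag(String name) {
        this.name = name;
    }

    static CachedTag valueOf(String name) {
        CachedTag cached = CACHE.get(name);
        if (cached == null) {
            CachedTag created = new CachedTag(name);
            // putIfAbsent returns the previous value, or null if ours was stored.
            cached = CACHE.putIfAbsent(name, created);
            if (cached == null) {
                cached = created;
            }
        }
        return cached;
    }

    String getName() {
        return name;
    }
}
```

Two threads racing on the same name may both create an object, but only one of them is stored and returned to both.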

 - equals would be more readable if you rename tag to otherTag, and use
 this.value == otherTag.value

Yes, please blame IntelliJ IDEA for that.

 - checkByte also uses value. In this case, you mean byteValue or
 charValue.

You are right!

 - why go with toChars creating an array and then using it?
 StringBuilder may be the easier solution.
 
 - in the alluppercase and alllowercase methods: You may consider
 using Character.isLetter rather than explicitly checking for space and
 numbers. Some characters, such as @ (although probably not used) would
 otherwise create bugs.

The problem is that a digit is also considered lowercase. In fact, I
realized that this method should be named containsNoUpperCaseLetters.
I also changed the implementation to:

    if (Character.isLetter(ch) && Character.isUpperCase(ch)) {
        return false;
    }
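
Put into a complete method, the check could look like the following sketch. The enclosing class TagCase and the String parameter are assumptions, and the operator between the two conditions is assumed to be &&.

```java
/** Hypothetical holder for the renamed check. */
class TagCase {

    /** Returns false as soon as one upper-case letter is found. */
    static boolean containsNoUpperCaseLetters(String tag) {
        for (int i = 0; i < tag.length(); i++) {
            char ch = tag.charAt(i);
            if (Character.isLetter(ch) && Character.isUpperCase(ch)) {
                return false;
            }
        }
        return true;
    }
}
```

Digits, spaces and characters such as @ are not upper-case letters, so they do not make the method return false.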

  Another example is the method getEntriesInOffsetOrder() in the attached
  file OffsetTable.java. It is just a getter of entries but it is named
  different.
 
 getEntriesInOffsetOrder returns a sorted version of the entries. So why
 not call the variable sortedEntries?

Because it is not sorted before calling Collections.sort(). If you read

List sortedEntries = getEntries();

you would expect that getEntries() returns already sorted entries.
The problem is that Collections.sort() uses the output-parameter
anti-pattern.
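
A minimal sketch of the naming issue, with plain Integer offsets standing in for the real entries (OffsetOrderDemo is hypothetical). Collections.sort() sorts the list it is given in place, so the list is copied first and named for what it will contain after the call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo: sort a copy, leave the original order untouched. */
class OffsetOrderDemo {

    static List<Integer> inOffsetOrder(List<Integer> entries) {
        // Copy first: Collections.sort() modifies its argument in place.
        List<Integer> entriesInOffsetOrder = new ArrayList<Integer>(entries);
        Collections.sort(entriesInOffsetOrder);
        return entriesInOffsetOrder;
    }
}
```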

 Other notes:
 - getEntries does not return the entries attribute. This means you are
 confusing internal and external representation. getEntrieValues() could
 be a better name.

No, entries is really a simple collection of entries (look at the
constructor). The Map is really a mapping from tag to entry, so I should
name it tagToEntryMap. Ah, and then the problem with the hidden variable
is also solved. Good point :-)

 - since the entries are re-ordered anyways when adding to the map, why
 not use a SortedMap (e.g. TreeMap instead)? Then one getEntries method
 would suffice.

Uh, you spotted a bug. I need a LinkedHashMap here. Thanks! I would
really like to have the entries both in original order and in offset
order.
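
A minimal sketch of why LinkedHashMap fits here: it iterates in insertion order, whereas HashMap gives no ordering guarantee. TagToEntryDemo and the Integer values are hypothetical placeholders for the real entry type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo of a tag-to-entry map that keeps the original order. */
class TagToEntryDemo {

    static List<String> keysInInsertionOrder(String[] tags) {
        Map<String, Integer> tagToEntryMap = new LinkedHashMap<String, Integer>();
        for (int i = 0; i < tags.length; i++) {
            tagToEntryMap.put(tags[i], Integer.valueOf(i));
        }
        // keySet() of a LinkedHashMap iterates in insertion order.
        return new ArrayList<String>(tagToEntryMap.keySet());
    }
}
```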

 - you have some default visibility methods and classes, which should be
 reconsidered.

What is wrong with package-local visibility? I find it very useful. In
fact, I think it's the most useful visibility right after private.

  I think this rule is mostly helpful in order to think about variable
  names. But I also think that there are a few cases where violating this
  rule makes sense. So maybe the rule is just not smart enough to detect
  the remaining special cases.
 
 If you are really sure you can always temporarily disable CHECKSTYLE with
 
 // CHECKSTYLE:OFF
 violating code
 // CHECKSTYLE:ON

I that 

Re: Writing PDF Documents and other source code parts

2009-10-02 Thread Alexander Kiel
Hi Vincent,

 I can’t really help I’m afraid, as I personally don’t have the necessary
 knowledge. It’s probably time to submit what you already have as a patch
 attached to a Bugzilla entry:
 https://issues.apache.org/bugzilla/enter_bug.cgi?product=Fop
 That will allow us to have a look and maybe provide some additional
 guidance.

Okay. I'll work towards a round patch, which includes just the new
classes without integration. 

 How feasible would it be to write a thin layer on top of your library
 that would bridge the gap between it and the current one? That would be
 a temporary layer until the PDF code is in turn refactored, allowing you
 to keep the new library clean (do we really want write support for
 OpenType files??). Refactoring the PDF code now will lead you too far.
 Keep concentrated on fonts (as much as possible) for now.

It will be hard to write such a layer, but let's see.

I think we need OpenType write support, because if we want to embed
subsets of fonts, we need to manipulate the font program and write it
back into a byte stream. TTFSubSetFile does this already. From its class
comment:

Reads a TrueType file and generates a subset that can be used
to embed a TrueType CID font. TrueType tables needed for embedded
CID fonts are: head, hhea, loca, maxp, cvt , prep,
glyf, hmtx and fpgm.


 BTW, have you submitted your ICLA? 155 new classes... We’re gonna need
 one :-)

No not yet.

Best Regards
Alex

 Alexander Kiel wrote:
  Hi,
  
  I know my goal is to implement basic OpenType support for FOP. But from
  font subsetting/embedding my eyes touched the actual PDF output
  routines.
  
  I think, that this module needs refactoring. If you have a look at the
  PDFWritable interface, there is a really ugly method. The method
  outputInline takes an OutputStream and a Writer, which are related to
  each other. The comment says that the writer is buffered and every time
  you want to write something to the OutputStream, you have to flush the
  Writer first. That's crude.
  
  What is really needed is some output interface which is able to do both,
  write chars and write bytes.
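
A minimal sketch of such an output abstraction, assuming text is written as US-ASCII. PdfOutput and its method names are hypothetical and do not exist in FOP; the point is only that both kinds of write go through one stream, so no Writer has to be flushed in between.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical output that accepts both text and raw bytes on one stream. */
class PdfOutput {

    private final OutputStream out;

    PdfOutput(OutputStream out) {
        this.out = out;
    }

    /** Writes raw bytes unchanged. */
    void writeBytes(byte[] bytes) throws IOException {
        out.write(bytes);
    }

    /** Encodes the text as US-ASCII and writes it to the same stream. */
    void writeText(String text) throws IOException {
        out.write(text.getBytes("US-ASCII"));
    }
}
```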
  
  I also had a look at PDFBox regarding writing PDFs. Maybe we shouldn't
  refactor FOP's own, maybe a bit legacy, PDF code. But I don't like the
  PDFBox code either.
  
  So I'm a bit helpless now. The problem is, regardless of what code I
  see, let it be:
  
  TTFSubSetFile 
  
  Which is all about, reading a TrueType file, taking account of
  some glyph mapping (the glyphs used) and returning a byte array,
  which contains the bytes of a TrueType file with the subset of
  glyphs. This thing extends TTFFile which is about representing a
  TrueType file mixed with all the reading stuff. Here, reading,
  writing and representing some real world object is mixed in a
  really ugly way.
  
  PDFFactory
  
  This class does two things: creating and registering PDF objects.
  A factory should only create objects. Also, this class has nearly
  1800 lines of code. Maybe it is a factory of too many things?
  
  If I look at the method which interests me, makeFontFile, the
  comment says "Embeds a font.", but the method name is
  makeFontFile. makeFontFile makes sense in a factory. But
  "Embeds a font." hints that the created font file is actually
  embedded in the PDF document. Also, this method has nearly 100
  lines of code, which do all sorts of things that I can't
  understand quickly. In some line the TTFSubSetFile is created and
  the resulting bytes go into some PDFTTFStream - okay.
  
  So do not wonder about memory problems. Here you have whole
  300 kb+ fonts sitting in arrays.
  
  MultiByteFont
  
  It seems to me that MultiByteFont tracks the glyph usage:
  getUsedGlyphs, mapChar, subSet. I always thought that
  fonts are immutable objects, representing a font program which
  can be shared all over the application. Enjoy building
  a common font source in FOP!
  
  I don't know how I should integrate my own code into it. I think a lot
  of refactoring is necessary in order to get the FOP parts into a
  state where I can integrate new code.
  
  But I'm not sure where to start, and not sure if there are enough tests.
  I don't know the overall structure. I'm simply a bit helpless.
  
  I have a nice fonts.opentype package here with 155 classes and 279 tests
  covering 93 % of the classes and 80 % of the lines. I can already read
  all of the TrueType metrics and OpenType kerning info. I have a class of
  every entity of the OpenType spec and a Reader for every such class.
  That means you can test reading every substructure alone. I think that
  this is a really nice API for reading OpenType files.
  
  So now that I have seen what TTFSubSetFile really has to do, I will
  start adding write support for OpenType files. Then I will write some
  manipulation routine which can build a subset of a file

Re: Checkstyle RedundantThrowsCheck

2009-10-01 Thread Alexander Kiel
Hi Max,

 The same variable name should ALWAYS refer to the same variable / value.

I think, I can second this.

 For setters and constructors this makes sense - after all, in each of
 these you have a simple assignment, and both variables will carry the
 same value.

In my attachment Tag.java, you can see a variable named value both in the
constructor and as a field. According to the rule, the variable in the
constructor hides the field. But it's really the same thing. I even
assign it in the last line of the constructor.

Another example is the method getEntriesInOffsetOrder() in the attached
file OffsetTable.java. It is just a getter of entries but it is named
differently.
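
A minimal sketch of the constructor case described above. HiddenFieldDemo is a hypothetical reconstruction, not the attached Tag.java: a local variable named value is built up step by step and assigned to the field of the same name in the last line.

```java
/** Hypothetical reconstruction of a constructor-local variable hiding a field. */
final class HiddenFieldDemo {

    private final int value;

    HiddenFieldDemo(byte[] bytes) {
        // The local 'value' hides the field until the final assignment.
        int value = 0;
        for (int i = 0; i < bytes.length; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        this.value = value;
    }

    int getValue() {
        return value;
    }
}
```

The local variable and the field carry the same meaning here, which is why the hidden-field warning feels like a false positive in this case.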

 But in most other methods, the parameter you pass is NOT assigned to the
 internal variable, so they actually refer to a different thing, and
 that's where the confusion starts.

You are absolutely right. In most cases the variable really refers to a
different thing. The above two examples are the only two cases where I
violated the HiddenFieldRule in 155 new classes.

 I know modern IDEs can show you which variable you actually refer to,
 but this usually requires selecting the variable or hovering over it,
 which you will not do if you are just reading the code trying to
 understand it.

Not in IntelliJ IDEA. There, fields are bold and dark magenta, and all
other variables are just normal and black.

 However, since we cannot agree to keep the rule, I'll have to be content
 with removing it (which is already done).

I think this rule is mostly helpful in order to think about variable
names. But I also think that there are a few cases where violating this
rule makes sense. So maybe the rule is just not smart enough to detect
the remaining special cases.

That's the same as with the "Javadoc on public things" rule. If there
isn't anything to say about a public thing that says more than the
name itself, then I prefer no comment at all. But how should Checkstyle
detect such cases?

There is a @SuppressWarnings annotation. I don't know if Checkstyle uses
it. So maybe if we switch to Java 1.5, we could use it. But even then
this annotation is a lot of clutter. It's a pity that computers can't
think.


Best Regards
Alex
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/* $Id */

package org.apache.fop.fonts.opentype.common;

import java.io.UnsupportedEncodingException;

/**
 * Array of four uint8s (length = 32 bits) used to identify a script, language system, feature, or
 * baseline.
 * <p/>
 * Tags are the names given to tables in the OpenType font file. All tag names consist of four
 * characters. Names with less than four letters are allowed if followed by the necessary trailing
 * spaces. All tag names defined within a font (e.g., table names, feature tags, language tags) must
 * be built from printing characters represented by ASCII values 32-126.
 */
public final class Tag {

    public static final Tag TTCF = Tag.valueOf("ttcf");
    public static final Tag OTTO = Tag.valueOf("OTTO");
    public static final Tag TRUE = Tag.valueOf("true");
    public static final Tag TYP1 = Tag.valueOf("typ1");

    private static final int MIN_BYTE_VALUE = 0x20;
    private static final int MAX_BYTE_VALUE = 0x7E;

    private final int value;

    public static Tag valueOf(String str) {
        int length = str.length();
        // The comparison operator was lost in the archive; "<" is assumed here.
        if (length < 4) {
            throw new IllegalArgumentException("str.length() < 4; was: " + length);
        }
        try {
            return new Tag(str.getBytes("ISO-8859-1"));
        } catch (UnsupportedEncodingException e) {
            //TODO: not the best solution
            throw new InternalError(e.getMessage());
        }
    }

    Tag(byte[] bytes) {
        // As above, the "<" in this check is an assumption.
        if (bytes.length < 4) {
            throw new IllegalArgumentException("bytes.length < 4; was: " + bytes.length);
        }
        // Pack the four bytes into one int, first byte in the highest position.
        int value = 0;
        for (int i = 0; i < 4; i++) {
            checkByte(bytes[i], i);
            value += (bytes[i] << ((3 - i) * 8));
        }
        this.value = value;
    }

    public boolean isAllUpperCase() {
        String tagStr = toString();
        for (int i = 0; i < tagStr.length(); i++) {
            char ch = tagStr.charAt(i);
            if (!Character.isUpperCase(ch) && !Character.isDigit(ch) &&

Re: Confused about checkstyle use

2009-10-01 Thread Alexander Kiel
Hi Vincent,

In IntelliJ IDEA, I also get annoying yellow marks in my code. So if
the common policy is to tolerate no warnings at all, I will follow it.


Best Regards
Alex

-  
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net


On Thu, 2009-10-01 at 10:41 +0100, Vincent Hennebert wrote:
 Hi Alexander,
 
 Alexander Kiel wrote:
  Hi Vincent,
  
  Should the rule be disabled because of that? Having proper javadoc on at
  least public methods is very important. OTOH, this is actually not
  something Checkstyle can verify. How many methods in the code base have
  totally useless comments that are there just to avoid a Checkstyle
  warning...
 
  I think I’d prefer to keep the rule, but wouldn’t veto its removal.
  
  I don't vote for removal too, I only vote for the right to violate it in
  cases one can't add any useful information in the comment.
 
 Hmmm, I think that once we’ve agreed on a Checkstyle config we really
 want to follow, we won’t accept any warning at all. It was my intent to
 propose that anyway. I think it’s more annoying to have little yellow
 exclamation marks attached to every file that contains Checkstyle
 warnings (in Eclipse, at least), than have dull javadoc comments.
 
 
 Vincent
 
 




Javadoc Codestyle

2009-10-01 Thread Alexander Kiel
Hi,

do we use <code>, <tt> or {@code}? I found all three versions. Is there a
Checkstyle check for that?

Do we put a blank line between the Javadoc body and the @param,
@return or @throws clauses? Again I see both:

/**
 * create the /Font object
 *
 * @param fontname the internal name for the font
 * @param subtype the font's subtype
 * @param basefont the base font name
 * @param encoding the character encoding schema used by the font
 */

/**
 * Sets the Encoding value of the font.
 * @param encoding the encoding
 */


Best Regards
Alex

-  
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Re: Javadoc Codestyle

2009-10-01 Thread Alexander Kiel
Hi Jeremias,

there is a JavadocStyleCheck. I have included it in my
checkstyle-5.0.xml for testing. I use the standard settings. It tests
for empty Javadoc and for a missing period at the end of the first sentence.

I didn't find anything which checks for {@code} instead of <tt> and
<code>. Can we add the use of {@code} instead of <tt> and <code> to the
conventions page?


Best Regards
Alex


On Thu, 2009-10-01 at 15:37 +0200, Jeremias Maerki wrote:
 On 01.10.2009 15:08:55 Vincent Hennebert wrote:
  Hi Alexander,
  
  Alexander Kiel wrote:
   Hi,
   
   do we use <code>, <tt> or {@code}? I found all three versions. Is there
   a Checkstyle check for that?
  
  Use {@code}. HTML tags should be avoided as much as possible.
  
  
   Do we introduce a newline between the Javadoc body and the @param,
   @return or @throws clause?
  
  Yes.
 
 I'm sure Vincent wanted to write "Yes, that would be my preference."
 We, the project as a whole, have no such rule.
 
 Our code conventions are here:
 http://xmlgraphics.apache.org/fop/dev/conventions.html
 plus the Checkstyle configuration which has become a de-facto standard,
 you could say. Everything beyond that is personal preference.
 
 That said, I'm against over-regulating. Can you actually check that
 blank line in Checkstyle? I don't think so. Going beyond what we already
 have in terms of conventions doesn't make much sense as long as no one
 fixes each and every Checkstyle violation in FOP.
 
  
   Again I see both:
   
   /**
    * create the /Font object
    *
    * @param fontname the internal name for the font
    * @param subtype the font's subtype
    * @param basefont the base font name
    * @param encoding the character encoding schema used by the font
    */
   
   /**
    * Sets the Encoding value of the font.
    * @param encoding the encoding
    */
   
   
   Best Regards
   Alex
  
  Vincent
 
 
 
 
 Jeremias Maerki
 
 




Re: Javadoc Codestyle

2009-10-01 Thread Alexander Kiel
Hi Jeremias,

  there is a JavadocStyleCheck. I have included it in my
  checkstyle-5.0.xml for testing. I use the standard settings. It tests
  for empty Javadoc and for a missing period at the end of the first sentence.
 
 Yes, but it can't check for that mandatory line between the body and the
 parameters, right?

Yes.

  I didn't find anything which checks for {@code} instead of <tt> and
  <code>. Can we add the use of {@code} instead of <tt> and <code> to the
  conventions page?
 
 It took me some searching to find out where that {@code} is even
 specified. It seems to be a Javadoc 1.5 feature:
 http://java.sun.com/javase/6/docs/technotes/tools/windows/javadoc.ht...@code
 Please note that we're technically still on Java 1.4. A {@code} results
 in a warning and its content is swallowed with Javadoc 1.4.
 
 Sun's styleguide still lists <code> for keywords and names:
 http://java.sun.com/j2se/javadoc/writingdoccomments/#styleguide

But I'm sure that the majority of developers are migrating to {@code} or,
generally speaking, away from HTML. One advantage, apart from not
relying on HTML, is that inside the code you can use any character such as
<, > and &, which are not allowed unescaped in HTML or XML.

 While researching I found that I seem to have used @code accidentally in
 some places but with the actual intention of using @link (see
 IFDocumentHandler, for example). What a mess. :-( I'll fix that.

Oh I see :-) So there are currently 105 usages of {@code} in the trunk.
What to do? Can't we allow {@code} if we generate Javadoc with Java 1.5?


Best Regards
Alex




Re: Javadoc Codestyle

2009-10-01 Thread Alexander Kiel
Hi Jeremias,

   While researching I found that I seem to have used @code accidentally in
   some places but with the actual intention of using @link (see
   IFDocumentHandler, for example). What a mess. :-( I'll fix that.
 
 I've already fixed the obvious mistakes. 

There are only 23 other usages in trunk. So maybe you can replace them
all with <code>.

And there are also only 57 usages of <tt>. So maybe with a regexp and
reformat code...

  Oh I see :-) So there are currently 105 usages of {@code} in the trunk.
  What to do? Can't we allow {@code} if we generate Javadoc with Java 1.5?
 
 I guess it's no big deal. If people want clean javadocs they need to run
 Javadoc 1.5. But IMO it's a bit premature to require {@code} in our
 conventions.

Hmm. Okay, I see the point. The switch from 1.4 to 1.5 has to be very
clean. And it would be great if it came soon. :-)

Best Regards
Alex




Writing PDF Documents and other source code parts

2009-10-01 Thread Alexander Kiel
Hi,

I know my goal is to implement basic OpenType support for FOP. But from
font subsetting/embedding my eyes wandered to the actual PDF output
routines.

I think that this module needs refactoring. If you have a look at the
PDFWritable interface, there is a really ugly method. The method
outputInline takes an OutputStream and a Writer, which are related to
each other. The comment says that the writer is buffered and every time
you want to write something to the OutputStream, you have to flush the
Writer first. That's crude.

What is really needed is some output interface which is able to do both,
write chars and write bytes.
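
A minimal sketch of such an interface, assuming that the character data is plain ASCII, as PDF syntax is. PdfOutputSketch and its methods are hypothetical names for illustration, not existing FOP classes:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical single sink for both text and binary data. */
final class PdfOutputSketch {

    private final OutputStream out;
    private long position;

    PdfOutputSketch(OutputStream out) {
        this.out = out;
    }

    /** Writes raw bytes, e.g. the content of an embedded font stream. */
    void write(byte[] bytes) throws IOException {
        out.write(bytes);
        position += bytes.length;
    }

    /**
     * Writes text encoded as US-ASCII. Non-ASCII characters are replaced
     * by '?' by the encoder, so callers have to escape them beforehand.
     */
    void write(String text) throws IOException {
        write(text.getBytes(StandardCharsets.US_ASCII));
    }

    /** Number of bytes written so far, usable for cross-reference offsets. */
    long getPosition() {
        return position;
    }
}
```

Because chars and bytes go through the same object, there is no second buffer that has to be flushed at the right moment.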

I also had a look at PDFBox regarding writing PDFs. Maybe we shouldn't
refactor FOP's own, perhaps somewhat legacy, PDF code. But I don't like
the PDFBox code either.

So I'm a bit helpless now. The problem is the same regardless of what code I
look at, be it:

TTFSubSetFile 

This class is all about reading a TrueType file, taking into account
some glyph mapping (the glyphs used), and returning a byte array
which contains the bytes of a TrueType file with the subset of
glyphs. This thing extends TTFFile, which is about representing a
TrueType file mixed with all the reading stuff. Here, reading,
writing and representing some real-world object are mixed in a
really ugly way.

PDFFactory

This class does two things: creating and registering PDF objects.
A factory should only create objects. Also, this class has nearly
1800 lines of code. Maybe it is a factory of too many things?

If I look at the method which interests me, makeFontFile, the
comment says "Embeds a font.", but the method name is
makeFontFile. makeFontFile makes sense in a factory. But
"Embeds a font." hints that the created font file is actually
embedded in the PDF document. Also, this method has nearly 100
lines of code which do all sorts of things that I can't
understand quickly. At some line the TTFSubSetFile is created and
the resulting bytes go into some PDFTTFStream - okay.

So do not wonder about memory problems. Here you have whole
300 kB+ fonts sitting in arrays.

MultiByteFont

It seems to me that the MultiByteFont tracks the glyph usage:
getUsedGlyphs, mapChar, subSet. I always thought that
fonts are immutable objects, representing a font program which
can be shared all over the application. Enjoy building
a common font source in FOP!

I don't know how I should integrate my own code into it. I think there is
a lot of refactoring necessary in order to get the FOP parts into some
state where I can integrate new code.

But I'm not sure where to start, and not sure if there are enough tests. I
don't know the overall structure. I'm simply a bit helpless.

I have a nice fonts.opentype package here with 155 classes and 279 tests
covering 93 % of the classes and 80 % of the lines. I can already read
all of the TrueType metrics and OpenType kerning info. I have a class of
every entity of the OpenType spec and a Reader for every such class.
That means you can test reading every substructure alone. I think that
this is a really nice API for reading OpenType files.

Now that I have seen what TTFSubSetFile really has to do, I will start adding
write support for OpenType files. Then I will write some manipulation
routine which can build a subset of a file. But I don't like to get the
glyph mapping info for this manipulation from a MultiByteFont, which
should really be immutable.

I found it sufficient to write a KerningMapBuilder which stuffs kerning
pairs into a really nice doubly nested Map construction. As the comment
on CustomFont#replaceKerningMap says:

the kerning map (Map<Integer, Map<Integer, Integer>>,
the integers are character codes)

Such a highly specialized, self-explaining, problem-oriented data
structure is spread all over the font system. Know your tools!
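
One way to keep such a nested map in a single place would be a small dedicated type. The following is only a sketch; KerningTableSketch and its methods are hypothetical and not part of FOP:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical wrapper that hides the nested map behind two methods. */
final class KerningTableSketch {

    // first character code -> (second character code -> adjustment)
    private final Map<Integer, Map<Integer, Integer>> pairs = new HashMap<>();

    /** Registers the kerning adjustment for the pair (first, second). */
    void put(int first, int second, int adjustment) {
        pairs.computeIfAbsent(first, key -> new HashMap<>()).put(second, adjustment);
    }

    /** Returns the adjustment for the pair, or 0 if the pair is not kerned. */
    int get(int first, int second) {
        Map<Integer, Integer> row = pairs.get(first);
        if (row == null) {
            return 0;
        }
        Integer adjustment = row.get(second);
        return adjustment == null ? 0 : adjustment;
    }
}
```

Callers would then ask for get('A', 'V') and never see the map-of-maps itself.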

So where to start?

Best Regards
Alex

-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Re: Best Interface for reading OpenType Files

2009-09-30 Thread Alexander Kiel
Hi Vincent,

 I see. I had in mind to use OpenTypeDataInputStream as the common
 interface. It actually makes sense to use ImageInputStream instead.
 Simpler and just as flexible. That will add a direct dependency on
 a class in the javax.imageio package, but this is not a problem as it is
 part of the standard library. That ImageInputStream interface is
 unfortunately named really.

What did you mean by your last sentence? That ImageInputStream isn't
well named?

  So if I should vote, I would probably vote for Spring.
 
 Well I’m not sure I like the abundance of XML in spring actually. POJOs
 powaaa! Also, spring may be overkill to just deploy FOP. Anyway, this is
 probably a bit early to discuss that. (What do you think of the
 following though: http://code.google.com/p/google-guice/ ?)

I had heard of it before, but hadn't looked into it. So I took your
pointer as motivation to have a look at it. I watched the "Google I/O -
Big Modular Java with Guice" [1] talk on YouTube. It looks very
promising. I'm not against this XML config stuff, but if I can get the
same with annotations and standard Java code - why not. Of course I like
this whole type safety stuff, but with IntelliJ I get this in Spring XML
too.

[1]: http://www.youtube.com/watch?v=hBVJbzAagfs

  - does the use of serializable objects make sense? What would be more
efficient: re-parsing font data all the time or re-loading
serializable object representation of them?
  You mean the font metrics XML files? I've always asked myself what
  purpose they serve. No, I don't think we need this. I really don't
  want to serialize the Advanced OpenType Features! It already took me a
  good amount of code to parse just a bit of it.
  What I meant was to use the java.io.Serializable interface. I don’t
  indeed think XML representations are any useful, apart maybe for
  debugging purpose or to have a more human-readable version of the font
  file.
  IIC there would be next to nothing to do to cache Serializable objects
  on the hard drive and retrieve them?
  
  Hmmm. Ok. But if we want to use Serializable for that, your classes have
  to be very stable. Versioning the Serializable stuff is a real burden in
  my opinion. So we will need a cache which detects version changes and
  invalidate the objects if so. Do you know such a lib?
 
 I was thinking that just catching the InvalidClassException when reading
 the object would be enough to conclude that the cache is no longer valid
 and must be re-created. Maybe I’m wrong? I must confess that I have no
 experience with serialization.

Yes this could work. But I find it always difficult and time consuming
to design classes for serialization. And reading the serialized version
is most likely not much faster than reading the actual OpenType file. So
I would really want to wait until we have a real performance problem.
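
For reference, the approach Vincent describes could look roughly like the sketch below. SerializedCacheSketch and loadOrCreate are hypothetical names, and the sketch assumes that a stale cache only shows up as an InvalidClassException or ClassNotFoundException; other I/O errors are simply passed on:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Supplier;

/** Hypothetical helper: load a cached object or re-create and store it. */
final class SerializedCacheSketch {

    private SerializedCacheSketch() { }

    static <T extends Serializable> T loadOrCreate(
            File cacheFile, Class<T> type, Supplier<T> parser) throws IOException {
        if (cacheFile.isFile()) {
            try (ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(cacheFile))) {
                return type.cast(in.readObject());
            } catch (InvalidClassException | ClassNotFoundException
                    | ClassCastException e) {
                // The class changed since the cache was written:
                // treat the cache as invalid and fall through to re-create it.
            }
        }
        T fresh = parser.get();
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(cacheFile))) {
            out.writeObject(fresh);
        }
        return fresh;
    }
}
```

The parser is only invoked when no usable cache file exists, so the second call for the same file returns the stored object.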

Best Regards
Alex





Re: Confused about checkstyle use

2009-09-30 Thread Alexander Kiel
Hi Max,

First, I will respect every code style of FOP. It's just a matter of
discussion.

  Really? That means commenting every public method even simple Getters
  and Setters?
 
 Yes. Simple Getter and Setters are the only place where you can
 publicly document private variables. (in most cases, comment in the
 getter and link from the setter)

Yes, that's right. But is this Javadoc better than no Javadoc?

public class Person {

    private String firstName;

    /**
     * Returns the first name of this person.
     *
     * @return the first name of this person.
     */
    public String getFirstName() {
        return firstName;
    }
}

  Commenting equals(), hashCode() and toString()? I think,
  this would be only clutter.
 
 /** {@inheritDoc} */

In my eyes this is enough clutter. I saw classes in FOP with maybe 10
methods using this /** {@inheritDoc} */. It just distracts the eye from
reading the actual method name. And it adds absolutely no information for
the source code reader.

 would do the trick on those,  UNLESS they implement something which is
 unexpected (such as the equals methods I recently renamed which did
 not implement equals) or special (a toString which creates a
 guaranteed parsable result for example)

Hmmm. An equals method shouldn't do anything unexpected. But your
toString() example is a good one. If such standard methods do something
more than the comment in Object says, then a comment is useful.

I think it's the same as with simple public methods like the getter from
above. If your comment doesn't say anything more than the method name
already says, I don't want to read it.

Best Regards
Alex





Re: Checkstyle RedundantThrowsCheck

2009-09-30 Thread Alexander Kiel
Hi Vincent,

 Speaking of that, there’s a rule that I would suggest to disable: the
 HiddenFieldCheck. I don’t really see its benefit. It forces to find
 somewhat artificial names for variables, where the field name is exactly
 what I want. Sometimes a method doesn’t have a name following the
 setField pattern, yet still acts as a setter for Field. This rule would
 make sense if we were using a Hungarian-like notation for variables
 (mMember, pParam, etc.), but that’s not the case in FOP.
 
 WDYT?

Yes, I would vote for it. In modern IDEs one clearly sees the difference
between an instance field and a local variable. This is also the reason
why this Hungarian-like scope notation is largely gone in Java.


Best Regards
Alex




Re: Checkstyle RedundantThrowsCheck

2009-09-30 Thread Alexander Kiel
Hi Max,

  Speaking of that, there’s a rule that I would suggest to disable: the
  HiddenFieldCheck. I don’t really see its benefit. It forces to find
  somewhat artificial names for variables, where the field name is exactly
  what I want. Sometimes a method doesn’t have a name following the
  setField pattern, yet still acts as a setter for Field. This rule would
  make sense if we were using a Hungarian-like notation for variables
  (mMember, pParam, etc.), but that’s not the case in FOP.
  WDYT?
 
 I like the rule, BUT I am ok with an exception for setters and
 constructors (this is IMO a new option in checkstyle 5):
 http://checkstyle.sourceforge.net/config_coding.html#HiddenField

The exclusion of constructors and setters is important. Otherwise we
would be forced to use some Hungarian-like scope notation.
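
To make the distinction concrete, here is a small sketch (PersonSketch is a made-up class). In the constructor and the setter the parameter is assigned to the field, so sharing the name is harmless; in the other method the parameter means something else and gets its own name:

```java
/** Made-up class illustrating where hiding a field is harmless. */
final class PersonSketch {

    private String firstName;

    // Constructor: the parameter carries exactly the value of the field.
    PersonSketch(String firstName) {
        this.firstName = firstName;
    }

    // Setter: same situation, the parameter is assigned to the field.
    void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    String getFirstName() {
        return firstName;
    }

    // Not a setter: the parameter refers to a different value,
    // so a different name avoids any confusion with the field.
    boolean hasSameFirstName(String otherFirstName) {
        return firstName.equals(otherFirstName);
    }
}
```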

But why do you think, that this rule is useful at all?

Best Regards
Alex




Re: Confused about checkstyle use

2009-09-30 Thread Alexander Kiel
Hi Vincent,

 Should the rule be disabled because of that? Having proper javadoc on at
 least public methods is very important. OTOH, this is actually not
 something Checkstyle can verify. How many methods in the code base have
 totally useless comments that are there just to avoid a Checkstyle
 warning...
 
 I think I’d prefer to keep the rule, but wouldn’t veto its removal.

I don't vote for removal either; I only vote for the right to violate it in
cases where one can't add any useful information in the comment.


Best Regards
Alex




Re: Confused about checkstyle use

2009-09-28 Thread Alexander Kiel
Hi Vincent,

 However, new committed code is not supposed to break any rule, neither
 warnings nor errors.

Really? That means commenting every public method even simple Getters
and Setters? Commenting equals(), hashCode() and toString()? I think,
this would be only clutter.


Best Regards
Alex




RE: Confused about checkstyle use

2009-09-28 Thread Alexander Kiel
Hi Jonathan,

 However, I notice there are still warnings.
 
 BlockStackingLayoutManager.java: 16 items
 
 Missing a Javadoc comment. (58:5)
 'parentArea' hides a field. (115:47)
 'parentArea' hides a field. (145:50)
 Method length is 185 lines (max allowed is 150) (372:5)
 Etc.,

For BlockStackingLayoutManager the right margin of my editor is almost
completely yellow :-) In German, I would say: Monsterklasse (monster class). :-)

 I'm using JetBrains IDEA 8.1.3.

Me too.

 BTW, I got Checkstyle to work in IDEA by changing checkstyle-5.0.xml in
 FOP in the following way:
 
   <module name="RegexpHeader">
     <property name="headerFile"
       value="c:/perforce/Users/levinson/fop-trunk/checkstyle.header"/>

Yes, this solution is obvious, but not very suitable, as you can't commit
this file with your private path.

Can one of the longer-standing project members point us to some info on why
this ${samedir} property does not work in IDEA?


Best Regards
Alex


 -Original Message-
 From: Alexander Kiel [mailto:alexanderk...@gmx.net] 
 Sent: Sunday, September 27, 2009 4:55 PM
 To: fop-dev@xmlgraphics.apache.org
 Subject: Re: Confused about checkstyle use
 
 Hi Jonathan,
 
 did you use the checkstyle-5.0.xml from FOP or the default Sun profile?
 I'm currently not able to start IDEA, but two days ago, when I downloaded
 the plugin, I noticed that the Sun profile was active and I had to
 define the FOP profile. And if you define the FOP profile, you will
 probably notice that the header check does not work. It's a path inclusion
 problem with the header.* file. I don't have a solution for it; I just
 commented it out for now.
 
 Best Regards
 Alex
 
 Jonathan Levinson wrote:
 
  I've installed the Checkstyle plugin for IDEA and the current code 
  when scanned by the plugin shows lots of Checkstyle errors.
 
  Here are some errors scanning BlockStackingLayoutManager.java:
 
  Missing package-info.java file (0:0)
 
  Line is longer than 80 characters. (18:0)
 
  First sentence should end with a period (53:0)
 
  Variable 'bpUnit' must be private and have accessor methods. (61:19)
 
  What does it mean to have clean code according to Checkstyle?
 
  Is my plugin misconfigured? Is it by default at too strict a setting?
 
  Best Regards,
 
  Jonathan S. Levinson
 
  Senior Software Developer
 
  Object Group
 
  InterSystems
 
 
 




Re: Best Interface for reading OpenType Files

2009-09-28 Thread Alexander Kiel
Hi Vincent,

  Here are my two cents: if you make use of classes in javax.imageio at
  only one place in your font library, then there’s no need to worry about
  creating a more neutral layer. If OTOH you need to use those classes
  everywhere, then it makes sense to use a simplified abstraction layer.
  That abstraction layer could be shipped as a separate module and evolve
  separately. An implementation could be based on imageIO, Apache Commons
  IO (?), your own implementation based on byte arrays for testing
  purpose, etc.
  
  Thanks for that. I think I will write an OpenTypeDataInputStream which
  is not a FilterInputStream, but takes an ImageInputStream as constructor
  argument like a FilterInputStream would take an InputStream. This
  OpenTypeDataInputStream will be the API for all the streams on top of
  it. So I would have only one point which depends on ImageInputStream.
 
 You may want to use a factory a la SAXParserFactory. Although that may
 go a bit far.

Hmmm. I don't see the benefit of such a factory here. The
OpenTypeDataInputStream would look like this:

public class OpenTypeDataInputStream {

    private final ImageInputStream in;

    public OpenTypeDataInputStream(ImageInputStream in) {
        this.in = in;
    }

    public final int readUnsignedShort() throws IOException {
        [...]
    }

    public final Tag readTag() throws IOException {
        [...]
    }

}

This is the common FilterInputStream pattern. OpenTypeDataInputStream
only depends on ImageInputStream, which is an interface.
OpenTypeDataInputStream is really simple and straightforward, so I
can't imagine different implementations, except implementations on top
of things other than ImageInputStream. But then we are at the question of
whether we want ImageInputStream to be the common interface for different
implementations (on top of files, streams, byte arrays) or whether we want
OpenTypeDataInputStream to do that. I think that ImageInputStream is the
right place, because it abstracts getting bytes and being able to
seek. OpenTypeDataInputStream, on the other hand, implements the semantics
of the common OpenType data types, which are well defined in the
specification.
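
To illustrate the split, here is a self-contained sketch on top of a byte array. It uses the standard MemoryCacheImageInputStream as the byte source; OpenTypeDataInputSketch, over() and the String-returning readTag() are simplifications made up for this example (the real class would return a Tag):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Simplified stand-in for OpenTypeDataInputStream, for illustration only. */
final class OpenTypeDataInputSketch {

    private final ImageInputStream in;

    OpenTypeDataInputSketch(ImageInputStream in) {
        this.in = in;
    }

    /** Convenience for tests: reads from an in-memory byte array. */
    static OpenTypeDataInputSketch over(byte[] data) {
        return new OpenTypeDataInputSketch(
                new MemoryCacheImageInputStream(new ByteArrayInputStream(data)));
    }

    /** OpenType uint16; ImageInputStream reads big-endian by default. */
    int readUnsignedShort() throws IOException {
        return in.readUnsignedShort();
    }

    /** Four-byte tag, returned as a String here to keep the sketch small. */
    String readTag() throws IOException {
        byte[] bytes = new byte[4];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Seeking is delegated to the underlying ImageInputStream. */
    void seek(long offset) throws IOException {
        in.seek(offset);
    }
}
```

Getting bytes and seeking stay in ImageInputStream, while the OpenType data types live in the wrapper, which is why a byte array is enough to test it.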

  If you only need the metrics, parsing the glyf or CFF table would be
  really unnecessary. So maybe a TableFilter interface would be useful.
  Like this:
  
  public class OpenTypeFileInputStream {
  
      private TableFilter tableFilter = TableFilter.NO_FILTERING;
  
      public OpenTypeFileInputStream(OpenTypeDataInputStream in) {}
  
      public void setTableFilter(TableFilter tableFilter) {}
  }
  
  public interface TableFilter {
  
      public static final TableFilter NO_FILTERING = new TableFilter() {
          public boolean doReadTable(Tag tableTag) { return true; }
      };
  
      boolean doReadTable(Tag tableTag);
  }
  
  A client which isn't aware of TableFilter would not notice any burden
  using the API. And the implementation in OpenTypeFileInputStream isn't
  so difficult.
 
 This is an interesting idea. But how would you combine filters?
 I’d suggest to keep it aside for the moment, and implement it if we are
 actually running into performance issues. After all, if some caching is
 done, the font should be parsed only once.

The idea of TableFilter is borrowed from java.io.FileFilter. If you look
at org.apache.commons.io.filefilter.AndFileFilter and so on, you get an
idea of how one could combine such filters.
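
A rough sketch of such combinators, in the spirit of AndFileFilter. To keep it self-contained, the filter takes the table tag as a String instead of a Tag, and TableFilters with its only(), and() and not() methods is hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Decides per table tag whether the table should be read. */
interface TableFilter {

    TableFilter NO_FILTERING = tableTag -> true;

    boolean doReadTable(String tableTag);
}

/** Hypothetical factory methods for building and combining filters. */
final class TableFilters {

    private TableFilters() { }

    /** Accepts only the listed table tags. */
    static TableFilter only(String... tableTags) {
        Set<String> accepted = new HashSet<>(Arrays.asList(tableTags));
        return accepted::contains;
    }

    /** Accepts a table only if both filters accept it. */
    static TableFilter and(TableFilter first, TableFilter second) {
        return tableTag -> first.doReadTable(tableTag) && second.doReadTable(tableTag);
    }

    /** Inverts a filter. */
    static TableFilter not(TableFilter filter) {
        return tableTag -> !filter.doReadTable(tableTag);
    }
}
```

A metrics-only reader could then pass something like TableFilters.not(TableFilters.only("glyf", "CFF ")) to skip the glyph data.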

Sure, we would have to implement some sort of dependencies between tables if
we want to save the user from surprises.

 There’s no such thing as IoC container in FOP. I’m not sure how easy it
 would be to introduce one. Although that would probably be A Good Thing.
 So do design your font library with IoC in mind.

Yes, I will. We can use IoC even without a container. And if we want to
choose one, I have plenty of experience with Spring. So if I should vote,
I would probably vote for Spring.

  - does the use of serializable objects make sense? What would be more
efficient: re-parsing font data all the time or re-loading
serializable object representation of them?
  
  You mean the font metrics XML files? I've always asked myself what
  purpose they serve. No, I don't think we need this. I really don't
  want to serialize the Advanced OpenType Features! It already took me a
  good amount of code to parse just a bit of it.
 
 What I meant was to use the java.io.Serializable interface. I don’t
 indeed think XML representations are any useful, apart maybe for
 debugging purpose or to have a more human-readable version of the font
 file.
 IIC there would be next to nothing to do to cache Serializable objects
 on the hard drive and retrieve them?

Hmmm. Ok. But if we want to use Serializable for that, the classes have
to be very stable. Versioning the Serializable stuff is a real burden in
my opinion. So we will need a cache which detects version changes and
invalidates the objects if so. Do you know such a lib?


Best Regards
Alex



Re: Checkstyle RedundantThrowsCheck

2009-09-27 Thread Alexander Kiel
Hi Jeremias,

 Makes sense. I stumbled over that myself from time to time but it didn't
 really bother me so much to take action.

Okay. Can you please modify the Checkstyle XML files to reflect that?
I'm a great fan of that Checkstyle stuff. I didn't use it before, but I
find a common coding style important for such a big and shared project
as FOP.

What about severities? Do you commit code with Checkstyle errors?

Best Regards
Alex

 On 26.09.2009 14:41:37 Alexander Kiel wrote:
  Hi,
  
  why doesn't our code style allow unchecked exceptions or subclasses of
  thrown exceptions in Javadoc?
  
  From checkstyle-5.0.xml:
  
  <module name="RedundantThrowsCheck">
      <property name="allowSubclasses" value="false"/>
      <property name="allowUnchecked" value="false"/>
      <property name="severity" value="warning"/>
  </module>
  
  From J. Bloch: Effective Java, Second Edition [1] page 252:
  
  Use the Javadoc @throws tag to document each unchecked exception
  that a method can throw, but do not use the throws keyword to
  include unchecked exceptions in the method declaration.
  
  All good code I know documents unchecked exceptions. Take the Java
  Collections API: every possible ClassCastException or
  NullPointerException is documented.
  
  Another quote from J. Bloch:
  
  A well-documented list of unchecked exceptions that a method
  can throw effectively describes the preconditions for its
  successful execution. It is essential that each method's
  documentation describe its preconditions [...]
  
  I think that everyone can agree with the statements J. Bloch made. So I
  would strongly vote to allow documenting unchecked exceptions.
  
  
  The second point is not allowing subclasses of exceptions in Javadoc. I
  don't use this very often, but I have just one example in my mind where
  this makes sense. If you have a look into
  java.io.DataInputStream#readByte(), there are both IOException and
  EOFException documented. EOFException is a subclass of IOException. As
  you know a normal InputStream.read() returns -1 at EOF but readByte()
  doesn't. So it's worth documenting that readByte() is throwing a
  EOFException instead.
  
  So I would also vote allowing subclasses.
  
  
  Best Regards
  Alex
  
  [1]: http://www.amazon.com/dp/0321356683/
  
  -- 
  e-mail: alexanderk...@gmx.net
  web:www.alexanderkiel.net
  
 
 
 
 
 Jeremias Maerki
 
 





Re: Unit Tests in Intellij Idea

2009-09-27 Thread Alexander Kiel
Hi Jeremias,

thanks for that. I just changed the IDEA output dirs to match those from
Ant, and now almost all tests run.

Best Regards
Alex

On Sat, 2009-09-26 at 23:05 +0200, Jeremias Maerki wrote:
 I assume the META-INF/services directory doesn't get copied over to the
 place where IDEA places its compiled Java classes. If they are not
 available through the classpath, no plug-ins are registered and the PDF
 renderer is basically just a dynamically loaded plug-in.
 
 Not sure about the hyphenation problem. It's not happening here.
 
 On 25.09.2009 19:27:36 Alexander Kiel wrote:
  Hi,
  
  I'm looking for a way to run the unit tests directly inside IDEA.
  
  If I let IDEA compile all the classes, copy the fo files and let
  Ant generate the source files, it runs 2110 tests, but 1737 of them
  fail.
  
  Some of the errors:
  
  No IF document handler for the requested format available:
  application/pdf
  
  Don't know how to handle application/pdf as an output format. Neither
  an FOEventHandler, nor a Renderer could be found for this output format.
  
  The file format is not supported. No ImagePreloader found for
  test/resources/images/img-w-size.svg
  
  Event model doesn't contain the definition for
  org.apache.fop.fo.FOValidationEventProducer
  
  Class org.apache.fop.intermediate.IFParserTestCase has no public
  constructor TestCase(String name) or TestCase()
  
  No IF document handler for the requested format available:
  application/postscript
  
  -
  
  Looks like configuration issues to me. So what can I do to be able to
  run the unit tests?
  
  
  One other related problem: I don't get the hyphenation unit tests to run
  with Ant. I have the fop-hyph-1.2.jar in the classpath. I use Java 1.4,
  JUnit 3.8 and Ant 1.7.
  
  If I run junit-layout-hyphenation, I get 5 errors out of 8 tests. I have
  attached the ant output.
  
  
  Best Regards
  Alex
  
  
  -- 
  e-mail: alexanderk...@gmx.net
  web:www.alexanderkiel.net
  
 
 
 
 
 Jeremias Maerki
 





Re: Checkstyle RedundantThrowsCheck

2009-09-27 Thread Alexander Kiel

Hi Jeremias,


Makes sense. I stumbled over that myself from time to time but it didn't
really bother me so much to take action.
  

Okay. Can you please modify the checkstyle XML files to reflect that?



Sure, but only after a period of at least 72 hours to allow the other
committers to raise an objection.
  

Of course.

I'm a great fan of that checkstyle stuff. I didn't use it before, but I
find a common coding style important for such a big and shared project
like FOP.

What about severities? Do you commit code with Checkstyle errors? 



No, I always fix errors (mine or others'). Sometimes tab characters
creep in, for example. The Checkstyle plug-in for Eclipse is really
helpful in that department. If I didn't fix Checkstyle errors I might
not notice any build failures prior to a commit.
  
I'm glad to hopefully contribute something really useful to such a 
great project. Please don't take it the wrong way if I am sometimes a 
bit too harsh. It's just my nature that I like to be precise.



Best Regards
Alex



Re: Confused about checkstyle use

2009-09-27 Thread Alexander Kiel

Hi Jonathan,

did you use the checkstyle-5.0.xml from FOP or the default Sun profile? 
I'm currently not able to start IDEA, but two days ago when I downloaded 
the plugin, I noticed that the Sun profile was active and I had to 
define the FOP profile. And if you define the FOP profile, you will 
probably notice that the header check does not work. It's a path inclusion 
problem with the header.* file. I don't have a solution for it; I just 
commented it out for now.


Best Regards
Alex

Jonathan Levinson wrote:


I’ve installed the Checkstyle plugin for IDEA and the current code 
when scanned by the plugin shows lots of Checkstyle errors.


Here are some errors scanning BlockStackingLayoutManager.java:

Missing package-info.java file (0:0)

Line is longer than 80 characters. (18:0)

First sentence should end with a period (53:0)

Variable ‘bpUnit’ must be private and have accessor methods. (61:19)

What does it mean to have clean code according to “Checkstyle”?

Is my plugin misconfigured? Is it by default at too strict a setting?

Best Regards,

Jonathan S. Levinson

Senior Software Developer

Object Group

InterSystems





Checkstyle RedundantThrowsCheck

2009-09-26 Thread Alexander Kiel
Hi,

why doesn't our code style allow unchecked exceptions or subclasses of
thrown exceptions in Javadoc?

From checkstyle-5.0.xml:

<module name="RedundantThrowsCheck">
    <property name="allowSubclasses" value="false"/>
    <property name="allowUnchecked" value="false"/>
    <property name="severity" value="warning"/>
</module>

From J. Bloch: Effective Java, Second Edition [1] page 252:

Use the Javadoc @throws tag to document each unchecked exception
that a method can throw, but do not use the throws keyword to
include unchecked exceptions in the method declaration.

All good code I know documents unchecked exceptions. Take the Java
Collections API. Every possible ClassCastException or
NullPointerException is documented.

Another quote from J. Bloch:

A well-documented list of unchecked exceptions that a method
can throw effectively describes the preconditions for its
successful execution. It is essential that each method's
documentation describe its preconditions [...]

I think that everyone can agree with the statements J. Bloch made. So I
would strongly vote to allow documenting unchecked exceptions.
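
To illustrate the style Bloch describes, here is a minimal sketch. The class Percentage is made up for this mail and is not FOP code: the unchecked exception appears in a Javadoc @throws tag, but not in a throws clause.

```java
/** Hypothetical example class, not part of FOP. */
final class Percentage {

    private final int value;

    /**
     * Creates a percentage.
     *
     * @param value the value in percent
     * @throws IllegalArgumentException if value is not in the range 0 to 100
     */
    Percentage(int value) {   // no throws clause for the unchecked exception
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        this.value = value;
    }

    int getValue() {
        return value;
    }
}
```

The @throws tag tells the caller the precondition (0 to 100) without forcing a try/catch on every call.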


The second point is not allowing subclasses of exceptions in Javadoc. I
don't use this very often, but I have just one example in my mind where
this makes sense. If you have a look into
java.io.DataInputStream#readByte(), there are both IOException and
EOFException documented. EOFException is a subclass of IOException. As
you know, a normal InputStream.read() returns -1 at EOF but readByte()
doesn't. So it's worth documenting that readByte() throws an
EOFException instead.

So I would also vote for allowing subclasses.
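
As a sketch of the readByte() case: the helper below is hypothetical and only mirrors the documentation style of java.io.DataInputStream. It declares IOException but documents the EOFException subclass separately, because it carries extra meaning for the caller.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, only to illustrate the Javadoc style. */
final class ByteReader {

    private ByteReader() { }

    /**
     * Reads exactly one byte.
     *
     * @param in the stream to read from
     * @return the byte read
     * @throws EOFException if the stream is already at its end
     * @throws IOException if any other I/O error occurs
     */
    static byte readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            // read() signals EOF with -1; this method signals it with an exception
            throw new EOFException("unexpected end of stream");
        }
        return (byte) b;
    }
}
```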


Best Regards
Alex

[1]: http://www.amazon.com/dp/0321356683/

-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Re: Best Interface for reading OpenType Files

2009-09-25 Thread Alexander Kiel
Hi Jeremias,

On Fri, 2009-09-25 at 08:37 +0200, Jeremias Maerki wrote:
 I don't think that relying directly on the ImageIO API is a problem
 since it's been part of the core Java class library since Java 1.4. It's
 available in all JVMs that claim to be at least Java 1.4 compliant. I
 don't really see the benefit in hiding the API behind an additional
 layer. ImageIO is here to stay. But that's just my opinion.
 
 Please note that SeekableStream is a predecessor of the ImageIO
 ImageInputStream as the image codecs in XML Graphics Commons originally
 came from JAI via Batik. It's not something we built specifically for
 our project here.

I had a look at SeekableStream and I can imagine how the needs resulted
in the ImageInputStream interface. I haven't decided yet if I should use
ImageInputStream directly. Maybe someone else can throw in their two
cents here.

 An inquiry on fop-users [1] reminded me to just briefly mention an
 important point about the font subsystem: the fact that some font data
 is loaded again and again for each rendering run. We've discussed this 
 (and possible solution approaches: font sources) in the past (see
 mailing list archives, particularly [2]). Unfortunately, this hasn't
 been realized, yet. Some improvements were made in the last couple of
 years, but we're not quite there, yet. So I'm happy that you've started
 working in this area. This will surely be at least a big step in the
 right direction.
 
 [1] http://markmail.org/thread/r6etkcadyaahgyhe
 [2] http://markmail.org/message/4cmbj5x3zkvflrax

I read the FOPFontSubsystemDesign [1] wiki page. At the moment I don't
understand the whole system well enough to see what's needed by the rest
of FOP. I think a deeper discussion about the font subsystem would
be off-topic for this thread. So maybe we should start a new
thread on the list. But before that, I should get my OpenType reading
finished and submit the patch.

Best Regards
Alex

[1] http://wiki.apache.org/xmlgraphics-fop/FOPFontSubsystemDesign


e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Re: Best Interface for reading OpenType Files

2009-09-25 Thread Alexander Kiel
Hi Vincent,

  I had a look at SeekableStream and I can imagine how the needs resulted
  in the ImageInputStream interface. I haven't decided yet if I should use
  ImageInputStream directly. Maybe someone else can throw in their two
  cents here.
 
 Here are my two cents: if you make use of classes in javax.imageio at
 only one place in your font library, then there’s no need to worry about
 creating a more neutral layer. If OTOH you need to use those classes
 everywhere, then it makes sense to use a simplified abstraction layer.
 That abstraction layer could be shipped as a separate module and evolve
 separately. An implementation could be based on imageIO, Apache Commons
 IO (?), your own implementation based on byte arrays for testing
 purpose, etc.

Thanks for that. I think I will write an OpenTypeDataInputStream which
is not a FilterInputStream, but takes an ImageInputStream as constructor
argument like a FilterInputStream would take an InputStream. This
OpenTypeDataInputStream will be the API for all the streams on top of
it. So I would have only one point which depends on ImageInputStream.
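
A rough sketch of what I have in mind follows. The method names readUInt16, readUInt32 and readTag are placeholders I made up for this mail, not a finished API; only the ImageInputStream calls are real.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.imageio.stream.ImageInputStream;

/** Sketch only: the single class that knows about ImageInputStream. */
final class OpenTypeDataInputStream {

    private final ImageInputStream in;

    OpenTypeDataInputStream(ImageInputStream in) {
        this.in = in;
    }

    /** Reads an OpenType uint16 (big-endian, ImageInputStream's default byte order). */
    int readUInt16() throws IOException {
        return in.readUnsignedShort();
    }

    /** Reads an OpenType uint32 as a long. */
    long readUInt32() throws IOException {
        return in.readUnsignedInt();
    }

    /** Reads a four-byte table tag such as "GPOS". */
    String readTag() throws IOException {
        byte[] tag = new byte[4];
        in.readFully(tag);
        return new String(tag, StandardCharsets.US_ASCII);
    }

    /** Moves to an absolute offset from the start of the stream. */
    void seek(long offset) throws IOException {
        in.seek(offset);
    }
}
```

For tests, such a stream could be fed from a byte array through javax.imageio.stream.MemoryCacheImageInputStream wrapped around a ByteArrayInputStream.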

 And another bunch of thoughts and questions:
 - I think priority should be given to having a sound API that can be
   re-used by other projects than FOP, rather than memory optimization.

Agree.

 - is memory consumption that much of a problem anyway? I mean, fonts are
   intrinsically big, complex objects and there’s not much we can do
   about that. Many scripts in the world can’t do without advanced
   features. Making the parsing of some tables optional doesn’t look to
   me like the right way to optimise things. That would unnecessarily
   complicate the code.

If you only need the metrics, parsing the glyf or CFF table would be
really unnecessary. So maybe a TableFilter interface would be useful.
Like this:

public class OpenTypeFileInputStream {

    // By default every table is read.
    private TableFilter tableFilter = TableFilter.NO_FILTERING;

    public OpenTypeFileInputStream(OpenTypeDataInputStream in) {}

    public void setTableFilter(TableFilter tableFilter) {
        this.tableFilter = tableFilter;
    }
}

public interface TableFilter {

    // Default filter: accepts every table.
    public static final TableFilter NO_FILTERING = new TableFilter() {
        public boolean doReadTable(Tag tableTag) { return true; }
    };

    // Returns true if the table with the given tag should be parsed.
    boolean doReadTable(Tag tableTag);
}

A client which isn't aware of TableFilter would not notice any burden
using the API. And the implementation in OpenTypeFileInputStream isn't
so difficult.
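
For example, a metrics-only client could plug in a filter like the one sketched here. To keep the sketch self-contained, table tags are plain Strings instead of the Tag class, and the list of tables is my assumption of what metrics need, not a verified set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Stand-in for the TableFilter idea; table tags are plain Strings here. */
interface TableFilter {
    boolean doReadTable(String tableTag);
}

/** Hypothetical filter that accepts only the tables assumed to hold metrics. */
final class MetricsOnlyFilter implements TableFilter {

    // Assumed list of metric-related tables; adjust to what the client needs.
    private static final Set<String> WANTED = new HashSet<String>(
            Arrays.asList("head", "hhea", "hmtx", "maxp", "OS/2", "post"));

    public boolean doReadTable(String tableTag) {
        return WANTED.contains(tableTag);   // glyf and CFF are skipped
    }
}
```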

 - instead of seekable streams, what about a filter that would re-order
   the font stream, caching whatever is necessary before re-sending it to
   the consumer object?

I don't want to do this. In the OpenType GPOS and GSUB tables you have
maybe 5 levels of nested structures with headers and offsets. It gets
really complex there.

 - what about giving the font library a “playground” directory by
   inversion of control, that it can use to cache things? And if no
   directory is given it would use the memory. Maybe a common interface
   could be used for that, targeting either the hard drive or the memory.

Sure. By the way - is there any IoC container used in FOP? I did not see
one so far. How is the bootstrapping done? This could be important for a
central FontSource or such thing.

 - does the use of serializable objects make sense? What would be more
   efficient: re-parsing font data all the time or re-loading
   serializable object representation of them?

You mean the font metrics XML files? I've always asked myself what
purpose they serve. No, I don't think we need this. I really don't
want to serialize the Advanced OpenType Features! It already took me a
good amount of code to parse just a bit of it.

 - what about looking at how fontconfig [1] (a font configuration library
   for Linux systems) does things? I know it makes use of a cache to
   speed up things. Maybe there are good ideas to borrow from there.
 
 [1] http://www.fontconfig.org/wiki/

I don't see speed as a problem as long as we parse every font only once.
Parsing the OpenType font Old Standard Regular and converting it into
a CustomFont currently takes about 100 ms. 


Best Regards
Alex

-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Unit Tests in Intellij Idea

2009-09-25 Thread Alexander Kiel
Hi,

I'm looking for a way to run the unit tests directly inside IDEA.

If I let IDEA compile all the classes, copy the fo files and let
Ant generate the source files, it runs 2110 tests, but 1737 of them
fail.

Some of the errors:

No IF document handler for the requested format available:
application/pdf

Don't know how to handle application/pdf as an output format. Neither
an FOEventHandler, nor a Renderer could be found for this output format.

The file format is not supported. No ImagePreloader found for
test/resources/images/img-w-size.svg

Event model doesn't contain the definition for
org.apache.fop.fo.FOValidationEventProducer

Class org.apache.fop.intermediate.IFParserTestCase has no public
constructor TestCase(String name) or TestCase()

No IF document handler for the requested format available:
application/postscript

-

Looks like configuration issues to me. So what can I do to be able to
run the unit tests?


One other related problem: I don't get the hyphenation unit tests to run
with Ant. I have the fop-hyph-1.2.jar in the classpath. I use Java 1.4,
JUnit 3.8 and Ant 1.7.

If I run junit-layout-hyphenation, I get 5 errors out of 8 tests. I have
attached the ant output.


Best Regards
Alex


-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net

Buildfile: build.xml

init-avail:
 [echo] --- Apache FOP svn-trunk [1999-2009] 
 [echo] See build.properties and build-local.properties for additional build settings
 [echo] Apache Ant version 1.7.1 compiled on November 10 2008
 [echo] VM: 1.4.2_19-b04, Sun Microsystems Inc.
 [echo] JAVA_HOME: /home/akiel/j2sdk1.4.2_19/
 [echo] JAI Support PRESENT
 [echo] JCE Support PRESENT
 [echo] JUnit Support PRESENT
 [echo] XMLUnit Support PRESENT

init:

codegen:
 [echo] Generating the java files from xml resources

compile-java:

resourcegen:

compile-copy-resources:

compile:

retro-avail:

retro-unavail:
 [echo] Please set the path to a JDK 1.4 installation in your build-local.properties
 [echo] to allow for verification!

retro:

uptodate-jar-main:

jar-main:

compile-hyphenation:
 [echo] Hyphenation successful

uptodate-jar-hyphenation:

jar-hyphenation:

uptodate-jar-sandbox:

jar-sandbox:

package:

uptodate-transcoder-pkg:

transcoder-pkg:

junit-with-xmlunit:

junit-without-xmlunit:

junit-compile-java:

junit-compile-copy-resources:

junit-compile:

hyphenation-present:
 [echo] Hyphenation Support PRESENT

junit-layout-hyphenation:
 [echo] Running hyphenation layout engine tests
[junit] Testsuite: org.apache.fop.layoutengine.LayoutEngineTestSuite
[junit] Tests run: 8, Failures: 0, Errors: 5, Time elapsed: 1.759 sec
[junit] 
[junit] - Standard Output ---
[junit] Test: block_uax14_linebreaking_hyph.xml
[junit]   [WARN ] Line 4 of a paragraph overflows the available area by more than 50 points. (No context info available)
[junit]   [WARN ] Line 2 of a paragraph overflows the available area by 34448 millipoints. (No context info available)
[junit]   [WARN ] Line 1 of a paragraph overflows the available area by 46688 millipoints. (No context info available)
[junit] Test: inline_border_padding_hyphenate_de.xml
[junit]   [WARN ] Line 3 of a paragraph overflows the available area by 31390 millipoints. (No context info available)
[junit]   [WARN ] Line 3 of a paragraph overflows the available area by 31390 millipoints. (No context info available)
[junit]   [WARN ] Line 3 of a paragraph overflows the available area by 31390 millipoints. (No context info available)
[junit]   [WARN ] Line 3 of a paragraph overflows the available area by 31390 millipoints. (No context info available)
[junit]   [WARN ] Line 3 of a paragraph overflows the available area by 37390 millipoints. (No context info available)
[junit]   [WARN ] Line 3 of a paragraph overflows the available area by 37390 millipoints. (No context info available)
[junit]   [WARN ] Line 3 of a paragraph overflows the available area by 37390 millipoints. (No context info available)
[junit]   [WARN ] Line 3 of a paragraph overflows the available area by 37390 millipoints. (No context info available)
[junit] -  ---
[junit] - Standard Error -
[junit] Sep 25, 2009 5:39:04 PM org.apache.fop.hyphenation.Hyphenator getHyphenationTree
[junit] SEVERE: Couldn't find hyphenation pattern en
[junit] Sep 25, 2009 5:39:05 PM org.apache.fop.layoutengine.LayoutEngineTestSuite$1 runTest
[junit] SEVERE: Error on inline_border_padding_hyphenate.xml
[junit] Sep 25, 2009 5:39:05 PM org.apache.fop.hyphenation.Hyphenator getHyphenationTree
[junit] SEVERE: Couldn't find hyphenation pattern en_US
[junit] Sep 25, 2009 5:39:05 PM 

Best Interface for reading OpenType Files

2009-09-24 Thread Alexander Kiel
Hi,

I'm currently thinking about the interface to use for reading OpenType
files.

There are two possibilities:

 - reading on top of an InputStream or
 - reading on top of a RandomAccessFile or FileChannel.

Currently the implementation in FOP uses the class FontFileReader which
expects an InputStream. But it immediately calls IOUtils.toByteArray(in)
and works on that byte array instead. So it needs to hold the file
completely in memory.

FontBox which is part of PDFBox uses some abstract class called
TTFDataStream with template methods which has two implementations, one
called RAFDataStream which operates on top of a RandomAccessFile and one
called MemoryTTFDataStream which operates on top of a byte array.

I started using pure InputStreams. That means I implemented the whole
OpenType file reading using a hierarchy of FilterInputStreams. At the
lowest level I have a DataInputStream which takes any InputStream and
provides methods to read the basic data types of OpenType just like
java.io.DataInputStream does for Java data types. On top of that, I have
streams that can read some small-scale data structures, then streams
which can read whole tables and finally a stream which can read the
whole OpenType file.

To read an OpenType file, all you have to write is:

InputStream in = ...
OpenTypeFileInputStream otfIn = new OpenTypeFileInputStream(in);
OpenTypeFile otf = otfIn.readOpenTypeFile();

In my opinion this system works really well. You can take any
InputStream, the reading is decoupled from the OpenType classes themselves
and you can test pieces of OpenType structure using only the individual
streams.

But! My approach has one flaw. I need to seek extensively while reading
an OpenType file. The whole file format consists of headers with offsets
and data structures which one has to read from that offsets.

To get this seeking to work with streams, I use mark(), reset() and skip().
My common approach at the beginning of such a structure is to mark, then
read the header and for every part, reset to the start, mark again, skip
to the offset and read the part.
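
In simplified form the pattern looks roughly like this. The class MarkResetSeek is a stand-alone illustration written for this mail, not my actual reader code, and it reads only a single byte per part.

```java
import java.io.IOException;
import java.io.InputStream;

/** Illustration of the mark/reset/skip pattern; not the real reader code. */
final class MarkResetSeek {

    private MarkResetSeek() { }

    /**
     * Reads one byte at each offset, counted from the stream's current
     * position. The stream must support mark().
     */
    static int[] readAt(InputStream in, long[] offsets) throws IOException {
        int[] result = new int[offsets.length];
        in.mark(Integer.MAX_VALUE);            // remember the structure start
        for (int i = 0; i < offsets.length; i++) {
            in.reset();                        // back to the structure start
            in.mark(Integer.MAX_VALUE);        // mark again for the next part
            skipFully(in, offsets[i]);
            result[i] = in.read();
        }
        return result;
    }

    /** skip() may skip fewer bytes than requested, so loop until done. */
    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                throw new IOException("could not skip to offset");
            }
            n -= skipped;
        }
    }
}
```

The large read limit passed to mark() is what forces a buffering stream to keep everything read so far.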

But with this approach I end up holding the whole file in memory.

To make it worse, this mark(), reset(), skip() interface doesn't support
hierarchical marking. If I seek inside smaller-scale structures, the mark
position of the larger-scale structure is overwritten. I don't think
that it is possible to build hierarchical mark support on top of any
markable InputStream. (Oh look, I did it later while writing this longish
mail.) I think one has to reimplement BufferedInputStream holding one's
own byte array. In fact I did this on top of ByteArrayInputStream. The
key problem is that one can't get a position out of an InputStream, which
is not surprising as the concept of streams doesn't have a position. 

It is possible to read the parts in offset order. But there are
duplicated offsets (more than one offset pointing to the same part) and
parts that have to go into an array in a semantic order which doesn't
have to be the offset order. So I first have to reorder the offsets to
read the parts in offset order and then I have to reorder the read parts
again to get them back into the semantic order. That said, it is still
possible that the offsets are in fact in the semantic order of the
parts, but the spec doesn't say this.
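
The double reordering could be sketched like this. OffsetOrder and PartReader are hypothetical names, and the callback stands in for whatever actually reads a part from the stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for reading parts in offset order. */
final class OffsetOrder {

    /** Hypothetical callback that reads the part starting at an offset. */
    interface PartReader<T> {
        T readPartAt(long offset);
    }

    private OffsetOrder() { }

    /**
     * Reads every distinct offset once, in ascending order, and returns
     * the parts in the original (semantic) order of the offsets array.
     */
    static <T> List<T> readParts(long[] offsets, PartReader<T> reader) {
        Map<Long, T> byOffset = new TreeMap<Long, T>();
        for (long offset : offsets) {
            byOffset.put(offset, null);          // distinct offsets, sorted
        }
        for (Map.Entry<Long, T> entry : byOffset.entrySet()) {
            entry.setValue(reader.readPartAt(entry.getKey()));  // forward-only
        }
        List<T> parts = new ArrayList<T>(offsets.length);
        for (long offset : offsets) {
            parts.add(byOffset.get(offset));     // duplicates share one part
        }
        return parts;
    }
}
```

With this, the stream only ever moves forward, at the price of the extra bookkeeping.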

I don't want to depend on RandomAccessFile or FileChannel, because I
need to be able to test reading of substructures out of byte arrays.
What I need is an interface from which I can read bytes and which allows
multiple relative seeks. By multiple relative seeks I mean something
like multiple marks. While writing this, I implemented such a thing inside
my DataInputStream. There is now a method:

public SkipHandle mark();

and the SkipHandle class looks like this:

public class SkipHandle {

private final long relativePos;

public void skipTo(long offset);
}

SkipHandle is a non-static inner class of DataInputStream.
DataInputStream counts the bytes read and skipped to keep track of its
current position. The SkipHandle gets the current stream position on
creation so that it is able to skip on DataInputStream relative to its
creation position. If the skip would be negative, SkipHandle resets the
whole stream to the start (on creation of DataInputStream, a normal mark
is set) and skips afterwards.
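
A condensed sketch of the idea is shown here. The class is called PositionInputStream only to avoid confusion with java.io.DataInputStream, and the details are simplified compared to my real code.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Sketch of a position-counting stream with relative skip handles. */
class PositionInputStream extends FilterInputStream {

    private long pos;   // bytes read or skipped since creation

    PositionInputStream(InputStream in) {
        super(in);
        if (!in.markSupported()) {
            throw new IllegalArgumentException("mark support required");
        }
        in.mark(Integer.MAX_VALUE);   // lets us rewind to the very start
    }

    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            pos += n;
        }
        return n;
    }

    public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        pos += skipped;
        return skipped;
    }

    /** Disabled so that callers cannot overwrite the start mark. */
    public boolean markSupported() {
        return false;
    }

    public synchronized void mark(int readLimit) { }   // no-op

    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    /** Creates a handle bound to the current position. */
    SkipHandle mark() {
        return new SkipHandle(pos);
    }

    final class SkipHandle {

        private final long relativePos;

        private SkipHandle(long relativePos) {
            this.relativePos = relativePos;
        }

        /** Positions the stream offset bytes after the handle's creation point. */
        void skipTo(long offset) throws IOException {
            long target = relativePos + offset;
            if (target < pos) {          // backwards: rewind to the start first
                in.reset();
                in.mark(Integer.MAX_VALUE);
                pos = 0;
            }
            while (pos < target) {
                if (skip(target - pos) <= 0) {
                    throw new IOException("cannot reach offset " + target);
                }
            }
        }
    }
}
```

Several handles can be alive at the same time, which gives the hierarchical behaviour that plain mark() and reset() lack.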

It works, but I find it a little bit ugly. First I have to set a
mark(Integer.MAX_VALUE) on DataInputStream creation, because I always
want to be able to reset the whole stream, but I don't have any
information about how many bytes are on the road. Then I have to disable
mark support on my DataInputStream so that nobody kills my own mark.

But the biggest problem is that DataInputStream now has a non-standard
mark(), skipTo() API. It's not like a normal FilterInputStream anymore.
You can't use normal marking, because it's disabled, and you have to
learn this new API instead. 

Streams simply aren't the 

Re: Best Interface for reading OpenType Files

2009-09-24 Thread Alexander Kiel
Hi Jeremias,

On Thu, 2009-09-24 at 21:06 +0200, Jeremias Maerki wrote:
 Right, and that accounts for a pretty large portion of FOP's memory
 consumption problem nowadays. With the use of OpenType fonts, this gets
 worse as they can be quite big. I'm glad you noticed that.

Yes, but currently I read all OpenType tables I'm aware of. The Java
data structures are considerably bigger than the original file. The biggest
fonts I saw have a size of 400 kB. I don't know the Java structure size
at the moment. If it's really a problem, I can profile this later. So
maybe I should add config options which select the tables to read.
Currently the data is moved into a CustomFont and the TTFFile is thrown
away (I hope so). But CustomFont doesn't have the power of advanced
OpenType features. So I think we will end up with some interfaces
instead which may be implemented by the TTFFile itself or by classes
using the TTFFile in the background. That said, we will end up with some
amount of data in memory anyway. 

 May I suggest to use ImageIO's ImageInputStream? That already has an
 implementation that buffers the stream in a temporary file (if allowed)
 so you basically have random access. I've used that extensively in the
 image loading framework in XML Graphics Commons and it seem to be
 ideal for what you need to do. You even get the hierarchical mark/reset.

Thanks for that! I did not know of ImageIO's ImageInputStream before.
It's an interface, which is good. It is capable of all the stuff I need.
The only things to complain about are that it has more functionality than
needed and that it's named a bit oddly for fonts. Maybe I should specify an
interface which is a subset of ImageInputStream and provide a simple
wrapper to ImageInputStream so that I can use the implementations.
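
To convince myself of the hierarchical mark/reset, I would try a small experiment like the following. NestedMarkDemo is just a throwaway class name; the byte array stands in for a font file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

/** Throwaway demo of nested mark()/reset() on an ImageInputStream. */
final class NestedMarkDemo {

    private NestedMarkDemo() { }

    /** Returns {outer byte, inner byte, byte after inner reset, final position}. */
    static int[] run(byte[] data) throws IOException {
        ImageInputStream in =
                new MemoryCacheImageInputStream(new ByteArrayInputStream(data));
        try {
            in.mark();                    // outer structure start (position 0)
            in.seek(4);
            int outer = in.read();

            in.mark();                    // inner structure start (position 5)
            in.seek(8);
            int inner = in.read();
            in.reset();                   // back to position 5

            int next = in.read();         // byte at position 5
            in.reset();                   // back to position 0
            return new int[] {outer, inner, next, (int) in.getStreamPosition()};
        } finally {
            in.close();
        }
    }
}
```

The inner mark does not overwrite the outer one, which is exactly what InputStream.mark() cannot offer.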

 I don't think NIO will help much here. I'd really suggest
 ImageInputStream which should have everything you need. You can probably
 even reuse some utility code I've written for the image loading
 framework:
 http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/loader/util/ImageUtil.java?view=markup
 http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/loader/util/ImageInputStreamAdapter.java?view=markup
 
 The following class has some code to get an ImageInputStream from a URI.
 If it's a file URL it tries to get an ImageInputStream with random
 access. In all other cases, the content is buffered by ImageIO's default
 buffering implementations (depending on the settings).
 http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/loader/impl/AbstractImageSessionContext.java?view=markup
 That could even be extracted to be useful to you.
 
 See also: http://java.sun.com/j2se/1.4.2/docs/api/javax/imageio/ImageIO.html
 (methods setUseCache() and setCacheDirectory)

Thanks for those pointers. What do you think? Should I specify my own
SeekableInputStream which isn't able to do all those bit operations and
some of the DataInput operations I don't need, or should I use
ImageInputStream directly? Is there a simple implementation on top of
byte arrays for unit testing? OK, I could use ImageInputStreamImpl for
that...
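
Just to make the first option concrete, such a subset interface plus wrapper could look roughly like this. SeekableInput and ImageIOSeekableInput are names invented for this sketch, and the method set is only a guess at what the font reader needs.

```java
import java.io.IOException;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical byte-only, seekable input; a small subset of ImageInputStream. */
interface SeekableInput {
    int read() throws IOException;
    void readFully(byte[] buffer) throws IOException;
    long getPosition() throws IOException;
    void seek(long position) throws IOException;
    void close() throws IOException;
}

/** Wrapper that maps SeekableInput onto any ImageInputStream implementation. */
final class ImageIOSeekableInput implements SeekableInput {

    private final ImageInputStream in;

    ImageIOSeekableInput(ImageInputStream in) {
        this.in = in;
    }

    public int read() throws IOException {
        return in.read();
    }

    public void readFully(byte[] buffer) throws IOException {
        in.readFully(buffer);
    }

    public long getPosition() throws IOException {
        return in.getStreamPosition();
    }

    public void seek(long position) throws IOException {
        in.seek(position);
    }

    public void close() throws IOException {
        in.close();
    }
}
```

The font code would then depend only on SeekableInput, and javax.imageio would show up in a single wrapper class.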

If I think about it more deeply, it would not be so clever for a font API to
depend on javax.imageio. There is the x in javax and the image in
imageio which I don't like.

It's a pity that there is no common byte-only random access input source
interface in Java, isn't it?

Best Regards
Alex

-- 
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Re: ambiguity of grammar for font shorthand?

2009-09-22 Thread Alexander Kiel
Hi,

 Also, in your message you said we could ignore a value for font of
 caption, icon, etc., as the standard tells us to do, but the standard
 discusses these values and their relation to system fonts.  Was this
 an oversight on your part or am I mis-reading the spec? [1]

 [1] http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#font

The spec says:

XSL modifications to the CSS definition:

In XSL the font property is a pure shorthand property. System font
characteristics, such as font-family, and font-size, may be obtained by
the use of the system-font function in the expression language.

If I read this correctly the system font shorthands namely: caption,
icon, menu, message-box, small-caption, status-bar are not allowed in
XSL.


Best Regards
Alex





Re: ambiguity of grammar for font shorthand?

2009-09-22 Thread Alexander Kiel
Hi,

 I think it is probably the case that in the context of the font short
 hand – the font properties cannot take the value of inherit, since
 this renders the grammar irreducibly ambiguous.  While such an
 exclusion is not mentioned in the spec,  it makes sense that inherit
 must be excluded for the reason I’ve just given.

I once wrote a CSS minifier (it shrinks CSS files). There I also
didn't allow inherit for individual properties inside the font
shorthand. So I would give the user a good error message here. I don't
think that there are many documents out there which actually use
inherit for individual properties inside the font shorthand.
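
Such a check could be as simple as the following sketch. FontShorthandCheck is a made-up name, and the tokenizer is deliberately naive: it splits on whitespace and on the size/line-height slash, and does not handle every quoting case.

```java
/** Hypothetical check, not FOP's actual shorthand parser. */
final class FontShorthandCheck {

    private FontShorthandCheck() { }

    /**
     * Rejects "inherit" used for an individual property inside a font
     * shorthand value; a lone "inherit" for the whole shorthand is fine.
     *
     * @param fontValue the value of the font shorthand property
     * @throws IllegalArgumentException if inherit appears next to other values
     */
    static void check(String fontValue) {
        // Naive tokenizer: whitespace and the size/line-height slash.
        String[] tokens = fontValue.trim().split("[\\s/]+");
        if (tokens.length < 2) {
            return;
        }
        for (String token : tokens) {
            if ("inherit".equalsIgnoreCase(token)) {
                throw new IllegalArgumentException(
                        "'inherit' is not allowed for individual properties "
                        + "inside the font shorthand: " + fontValue);
            }
        }
    }
}
```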


Best Regards
Alex





Re: PDFFontDescriptor Ascent Descent and FontBBox

2009-09-22 Thread Alexander Kiel
Hi Max,

thanks for pointing me to fontbox. As I did not find a repository with a
trunk, I had a look into fontbox-0.8.0-incubating. They have quite clean
code to parse TrueType files. But they are also not able to read any
OpenType data. They are not even able to read kerning data, which FOP
already can. 

So I think I will continue my OpenType effort inside FOP for now, but
we should consider merging the font parsing into fontbox.

Let's see what I can do in FOP right now. Currently I have all the
current features refactored and have written some tests against real fonts.
I'm on the way to getting kerning info out of the OpenType GPOS table. I
have already written 19 test classes and added/modified 129 production
classes.


Best Regards
Alex

-  
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net


On Tue, 2009-09-22 at 11:22 +0200, Max Berger wrote:
 Alexander,
 
 on a completely different note:
 
 It may be interesting to also look into fontbox (part of pdfbox),
 which is now also an apache project, and therefore we could use source
 synergy.
 
 http://incubator.apache.org/pdfbox/
 
 For the issue you've mentioned: This may be due to some issues with
 CID fonts not properly defining ascent and descent, but I don't have a
 reference for it. We would probably need some fonts to test and check.
 
 Max
 
 2009/9/21 Alexander Kiel alexanderk...@gmx.net:
  Hi,
 
  as I'm currently on the way to redesign the TrueType/OpenType file
  reading, from time to time I stumble over some odd things.
 
  But instead of blindly refactoring, I think it's better to ask you
  first.
 
  Fonts have among other things the properties ascent, descent and
  fontBBox which are defined in the PDF spec chapter 5.7 Font Descriptors.
 
  If I look into PDFFactory rev. 800217 line 1445 - 1483 method
  makeFontDescriptor, I see two constructor calls, one for CID fonts and
  one for the other fonts. For CID fonts the constructor of
  PDFCIDFontDescriptor is called and for the others the constructor of
  PDFFontDescriptor is called. PDFCIDFontDescriptor is a subclass of
  PDFFontDescriptor.
 
  The odd thing here is, that the call of PDFCIDFontDescriptor doesn't use
  the ascent nor the descent of the font descriptor. Instead it uses some
  quantities out of the fontBBox to feed the PDFFontDescriptor. This can
  be seen inside PDFCIDFontDescriptor rev. 679326.
 
  Since fontBBox, ascent and descent are not the same thing, I would
  recommend changing this. But this would probably break some things,
  because we would get other values for ascent and descent in CID fonts.
 
 
  Best Regards
  Alex
 
  -
  e-mail: alexanderk...@gmx.net
  web:www.alexanderkiel.net
 
 
 
 




PDFFontDescriptor Ascent Descent and FontBBox

2009-09-21 Thread Alexander Kiel
Hi,

as I'm currently on the way to redesign the TrueType/OpenType file
reading, from time to time I stumble over some odd things.

But instead of blindly refactoring, I think it's better to ask you
first.

Fonts have among other things the properties ascent, descent and
fontBBox which are defined in the PDF spec chapter 5.7 Font Descriptors.

If I look into PDFFactory rev. 800217 line 1445 - 1483 method
makeFontDescriptor, I see two constructor calls, one for CID fonts and
one for the other fonts. For CID fonts the constructor of
PDFCIDFontDescriptor is called and for the others the constructor of
PDFFontDescriptor is called. PDFCIDFontDescriptor is a subclass of
PDFFontDescriptor.

The odd thing here is, that the call of PDFCIDFontDescriptor doesn't use
the ascent nor the descent of the font descriptor. Instead it uses some
quantities out of the fontBBox to feed the PDFFontDescriptor. This can
be seen inside PDFCIDFontDescriptor rev. 679326.

Since fontBBox, ascent and descent are not the same thing, I would
recommend changing this. But this would probably break some things,
because we would get other values for ascent and descent in CID fonts.


Best Regards
Alex

-  
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net





Re: Volunteering to work on FOP development

2009-09-16 Thread Alexander Kiel
Hi,

 One area in which FOP needs more work is the compliance with the FO
 spec, versions 1.0 and 1.1, see
 http://xmlgraphics.apache.org/fop/compliance.html.

As one example: recently I needed text-align=outside, and I would
still be happy to have it. Such rather small feature enhancements
would make for a much better experience using FOP.

Best Regards
Alex

-  
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net


On Tue, 2009-09-15 at 21:48 +0200, Simon Pepping wrote:
 On Mon, Sep 14, 2009 at 03:32:25PM -0400, Jonathan Levinson wrote:
  Hi,
   
  My management has asked me to volunteer to help fix FOP bugs and add FOP
  enhancements.  I'm not yet familiar with FOP internals though I've read
  your design documents.
 
 That is good news indeed. Thanks to Intersystems for this
 contribution.
 
 Note that you need to sign an Individual Contributor License Agreement
 (ICLA), and most conveniently your company needs to sign a Corporate
 Contributor License Agreement (CCLA), see
 http://www.apache.org/licenses/#clas, before FOP or any other ASF
 project can accept your code contributions.
 
 One area in which FOP needs more work is the compliance with the FO
 spec, versions 1.0 and 1.1, see
 http://xmlgraphics.apache.org/fop/compliance.html.
 
 Another area where FOP needs more work is support for non-Western
 documents. I do not know where the problems are, but it probably does
 not work right now. Ideally, we would have contributors from regions
 with such problems.
 
 Regards, Simon
 




Re: State of OpenType Font Implementation

2009-09-15 Thread Alexander Kiel
Hi Jeremias,

OK, I think the first step would be either to add CFF support to
org.apache.fop.fonts.truetype.TTFFile, or to split TTFFile into
TrueTypeFile and OpenTypeFile and add CFF support only to OpenTypeFile.
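
As a rough sketch of how such a split could pick the right reader: the
OpenType spec starts every font file with a 4-byte sfnt version, which is
0x00010000 for TrueType outlines and the tag 'OTTO' for CFF outlines. The
class and method below are hypothetical and are not how TTFFile works
today; they only show the check.

```java
import java.nio.ByteBuffer;

public class SfntVersionSketch {

    /** sfnt version of fonts with TrueType outlines. */
    static final int TRUETYPE_OUTLINES = 0x00010000;

    /** sfnt version of fonts with CFF outlines: the tag 'OTTO'. */
    static final int CFF_OUTLINES = 0x4F54544F;

    /**
     * Classifies a font by the first four bytes of its file.
     * Returns "TrueType", "CFF" or "unknown".
     */
    static String outlineType(byte[] header) {
        if (header == null || header.length < 4) {
            return "unknown";
        }
        // ByteBuffer reads big-endian by default, as the font format requires.
        int version = ByteBuffer.wrap(header).getInt();
        if (version == TRUETYPE_OUTLINES) {
            return "TrueType";
        }
        if (version == CFF_OUTLINES) {
            return "CFF";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(outlineType(new byte[] {0, 1, 0, 0}));
        System.out.println(outlineType(new byte[] {'O', 'T', 'T', 'O'}));
    }
}
```

A shared reader for the table directory could then hand the file over to
either a TrueTypeFile or an OpenTypeFile, depending on the result.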

In the last hour I waded through TTFReader, TTFFile, TTFDirTabEntry and
the OpenType spec [1]. What about refactoring this code mess as a whole?
I mean, seriously, does all of the FOP code look like this?


Best Regards
Alex 


[1]: http://www.microsoft.com/typography/otspec/otff.htm

-  
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net


On Tue, 2009-09-15 at 09:48 +0200, Jeremias Maerki wrote:
 Hi Alex,
 
 second piece of good news today. I guess we need to define what should be
 covered by OpenType. One important aspect is certainly CFF support (which
 I just mentioned to Jonathan a few minutes ago). Another aspect is
 what Bertrand Delacrétaz started to look into: ligatures, character
 combination and such. CFF should be relatively easy to implement.
 Ligature support is going to be much harder, as it will affect
 the layout engine.
 
 OpenType fonts that have TrueType glyphs and don't require the advanced
 typographical stuff are already supported today, but many OTF fonts have
 CFF glyph data. So that would be the first priority IMO.
 
 On 15.09.2009 09:23:39 Alexander Kiel wrote:
  Hi,
  
  I'm new to the fop-dev list but have used FOP for some years already.
  
  Recently I tried to use OpenType fonts. As documented, FOP doesn't
  support OpenType fonts yet.
  
  The last and only discussion I could find on fop-dev is from 2006 [1].
  Looking into the trunk, there is not really anything done with respect
  to OpenType. So what is the state of OpenType support in 2009?
  
  Sure, I could possibly help implement it.
  
  Best Regards
  Alex
  
  
  [1]:
  http://www.mail-archive.com/fop-dev@xmlgraphics.apache.org/msg04892.html
  
  -  
  e-mail: alexanderk...@gmx.net
  web:www.alexanderkiel.net
  
 
 
 
 
 Jeremias Maerki
 
 




Re: State of OpenType Font Implementation

2009-09-15 Thread Alexander Kiel
Hi Max,

I apologize for my unkind words.

I'm in the process of refactoring the TTFFile part and hope to add CFF
support. I will follow your suggestions and submit my patch.

Best Regards
Alex

-  
e-mail: alexanderk...@gmx.net
web:www.alexanderkiel.net


On Tue, 2009-09-15 at 17:28 +0200, Max Berger wrote:
 Alex,
 
 please note that the FOP code has been developed by multiple volunteers
 over the last ten years. As such, it does not always follow one clear
 path of design.
 
 That said, refactoring the FOP code for easier reading / maintainability
 is definitely wanted! The proper steps would be:
 
 - ensure that there are junit tests for the existing functionality. If
 not, add them.
 
 - ensure all junit tests run on your machine
 
 - refactor away, keeping in mind FOP's conventions:
 http://xmlgraphics.apache.org/fop/dev/conventions.html
 Please note that FOP is currently still on Java 1.4.
 
 - ensure all junit tests still pass
 
 - create a bug report with [patch] in the subject line and attach your
 patch.
 
 Max
 
 Alexander Kiel schrieb:
  Hi Jeremias,
  
  OK, I think the first step would be either to add CFF support to
  org.apache.fop.fonts.truetype.TTFFile, or to split TTFFile into
  TrueTypeFile and OpenTypeFile and add CFF support only to OpenTypeFile.
  
  In the last hour I waded through TTFReader, TTFFile, TTFDirTabEntry and
  the OpenType spec [1]. What about refactoring this code mess as a whole?
  I mean, seriously, does all of the FOP code look like this?
  
  
  Best Regards
  Alex 
  
  
  [1]: http://www.microsoft.com/typography/otspec/otff.htm
  
  -  
  e-mail: alexanderk...@gmx.net
  web:www.alexanderkiel.net
  
  
  On Tue, 2009-09-15 at 09:48 +0200, Jeremias Maerki wrote:
  Hi Alex,
 
  second piece of good news today. I guess we need to define what should be
  covered by OpenType. One important aspect is certainly CFF support (which
  I just mentioned to Jonathan a few minutes ago). Another aspect is
  what Bertrand Delacrétaz started to look into: ligatures, character
  combination and such. CFF should be relatively easy to implement.
  Ligature support is going to be much harder, as it will affect
  the layout engine.
 
  OpenType fonts that have TrueType glyphs and don't require the advanced
  typographical stuff are already supported today, but many OTF fonts have
  CFF glyph data. So that would be the first priority IMO.
 
  On 15.09.2009 09:23:39 Alexander Kiel wrote:
  Hi,
 
  I'm new to the fop-dev list but have used FOP for some years already.
 
  Recently I tried to use OpenType fonts. As documented, FOP doesn't
  support OpenType fonts yet.
 
  The last and only discussion I could find on fop-dev is from 2006 [1].
  Looking into the trunk, there is not really anything done with respect
  to OpenType. So what is the state of OpenType support in 2009?
 
  Sure, I could possibly help implement it.
 
  Best Regards
  Alex
 
 
  [1]:
  http://www.mail-archive.com/fop-dev@xmlgraphics.apache.org/msg04892.html
 
  -  
  e-mail: alexanderk...@gmx.net
  web:www.alexanderkiel.net
 
 
 
 
  Jeremias Maerki
 
 
 
 

