Re: script property

2005-10-07 Thread Peter B. West


Manuel Mall wrote:

On Wed, 5 Oct 2005 04:17 pm, Jeremias Maerki wrote:


On 05.10.2005 09:46:18 Manuel Mall wrote:


While I am at it (this whole alignment stuff I mean) we may as well
do it properly. This would include support for the script
property. The allowed values for script are defined for example
here:
http://www.unicode.org/iso15924/iso15924-codes.html.

I assume we don't bother to validate if a correct code has been
provided as we don't do that for the country and language
properties either (should we? If we do we need more external config
files or expand fop.xconf to hold those values as they tend to
change over time).


We don't have to but we could. Since this is not something that
changes often I wouldn't put it into the config file, but in resource
files instead.



OK - makes sense.

Validation issues considered in alt-design circa 2002. See 
CountryLanguageScript.java in the alt-design code for an attempt at 
this.  Generated from xml-lang.xml and xml-lang.xsl.  No baselines.



Peter
--
Peter B. West http://cv.pbw.id.au/
Folio http://defoe.sourceforge.net/folio/


Re: script property

2005-10-06 Thread J.Pietschmann


Manuel Mall wrote:

What we also need for proper script support is a mapping from Unicode
code point to script.


On second thought: isn't this what the class Character.UnicodeBlock
does?

J.Pietschmann
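
A minimal sketch of what Character.UnicodeBlock offers, for readers who have not used it. The class name UnicodeBlockDemo and the method blockOf are hypothetical; only Character.UnicodeBlock.of(int) is the real JDK call. Note that a block is not the same thing as a script: U+00E9 belongs to the Latin script but sits in the Latin-1 Supplement block, so block lookups alone may not answer the script question.

```java
class UnicodeBlockDemo {
    /** Returns the Unicode block containing the code point, or null if none. */
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    public static void main(String[] args) {
        // A Latin capital letter, a Gurmukhi letter, and a Latin letter
        // that lives outside the Basic Latin block.
        System.out.println(blockOf('A'));
        System.out.println(blockOf(0x0A17));
        System.out.println(blockOf(0x00E9));
    }
}
```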

Re: script property

2005-10-06 Thread Manuel Mall

On Fri, 7 Oct 2005 03:30 am, J.Pietschmann wrote:
 Manuel Mall wrote:
  What we also need for proper script support is a mapping from
  Unicode code point to script.

 On second thought: isn't this what the class Character.UnicodeBlock
 does?

Joerg,

Thank you - I didn't even know that this class existed.

It doesn't quite solve all the issues though, I think:

a) We need a mapping from the four-letter ISO 15924 codes to the 
Character.UnicodeBlock constants.

b) We need a mapping from the Character.UnicodeBlock to script 
properties (actually, at this point the only property I am aware 
of is the default baseline for the script).

Maybe a wrapper around this class could provide that functionality?

 J.Pietschmann

Manuel
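
The wrapper idea in (a) and (b) could look roughly like the sketch below. It is an illustration under assumptions, not FOP code: the class ScriptBaselines and its methods are hypothetical, and it relies on Character.UnicodeScript, which was only added to the JDK in Java 7, well after this thread. The Guru entry comes from the example in the thread; the Hani entry is an assumption added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical wrapper: ISO 15924 code or code point to default baseline. */
class ScriptBaselines {
    private static final Map<Character.UnicodeScript, String> BASELINES = new HashMap<>();
    static {
        // Illustrative entries only, not an authoritative table.
        BASELINES.put(Character.UnicodeScript.GURMUKHI, "hanging");   // from the thread
        BASELINES.put(Character.UnicodeScript.HAN, "ideographic");    // assumption
    }

    /** Maps a script to its baseline, defaulting to alphabetic. */
    static String baselineFor(Character.UnicodeScript script) {
        return BASELINES.getOrDefault(script, "alphabetic");
    }

    /** Accepts a four-letter ISO 15924 code such as "Guru" or "Latn". */
    static String baselineForIsoCode(String isoCode) {
        try {
            // forName accepts script names and their ISO 15924 aliases.
            return baselineFor(Character.UnicodeScript.forName(isoCode));
        } catch (IllegalArgumentException e) {
            return "alphabetic"; // unrecognised code: fall back to the default
        }
    }

    /** Looks up the script of a code point, then its baseline. */
    static String baselineForCodePoint(int codePoint) {
        return baselineFor(Character.UnicodeScript.of(codePoint));
    }
}
```

Keying the table on the script rather than on the block avoids having to list every block that a script spans.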

script property

2005-10-05 Thread Manuel Mall

While I am at it (this whole alignment stuff I mean) we may as well do
it properly. This would include support for the script property. The
allowed values for script are defined for example here:
http://www.unicode.org/iso15924/iso15924-codes.html.

I assume we don't bother to validate if a correct code has been
provided as we don't do that for the country and language
properties either (should we? If we do we need more external config
files or expand fop.xconf to hold those values as they tend to change
over time).

But what we do need is a mapping from scripts to default baselines for
these scripts. I haven't found a mapping list on the net. Has anyone come
across something like that? Otherwise we may have to make that up. That
means entries somewhere similar to: <script code="Guru"
baseline="hanging"/>. Is the fop config file the right place for this
stuff? Any undefined scripts encountered in an fo file would map to
baseline=alphabetic (maybe with a warning to the user?).

What we also need for proper script support is a mapping from Unicode
code point to script. The mappings are for example defined here:
http://www.unicode.org/Public/UNIDATA/Scripts.txt.
How would one best process this (has this been done in FOP before?)?
Is there other Unicode stuff FOP needs which should be considered at the 
same time? 
Are we better off working with the raw Unicode data 
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)?

Manuel
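
One way to process Scripts.txt, sketched under assumptions: each data line has the form "start..end ; Script # comment" (or a single code point instead of a range), and anything not listed counts as Unknown. The class ScriptRanges and its methods are hypothetical names; this is not how FOP does it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical range table built from lines in the Scripts.txt format. */
class ScriptRanges {
    private static final class Range {
        final int start;
        final int end;
        final String script;

        Range(int start, int end, String script) {
            this.start = start;
            this.end = end;
            this.script = script;
        }
    }

    private final List<Range> ranges = new ArrayList<>();

    /** Parses data lines, skipping comments and blank lines. */
    static ScriptRanges parse(Iterable<String> lines) {
        ScriptRanges table = new ScriptRanges();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (data.isEmpty()) {
                continue;
            }
            String[] fields = data.split(";");
            String[] bounds = fields[0].trim().split("\\.\\.");
            int start = Integer.parseInt(bounds[0], 16);
            int end = bounds.length > 1 ? Integer.parseInt(bounds[1], 16) : start;
            table.ranges.add(new Range(start, end, fields[1].trim()));
        }
        // Sort by range start so lookups can use binary search.
        table.ranges.sort(Comparator.comparingInt(r -> r.start));
        return table;
    }

    /** Binary search over the sorted, non-overlapping ranges. */
    String scriptOf(int codePoint) {
        int lo = 0;
        int hi = ranges.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Range r = ranges.get(mid);
            if (codePoint < r.start) {
                hi = mid - 1;
            } else if (codePoint > r.end) {
                lo = mid + 1;
            } else {
                return r.script;
            }
        }
        return "Unknown";
    }
}
```

A sorted range list keeps the table small compared with one entry per code point, at the cost of a logarithmic lookup.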

Re: script property

2005-10-05 Thread Jeremias Maerki


On 05.10.2005 09:46:18 Manuel Mall wrote:
 While I am at it (this whole alignment stuff I mean) we may as well do
 it properly. This would include support for the script property. The
 allowed values for script are defined for example here:
 http://www.unicode.org/iso15924/iso15924-codes.html.
 
 I assume we don't bother to validate if a correct code has been
 provided as we don't do that for the country and language
 properties either (should we? If we do we need more external config
 files or expand fop.xconf to hold those values as they tend to change
 over time).

We don't have to but we could. Since this is not something that changes
often I wouldn't put it into the config file, but in resource files
instead.

 But what we do need is a mapping from scripts to default baselines for
 these scripts. I haven't found a mapping list on the net. Has anyone come
 across something like that?

Nope.

 Otherwise we may have to make that up. That
 means entries somewhere similar to: <script code="Guru"
 baseline="hanging"/>. Is the fop config file the right place for this
 stuff?

Again, I'd put it in separate resource files as this is not going to
change often and a rebuild of FOP is not the end of the world in this
case.

 Any undefined scripts encountered in an fo file would map to
 baseline=alphabetic (maybe with a warning to the user?).

Sure.

 What we also need for proper script support is a mapping from Unicode
 code point to script. The mappings are for example defined here:
 http://www.unicode.org/Public/UNIDATA/Scripts.txt.
 How would one best process this? 

<shrug/>

 (has this been done in FOP before?)

I don't think so.

 Is there other Unicode stuff FOP needs which should be considered at the 
 same time? 
 Are we better off working with the raw Unicode data 
 (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)?

<shrug/>

We should simply make sure that this doesn't hurt performance too
much for the large majority of users who are happy with Latin scripts.
After all, it looks like many lookups are necessary, and all these maps
have to be loaded at some point.


Jeremias Maerki
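
The two concerns above (resource files rather than the config file, and no cost for users of Latin scripts) can be combined in a lazily loaded table. The sketch below is only an assumption about how that might look: the class LazyBaselineTable is hypothetical, the resource content is an in-memory stand-in, and a real version would presumably read a file from the classpath instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical lazily loaded script-to-baseline table. */
class LazyBaselineTable {
    // Stand-in for a resource file shipped with the build (made-up content).
    private static final String RESOURCE_TEXT = "Guru=hanging\n";

    // Counts loads so the laziness can be observed.
    static int loadCount = 0;

    /** Holder idiom: the table is loaded when this class is first used. */
    private static final class Holder {
        static final Properties TABLE = load();
    }

    private static Properties load() {
        loadCount++;
        Properties p = new Properties();
        try {
            p.load(new StringReader(RESOURCE_TEXT));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    /** Returns the default baseline for a four-letter script code. */
    static String baselineFor(String isoScriptCode) {
        if ("Latn".equals(isoScriptCode)) {
            return "alphabetic"; // fast path: the table is never touched
        }
        // Undefined scripts fall back to alphabetic.
        return Holder.TABLE.getProperty(isoScriptCode, "alphabetic");
    }
}
```

With this shape, a document that only ever uses Latn never triggers the load, and everything else pays the cost once.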

Re: script property

2005-10-05 Thread J.Pietschmann


Jeremias Maerki wrote:

What we also need for proper script support is a mapping from Unicode
code point to script.

...

(has this been done in FOP before?)


I don't think so.


Have a look at
 http://people.apache.org/~pietsch/linebreak.tar.gz

Occasionally I've thought about some sort of Jakarta Commons
Unicode file component, but the guys there weren't all that
enthusiastic about this, and I don't have enough time to get
the ball rolling all on my own.

J.Pietschmann

Re: script property

2005-10-05 Thread Manuel Mall

On Thu, 6 Oct 2005 04:23 am, J.Pietschmann wrote:
 Jeremias Maerki wrote:
  What we also need for proper script support is a mapping from
  Unicode code point to script.

 ...

  (has this been done in FOP before?)
 
  I don't think so.

 Have a look at
   http://people.apache.org/~pietsch/linebreak.tar.gz

 Occasionally I've thought about some sort of Jakarta Commons
 Unicode file component, but the guys there weren't all that
 enthusiastic about this, and I don't have enough time to get
 the ball rolling all on my own.

Joerg,

thanks for that.

Do I understand correctly that you use a Java code generation 
approach here? That is, you generate Java source code from the Unicode 
text files, which is then compiled as part of the line breaking code?

I'm not so sure I like that, but then again, if it works, fine. To me this 
type of stuff feels more like pure data, but of course we don't want to 
parse these text files each time FOP loads. What about the hyphenation 
pattern approach? Store it as a serialized object and treat it more 
like a resource? Accessing that should be comparable in time to class 
loading (I think; I have never tested that empirically).

I haven't studied your code in detail but could we / should we integrate 
this into the FOP trunk to support 'Unicode compliant' line breaking?

My main goal is still to make FOP happen, so I wouldn't like to 
dilute my effort and time arguing for or establishing another 
Commons subproject at the moment. How about we create an 
org.apache.fop.unicode package for the time being, where we keep 
Unicode-specific support stuff? That can then be refactored at a later 
stage into a Commons subproject if the time/will/energy is there.

 J.Pietschmann

Manuel
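
The serialized-object approach mentioned above can be sketched as a round trip through Java serialization. Everything here is an assumption for illustration: the class SerializedTable is hypothetical, and the table shape (range start mapped to script name) is made up rather than taken from FOP's hyphenation code. A build step would write the bytes to a resource file once, and the runtime would only deserialize.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.TreeMap;

/** Hypothetical helper that stores a lookup table as a serialized object. */
class SerializedTable {
    /** Build-time step: turn the parsed table into bytes for a resource file. */
    static byte[] serialize(TreeMap<Integer, String> table) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(table);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Run-time step: read the table back without parsing any text. */
    @SuppressWarnings("unchecked")
    static TreeMap<Integer, String> deserialize(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TreeMap<Integer, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this actually loads as fast as generated classes would need measuring, as the message itself notes.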

Re: script property

2005-10-05 Thread Manuel Mall

On Wed, 5 Oct 2005 04:17 pm, Jeremias Maerki wrote:
 On 05.10.2005 09:46:18 Manuel Mall wrote:
  While I am at it (this whole alignment stuff I mean) we may as well
  do it properly. This would include support for the script
  property. The allowed values for script are defined for example
  here:
  http://www.unicode.org/iso15924/iso15924-codes.html.
 
  I assume we don't bother to validate if a correct code has been
  provided as we don't do that for the country and language
  properties either (should we? If we do we need more external config
  files or expand fop.xconf to hold those values as they tend to
  change over time).

 We don't have to but we could. Since this is not something that
 changes often I wouldn't put it into the config file, but in resource
 files instead.

OK - makes sense.

  But what we do need is a mapping from scripts to default baselines
  for these scripts. I haven't found a mapping list on the net. Has
  anyone come across something like that?

 Nope.

  Otherwise we may have to make that up. That
  means entries somewhere similar to: <script code="Guru"
  baseline="hanging"/>. Is the fop config file the right place for
  this stuff?

 Again, I'd put it in separate resource files as this is not going to
 change often and a rebuild of FOP is not the end of the world in this
 case.

My suggestion was based on the assumption that if we have to make up 
the mappings from script to baseline ourselves, we may get them wrong. 
Therefore, leave it up to the user to add the mappings for his/her 
language/script environment to the config file. Most users will deal 
with only a very few scripts, so it's not a big deal.


  Any undefined scripts encountered in an fo file would map to
  baseline=alphabetic (maybe with a warning to the user?).

 Sure.

  What we also need for proper script support is a mapping from
  Unicode code point to script. The mappings are for example defined
  here: http://www.unicode.org/Public/UNIDATA/Scripts.txt.
  How would one best process this?

 <shrug/>

  (has this been done in FOP before?)

 I don't think so.

See Joerg's response.

  Is there other Unicode stuff FOP needs which should be considered
  at the same time?
  Are we better off working with the raw Unicode data
  (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)?

 <shrug/>

Seems like line breaking (and hyphenation, e.g. script specific 
hyphenation character) may also need Unicode stuff (not necessarily 
from the raw data file though).


 We should simply make sure that this doesn't hurt performance
 too much for the large majority of users who are happy with Latin
 scripts. After all, it looks like many lookups are necessary, and
 all these maps have to be loaded at some point.

Yes, that is a valid consideration. Maybe it needs to be designed in such 
a way that these lookups can be disabled and replaced by defaults from 
the config file.

 Jeremias Maerki
Manuel
