Adding Font Parsers

Fernando Arreola Thu, 09 Jun 2011 17:50:22 -0700

Hello,

I am looking to add a couple of font parsers (for .afm files and .pfb files,
using FontBox). I am new to Tika, and to file parsing in general, so I have a
couple of questions.


I read through the 5-minute quick start tutorial and started following the
steps detailed there. I noticed that the tika-mimetypes.xml file already has
an entry which covers the afm and pfb file types.

  <mime-type type="application/x-font-type1">
    <glob pattern="*.pfa"/>
    <glob pattern="*.pfb"/>
    <glob pattern="*.pfm"/>
    <glob pattern="*.afm"/>
  </mime-type>

Since I was planning to write one parser for each, I changed the file to read
as follows:

  <mime-type type="application/x-font-pfb">
    <acronym>PFB</acronym>
    <_comment>Printer Font Binary</_comment>
    <glob pattern="*.pfb"/>
  </mime-type>

  <mime-type type="application/x-font-afm">
    <acronym>AFM</acronym>
    <_comment>Adobe Font Metric</_comment>
    <glob pattern="*.afm"/>
  </mime-type>

  <mime-type type="application/x-font-type1">
    <glob pattern="*.pfa"/>
    <glob pattern="*.pfm"/>
  </mime-type>

Is this appropriate or is the original content preferred?

Both versions seem to work, at least for the .pfb files, which brings me to my
next question. I have about 9 different .afm files which I downloaded from an
Adobe site. When I run Tika on these files, one is recognized correctly
("x-font-type1" in the original version and "x-font-afm" in the updated
version), but the rest are only recognized as "text/plain". I haven't added a
real parser yet; I basically copied the one from the tutorial and changed the
supported type to the corresponding mime type. Am I doing something wrong, or
am I missing a step needed for the .afm files to be recognized? (I did add my
fake parser to the list so that the AutoDetectParser includes it.)
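
For reference, here is a sketch of how content-based detection might be added
to the AFM entry, in case the glob alone is not enough. It assumes that AFM
files begin with the keyword "StartFontMetrics", and that the <magic>, <match>
and <sub-class-of> elements behave here as they do in other entries of
tika-mimetypes.xml. The priority value is a guess, and the whole thing is
untested against the files in question.

  <mime-type type="application/x-font-afm">
    <acronym>AFM</acronym>
    <_comment>Adobe Font Metric</_comment>
    <!-- Assumption: every AFM file starts with this keyword -->
    <magic priority="50">
      <match value="StartFontMetrics" type="string" offset="0"/>
    </magic>
    <glob pattern="*.afm"/>
    <!-- Assumption: AFM is plain text, so declare text/plain as the parent -->
    <sub-class-of type="text/plain"/>
  </mime-type>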

Any help would be greatly appreciated.

Thanks,
Fernando Arreola
