They may, you are right. If you wanted to maintain a list of known "restriction free" fonts and only extract those - that would probably be OK.
Leonard

On 9/22/11 9:17 PM, "Josh Richardson" <[email protected]> wrote:

>The fonts that are embedded in a PDF may come from any source, and be
>completely restriction-free. It's really up to the user of the software
>to decide. Note that there are many many many other open source programs
>that extract fonts from PDFs.
>
>--josh
>
>On 9/22/11 6:04 PM, "Leonard Rosenthol" <[email protected]> wrote:
>
>>Boy, your lawyer needs to read up on IP law :).
>>
>>Since you do NOT have a license for the font data contained in the PDF,
>>your software has NO RIGHTS to use that information for anything other
>>than rendering the glyphs in the PDF. You certainly have NO rights to
>>convert the format - in fact, doing so is a clear and distinct violation
>>of the font licenses.
>>
>>As such, if your patches to pdf2html extract the font data for use in the
>>HTML - I STRONGLY recommend that the code NOT be accepted into the master
>>repository.
>>
>>Leonard
>>
>>
>>On 9/22/11 6:40 PM, "Josh Richardson" <[email protected]> wrote:
>>
>>>I'm not a lawyer, but I did check with one. I don't think software can
>>>violate your IP/licenses, at least as long as that software doesn't
>>>contain unauthorized copyrighted material -- which pdftohtml does not
>>>AFAIK -- I certainly didn't add any to it.
>>>
>>>Best, --josh
>>>
>>>On 9/22/11 3:08 PM, "Leonard Rosenthol" <[email protected]> wrote:
>>>
>>>>I can't recall what you said about this in the past, but since I was
>>>>just dealing with it today.
>>>>
>>>>What do you do about embedded fonts?
>>>>
>>>>As my company (Adobe) sells/creates fonts, I want to make sure that
>>>>pdftohtml won't be violating our IP/licenses.
>>>>
>>>>Thanks in advance,
>>>>Leonard
>>>>
>>>>On 9/22/11 5:51 PM, "Josh Richardson" <[email protected]> wrote:
>>>>
>>>>>On 9/22/11 12:20 PM, "Jonathan Kew" <[email protected]> wrote:
>>>>>>More generally, it is not possible to recreate useful XHTML (or
>>>>>>similar) documents from arbitrary PDF files with anything like 100%
>>>>>>reliability, because many PDF files do not contain adequate
>>>>>>information to accurately map the rendered glyphs back to correct
>>>>>>Unicode text, or to reliably reconstruct the proper flow of text.
>>>>>>Constructs such as ActualText may help, but are often lacking from
>>>>>>real-world PDF documents.
>>>>>
>>>>>W.r.t. rendering glyphs, we get around the problem of missing unicode
>>>>>mappings by taking any glyph without a unicode mapping and assigning
>>>>>it an offset in the private space of Unicode. This produces the
>>>>>correct visual result in the XHTML, but not a full semantic
>>>>>representation. If someone's interested, they could get the semantics
>>>>>right too by pattern-matching the glyph against an appropriate Unicode
>>>>>font.
>>>>>
>>>>>W.r.t. the flow of text, there have been other threads on this topic,
>>>>>but pdftohtml does make some attempt, and I believe it's possible to
>>>>>do this to a high degree of accuracy, maybe >99% -- that said, noone
>>>>>has done it yet, so either it's harder than I think, or no-one has
>>>>>cared enough to really try (and I still fall into that camp.)
>>>>>
>>>>>Best, --josh
>>>>>
>>>>>_______________________________________________
>>>>>poppler mailing list
>>>>>[email protected]
>>>>>http://lists.freedesktop.org/mailman/listinfo/poppler
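
The private-use fallback Josh describes in the quoted text (give any glyph that lacks a Unicode mapping a code point in the private space of Unicode) might look roughly like the sketch below. This is not the pdftohtml patch itself, which the thread does not show. The function name `assign_codepoints`, the use of integer glyph IDs, and the dict-shaped `to_unicode` table are all assumptions made for illustration; the only fixed fact used is that the Basic Multilingual Plane's Private Use Area runs from U+E000 to U+F8FF.

```python
# Illustrative sketch only; names and data shapes are assumptions,
# not taken from pdftohtml.

PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def assign_codepoints(glyph_ids, to_unicode):
    """Return {glyph_id: code_point} for every glyph in glyph_ids.

    Glyphs present in to_unicode keep their real code point. Glyphs
    without a mapping get the next free Private Use Area code point,
    so they can still be addressed from the generated XHTML.
    """
    # Real mappings may themselves point into the PUA; never reuse those.
    taken = set(to_unicode.values())
    mapping = {}
    next_pua = PUA_START
    for gid in glyph_ids:
        if gid in mapping:
            continue  # glyph already handled
        if gid in to_unicode:
            mapping[gid] = to_unicode[gid]
            continue
        while next_pua in taken:
            next_pua += 1
        if next_pua > PUA_END:
            raise ValueError("ran out of private-use code points")
        mapping[gid] = next_pua
        next_pua += 1
    return mapping
```

As Josh notes, this only preserves the visual result: the private-use code points carry no meaning, so text copied or searched from the output is not semantically correct for those glyphs.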

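
Leonard's suggestion at the top of this thread (keep a list of known "restriction free" fonts and only extract those) could be sketched as a simple name check. Everything here is hypothetical: the font names in `ALLOWED_FONTS` are placeholders rather than real fonts or statements about any license, and `may_extract` is not a poppler function. The one PDF detail relied on is that subsetted fonts carry a tag of six uppercase letters followed by `+` in front of the font name, which is stripped before the lookup.

```python
import re

# Placeholder names for illustration only; a real list would have to be
# curated by someone who has checked each font's license.
ALLOWED_FONTS = {"ExampleSans-Regular", "ExampleSerif-Bold"}

# Subset fonts in a PDF are named like "ABCDEF+FontName".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_font_name(pdf_font_name):
    """Strip the subset tag, if any, from a PDF font name."""
    return SUBSET_TAG.sub("", pdf_font_name)


def may_extract(pdf_font_name, allowed=ALLOWED_FONTS):
    """Return True only if the font is on the allow list."""
    return base_font_name(pdf_font_name) in allowed
```

A name match is a weak signal, since anyone can give a font any name, so a check like this would at best be a conservative default rather than proof that extraction is permitted.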