> circle/arc?
> > Chances for a one-liner or something similarly simple?
> >
> > I use PDFBox 3.0.3.
> >
> > Thank you for your hints!
> > Reg
> >
> >
> > -----
> > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> > For additional commands, e-mail: users-h...@pdfbox.apache.org
> >
> >
>
--
Peter Murray-Rust
Founder ContentMine.org
and
Reader Emeritus in Molecular Informatics
Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
out if you see a certain number of
> pages with no text.)
>
> Brian
>
--
form; then linking into a global knowledge graph.
P.
On Sun, Oct 2, 2022 at 10:07 AM Tilman Hausherr
wrote:
> On 26.09.2022 10:53, Peter Murray-Rust wrote:
> > * Does PDFBox3 have more functionality than PDFBox2 that would help?
>
> I don't think so, the main thing is the o
s!
P.
--
"I always retain copyright in my papers, and nothing in any contract I sign
with any publisher will override that fact. You should do the same".
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Yusuf Hamied Department of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-336432
> same?
> Any guidance would be very helpful.
>
> --
> Thanks & Regards
> Kaushlendra Singh
> Email: singh.kaushlendra...@gmail.com
> Phone: +91 8377094564
>
--
> > >>> Thanks & regards,
> > >>> Aravind Swarna
> > >>>
> > >>
--
"I always retain copyright in my papers, and nothing in any contract I sign
with any publisher will override that fact. You should do the same".
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
is an encouragement to all of us.
P.
I do a lot of this and there is no generic way. The rect might be a rect, or
4 lines, or a polyline of 3 or 4 segments (or 5 for overlaps). It might be
drawn twice for emphasis.
I have some heuristics for creating probable rects
in http://github.com/petermr/ami3
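
To give a flavour of what such a heuristic can look like, here is a minimal sketch in plain Java. It is not the code in ami3: the class name, the Segment and Box records, and the tolerance parameter are all made up for illustration. It assumes you have already pulled straight strokes out of the page as (x0, y0, x1, y1) tuples, and it only handles axis-aligned rects.

```java
import java.util.List;
import java.util.Optional;

class RectFromSegments {
    /** A straight stroke between two points (hypothetical type, not a PDFBox class). */
    record Segment(double x0, double y0, double x1, double y1) {}

    /** An axis-aligned box given by its lower-left and upper-right corners. */
    record Box(double minX, double minY, double maxX, double maxY) {}

    private static boolean near(double a, double b, double eps) {
        return Math.abs(a - b) <= eps;
    }

    /**
     * Returns the box outlined by the segments if every segment lies on an edge
     * of their common bounding box and each of the four edges is fully covered
     * by at least one segment. Duplicates (a rect drawn twice) and an extra
     * overlapping stroke are tolerated because they also lie on an edge.
     */
    static Optional<Box> probableRect(List<Segment> segments, double eps) {
        if (segments.size() < 4) {
            return Optional.empty();
        }
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (Segment s : segments) {
            minX = Math.min(minX, Math.min(s.x0(), s.x1()));
            maxX = Math.max(maxX, Math.max(s.x0(), s.x1()));
            minY = Math.min(minY, Math.min(s.y0(), s.y1()));
            maxY = Math.max(maxY, Math.max(s.y0(), s.y1()));
        }
        if (maxX - minX <= eps || maxY - minY <= eps) {
            return Optional.empty(); // degenerate: a single line, not a box
        }
        boolean bottom = false, top = false, left = false, right = false;
        for (Segment s : segments) {
            if (near(s.y0(), s.y1(), eps)) { // horizontal stroke
                boolean spans = near(Math.min(s.x0(), s.x1()), minX, eps)
                        && near(Math.max(s.x0(), s.x1()), maxX, eps);
                if (near(s.y0(), minY, eps)) {
                    bottom |= spans;
                } else if (near(s.y0(), maxY, eps)) {
                    top |= spans;
                } else {
                    return Optional.empty(); // stray line inside the box
                }
            } else if (near(s.x0(), s.x1(), eps)) { // vertical stroke
                boolean spans = near(Math.min(s.y0(), s.y1()), minY, eps)
                        && near(Math.max(s.y0(), s.y1()), maxY, eps);
                if (near(s.x0(), minX, eps)) {
                    left |= spans;
                } else if (near(s.x0(), maxX, eps)) {
                    right |= spans;
                } else {
                    return Optional.empty();
                }
            } else {
                return Optional.empty(); // diagonal stroke: not a plain rect
            }
        }
        return (bottom && top && left && right)
                ? Optional.of(new Box(minX, minY, maxX, maxY))
                : Optional.empty();
    }

    public static void main(String[] args) {
        List<Segment> outline = List.of(
                new Segment(0, 0, 10, 0), new Segment(10, 0, 10, 5),
                new Segment(10, 5, 0, 5), new Segment(0, 5, 0, 0));
        System.out.println(probableRect(outline, 0.01));
    }
}
```

This deliberately ignores the 3-segment polyline that is closed implicitly, rotated rects, and edges split into several short strokes; real documents need more cases than this.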
If you are serious and doing a *lot* I c
--
in PDF should insist that a
Unicode font is used. Better still avoid PDF.
--
> > > text.getFont().getFontDescriptor().getFontWeight() ); // returns 0.0.
> >
> > > 4. System.out.println( getGraphicsState().getLineWidth() ); // returns 1.0.
> >
> > > 5. System.out.println( getGraphicsState().getTextState().getRenderingMode() ); // returns FILL
>
t 10:34 PM European Neuroscience Center <
mnachev.nscenter...@gmail.com> wrote:
> Hi,
>
> What is the way to extract an embedded image in SVG format from
> a PDF file using PDFBox?
>
> If there is no such option, how to determine from where the embedded SVG
>
fic activities or programs. -
> >> Targeted Platinum: DLA Piper, Microsoft, Oath, OSU Open Source Labs,
> >> and Sonatype. - Targeted Gold: Atlassian, The CrytpoFund, Datadog,
> >> PhoenixNAP, and Quenda. - Targeted Silver: Amazon Web Services,
> >> Hot
> > Assessmen
> >
> > Thank you!
> >
>
>
--
y/
>
> You can use PDFBox if you know the positions in advance, then search in
> the source code examples for ExtractTextByArea.
>
> Tilman
>
>
--
> >
> >
> > Thanks!
> >
> > Regards,
> >
> > Eli
> >
> >
> >
aware software? What is the
abstract model of a bead in a reader?
* are there other ways of transmitting "chunks" other than beads?
TIA
P.
--
oherent analysis when a table is larger than one
> > page, for that reason Tabula is far from being a good tool for text
> > extraction with correct positioning.
>
> We always welcome bug reports (and patches!) :) [1]
>
> Thanks!
>
> [1] https://github.com/tabulapdf/tabula-java/issues
>
>
> —
> Manuel Aristarán
> http://jazzido.com
>
>
>
>
--
:
> >
> > hello
> > Can someone point me to a step by step guide to using this please?
> > I have made it available under a project in Eclipse - but can't see any
> > code.
> > Regards
> > Gopi
> >
> >
>
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
> Please share any custom solutions or ideas if any !!
>>>>
>>>> Thanks
>>>>
>>>
--
> Thanks
> Mrunal
>
--
Thank you
On Sun, Apr 19, 2015 at 12:03 PM, Tilman Hausherr
wrote:
On 19.04.2015 at 12:29, Peter Murray-Rust wrote:
>
> Did you decrypt the file? Did you either load the file with loadNonSeq(),
> or with load() and then call openProtection()?
>
I thought I had used loadNonSeq(
yption, or a broken PDF that Adobe can somehow
read or some other problem?
Many thanks
--
On Tue, Apr 7, 2015 at 7:49 AM, John Hewson wrote:
> >
> > On 6 Apr 2015, at 09:49, Peter Murray-Rust wrote:
> >
> ...
> >
> > PDFBox relies on the target OS distribution to include some of the 14
> > fonts. Since Windows doesn't have ZapfDingba
ve to differentiate between "licence to use" and "licence to
redistribute"
--
created.
> >
> > I will test that ExternalFonts.addSubstitute when I get time, as another
> > workaround.
>
--
occurrences.
It's early days, but if people are interested in collaborating or have
better solutions we'd be interested (we aren't able to help with casual
problems).
P.
--
JECT:
> Re: Looking for some guidance on using PDFBox to analyze
> page content
>
> DATE:
> 2015-03-20 10:08
>
> FROM:
> Peter Murray-Rust
>
> TO:
e you can provide me some source codes extracting pdfs using PDFbox. Not
> just stripper.getText().
> Thanks a billion!!! I hope you write to me soon!!!
> sincerely,
>
> dock CHEN
--
at)
>
> Regards,
>
> WARREN GALLAGHER - CTO
>
> warren.gallag...@apxconsult.com
>
> M: 613-791-4987 W: 613-262-2601 Advance Property eXposure Canada Inc.
> 1755 Woodward Drive, Suite 101, Ottawa, Ontario K2C 0P9 APXConsult.com
> [1]
>
> Links:
> --
> [1]
3. $0.00$100
> >> Is there a way with PDFbox to extract a specific value(s) from the
> table?
> >> Example: Bank Of America and $0.00
> >> And also is there a way to cut the whole table and paste it into a
> >> different PDF?
> >> Please let m
d - but the task is finite if
there is only one font.
--
Acrobat Pro for
> figuring out thorny issues, but I know that’s not an option for
> everyone.
>
>
Yes, I deliberately avoid it :-(
--
Is it allowed to have the same name for 2 different fonts (it would be
very bad...)
(c) How does PDFTextStripper calculate spaces? From the Font, or by some
other heuristics?
(d) is there a debugging tool on PDFBox I could be using for this sort of
problem?
Many thanks.
P.
--
tioned I
> can grab many templates such as IEEExplore, Spring etc.
>
>
That's a very good point. If we can identify the authoring template we may
be able to create the reverse engineering.
>
> --
> Thanks.
>
> Best regards,
> Mehmet
>
>
>
> -Original Message-
> From: peter.murray.r...@googlemail.com [mailto:
> peter.murray.r...@googlemail.com] On Behalf Of Peter Murray-Rust
> Sent: Thursday 6 November 2014 7:38 PM
> To: users@pdfbox.apache.org
> Sub
> it possible to detect
> Headings and sub-headings? More specifically, is it possible to extract
> only introduction
> Part or conclusion part?
>
> Thanks in advance.
>
> Best,
> Mehmet
>
--
a starting point for you in that
> it
> > looks for graphic boxes drawn around text to identify table headings.
> >
> > Frank
> >
> > On Thu, Oct 30, 2014 at 6:27 AM, Ken Bowen wrote:
> >
> > > You may want to get in contact with Peter Murray-Rust(
, not
characters) will sometimes be used.
If you are going to do a lot of this, with a single source of documents it
may be worth investing in creating some of these heuristics. But it will
still be work, unfortunately.
We are gradually building up this sort of approach in
http://bitbucket.org/pet
:
> Hi ,
> How to identify table using PDFBOX . And extract text from it .
> Please help me with the idea .
>
> Thanks
> Borris
>
--
the hard work done by
list members in writing PDFBox. Because the process is now legal in the UK
there is more incentive to develop and publish downstream analytic tools
and that's what we are doing (Apache2-Open, of course).
--
m glow of having helped the human race. Same goes for
tables and document structuring...
BR
P
--
text from some PDF files, the question I want to ask is
> that: is there a way to automatically evaluate the quality of text
> extraction result? Or can PDFBox offer a confidence score about the
> extracted text result?
>
> Regards,
>
--
rks fine for simple texts. It gets more complicated and
> may lead to a false result if one of the following is used:
>
> - different text sizes in the same line
> - different font sizes in the same line
> - super/subscripts
> - multicolumns
> -
>
> BR
> Andreas Lehmkühler
>
--
x as part of a GSoC engagement.
>
> Maybe that’s what you are looking for?
>
> BR
> Maruan
>
> On 22.04.2014 at 15:39, Peter Murray-Rust wrote:
>
> > We have a need to carry out limited OCR in the PDF extraction process and
> > are thinking of adding it to PDF2S
ther have a solution (which would save us
going further) or to see if anyone is interested in using such a facility
[Note that this is feasible mainly because the source is born-digital and
binarized (0/1) and so does not suffer from scanning artefacts such as
skewing, contrast, noise, etc.]
P.
-
> Thanks.
>
> As suggested, I have gone through the links provided, but unfortunately
> could not get to the heuristics to detect the subsuperscripts.
>
> If possible, please attach or provide a link that can publicly be accessed.
>
> Appreciate your help.
>
>
> On Sat, Ma
as normal text.
> >
> > A superscript to a word, which is the last word of a sentence, has been
> > placed after the period(.)
> >
> > ex: Word: "test" with superscript "super"
> > When it appeared at the end of a sentence, has been e
'm making is a bit more advanced than the one embedded in
> PDFBox as it creates a list of
> couples (XY position of a word, contents of a word) and not just give the
> list of words.
>
I do this in two stages - translate all chars to SVG (PDF2SVG) and in a
separate project (SVG2XML)
>
> > BR
> > Maruan Sahyoun
> >
> > On 07.03.2014 at 18:24, HQS wrote:
> >
> >> Thank you all for those accurate answers.
> >> I will give a try to the geometrical approach based on the (x, y)
> coordinates of the characters.
> >>
> >
> But this is not an issue, my problem is more the fact that this method may
> not be 100% reliable. What do you think ?
>
We are committed to solving it for English-language science and European
personal names. The worst case is probably slanted text in diagrams.
>
> As for the technical par
king for?
> >>>
> >>> BR
> >>> Maruan Sahyoun
> >>>
> >>> On 06.03.2014 at 18:39, HQS wrote:
> >>>
> >>>> Hello all,
> >>>>
> >>>> 1.
> >>>> Have you ever seen PD
r.
>> Thus, I wanted to know if this is still an issue or its solved by now?
>>
>> thanks and regards,
>>
>>
>
--
PDF (which uses PDFBox) -
http://tabula.nerdpower.org/ - is among the most advanced open source
projects. I do some of this myself in https://bitbucket.org/petermr/ami2.
We hope to pool our software and experiences so we don't all have to
reinvent algorithms and heuristics.
It's mindbogglingly tedious to do this.
>
> /Johnny
>
>
>
--
ue requires 3minutes.
>
> How/why is this possible? How can I improve on this?
>
> Any help appreciated
> Clemens
>
--
e.g. set a permission in PDF to
> disallow text extraction
> http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/encryption/AccessPermission.html
> >
> > BR
> >
> > Maruan Sahyoun
> >
> > On 24.09.2013 at 17:10, daijun <16360...@qq.com> wrote:
>
've not used FOP directly either so can't comment further.
>
>
> On 24 September 2013 16:02, Henry, Chad >wrote:
>
> > Is it possible to convert an HTML document to a PDF using pdfbox? Thanks
> >
> > Chad Henry
> > System Design and Devlopment
> > 71
ter.java:190)
at
org.xmlcml.pdf2svg.PDFPage2SVGConverter.convertPageToSVG(PDFPage2SVGConverter.java:176)
--
On Mon, May 13, 2013 at 8:52 PM, Maruan Sahyoun wrote:
> Hi Peter,
>
> On 13.05.2013 at 19:44, Peter Murray-Rust wrote:
>
> > Thanks for answering - any help is valuable.
> >
> > On Mon, May 13, 2013 at 6:02 PM, Maruan Sahyoun >wrote:
> >
> >
. The only
practical answer seems to be crowdsourcing or to read the glyphs and find
out what is going on. Which is why it is useful to know where the
fontWeight is applied.
P
Cheers,
E.
--
ho can give help.
P.
On Mon, May 13, 2013 at 11:55 AM, Peter Murray-Rust wrote:
>
>
>
> On Mon, May 13, 2013 at 11:14 AM, Maruan Sahyoun
> wrote:
>
>> Hi Peter,
>>
>> which version of PDFBox are you using? If you are still on 1.7 I'd
>> suggest
'll upgrade and report. However these fonts are so
non-conformant I am only semi-optimistic. Assuming it doesn't work, is
there still a way of determining bold?
P.
> BR
> Maruan
>
> On 13.05.2013 at 10:55, Peter Murray-Rust wrote:
>
> > I am dealing with a number of (hig
often does not.
At present I simply compile a list of fonts, so any help welcomed.
--
life easier.
>
> I am new to the world of PDFBox and the details of fonts so feel free to
> start at the beginning and drive slowly. :)
>
> Thanks,
> Buzzy
>
--
nput - it's a communal
OpenSource project. Not ready for general use, especially when people don't
understand that there is an inevitable error rate (although small).
We'd be delighted to hear from anyone needing this but at present you need
to be able to understand running java
rning it. If you
are interested in PDFBox as a reader then our
http://bitbucket.org/petermr/pdf2svg-dev may be a useful example.
--
and other way to go on this.
>
> Regards, Kulbhushan
>
--
everal TeX
fonts (CMM etc.) but haven't done a Ghostscript one and it would be useful.
But as Andreas says, ultimately these are probably non-conformant. A mixture
of heuristics and glyph analysis (OCR and/or heuristics) is required.
Again PDF2SVG is addressing these - any community involvement i
http://bitbucket.org/petermr/pdf2svg and other sibling
projects. We aim to extract high-level graphical objects from these paths.
It's all F/OSS - you are welcome to use what we have so far. I am not aware
of others doing the same at least in Open projects.
P.
> Florian
--
at creating PDF from Word, LaTeX, etc. usually
*destroys* information. I wish university libraries didn't do it. But
that's out of scope here...
--
ry valuable - and please feel
free to fork and develop it.
FWIW the next phase (SVGPlus) uses heuristics to recreate paragraphs and other
objects (super/subscripts, maths equations, tables, semantic graphs). The
third phase turns these into semantic chemistry, biology, etc. - all from
the PDF.
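
As an illustration of the simplest kind of heuristic in that family, here is a sketch of a super/subscript test in plain Java. It is not taken from SVGPlus: the class, the Glyph record and both thresholds (0.85 and 0.15) are assumptions chosen for the example, and it assumes y increases upwards as in default PDF user space.

```java
class ScriptHeuristic {
    enum Kind { NORMAL, SUPERSCRIPT, SUBSCRIPT }

    /** One positioned character: its text, baseline y and font size (hypothetical type). */
    record Glyph(String text, double baselineY, double fontSize) {}

    /**
     * Classifies a glyph against the dominant baseline and font size of its line.
     * A glyph counts as a script only if it is both noticeably smaller than the
     * line's font and shifted away from the line's baseline.
     */
    static Kind classify(Glyph g, double lineBaselineY, double lineFontSize) {
        boolean smaller = g.fontSize() < 0.85 * lineFontSize; // assumed ratio
        double shift = g.baselineY() - lineBaselineY;         // positive means raised
        double minShift = 0.15 * lineFontSize;                // assumed tolerance
        if (smaller && shift > minShift) {
            return Kind.SUPERSCRIPT;
        }
        if (smaller && shift < -minShift) {
            return Kind.SUBSCRIPT;
        }
        return Kind.NORMAL;
    }

    public static void main(String[] args) {
        // A 7pt "2" raised 4 units above a 10pt line at baseline y = 100.
        System.out.println(classify(new Glyph("2", 104, 7), 100, 10));
    }
}
```

Mixed font sizes on one line, drop caps and slanted text will all fool a test this simple, which is why the thresholds usually end up being tuned per document source.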
ou can rely on keywords occurring at
predictable places). If the corpus has varied sources and also covers a
range of years this often introduces a lot of variation.
> Can PDFBox get the font properties? Is there another way to do it?
>
> Thanks in advance
>
> Fernando Almeida
graph" consisting only of a space. I was unaware that PDF
supported spaces - are these coming from the original document or are they
generated in PDFBox from calculations of character spacing and width?
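
To make the second possibility concrete: a space can be inferred purely from geometry, without any space character being present in the content stream. The sketch below is not PDFBox's algorithm, just a plain-Java illustration of the idea; the Glyph record, the method name and the 0.3 gap fraction are all assumptions.

```java
import java.util.List;

class SpaceFromGaps {
    /** One positioned character: its text, left x coordinate and advance width (hypothetical type). */
    record Glyph(String text, double x, double width) {}

    /**
     * Joins glyphs that are already sorted left to right on one line, inserting
     * a space wherever the horizontal gap between neighbours exceeds a fraction
     * of their average width.
     */
    static String joinWithInferredSpaces(List<Glyph> line, double gapFraction) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : line) {
            if (prev != null) {
                double gap = g.x() - (prev.x() + prev.width());
                double threshold = gapFraction * (prev.width() + g.width()) / 2.0;
                if (gap > threshold) {
                    sb.append(' ');
                }
            }
            sb.append(g.text());
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> line = List.of(
                new Glyph("a", 0, 5), new Glyph("b", 5, 5), new Glyph("c", 14, 5));
        System.out.println(joinWithInferredSpaces(line, 0.3));
    }
}
```

If the spaces I am seeing are produced this way, then their positions and widths are estimates rather than data from the original document.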
TIA for help.
P.
--
Q5 & Mobile
> b. http://technoracle.blogspot.com
> t. @duanechaos
> "Don't fear the Graph! Embrace Neo4J"
>
>
>
>
>
>
--
am sure some of you will have faced the same
problems and any (even partial) solutions will be useful.
PDF2SVG is beta; the others are being refactored to alpha.
PDF2SVG may, of course, be of use in other disciplines - character
processing is configurable through external files.
Enjoy
--
t then you have a chance.
--
change frequently.
--
ordinate, but this
varies slightly because of the different glyph origins.
> Maybe I’ll download the new version and look deeper…
>
I, for one, would be grateful if you did! I thought I was
miscompiling/omitting some resource, etc. which caused different o
> Andrey
>
> *From:* peter.murray.r...@googlemail.com [mailto:
> peter.murray.r...@googlemail.com] *On behalf of* Peter Murray-Rust
> *Sent:* Monday, 7 May 2012 15:24
>
> *To:* Andrey Kuznetsov
ds
>
>
I shall probably create a hack of some kind. I can find a sans-serif and
serif which are "fairly close" and substitute them. How would I get a
system COSDictionary I could substitute?
I am mainly interested in:
* the identity of the characters
* the font metr
Meanwhile I have rerun my code with pdfbox-1.7.0-SNAPSHOT and note that the
final SVG contains ONLY paths and no text.
many thanks..
--
l;
}
I notice now that this call is used in drawString so that might explain why
there is no font information
Is it worth changing to 1.7.0??
--
ception) {
exception.printStackTrace();
}
}
> --
I am quite prepared to work with the glyphs as
there are some documents where, I think, only glyph information is provided
so I have to do some analysis there.
Peter
> --
>
On Mon, Apr 2, 2012 at 2:58 PM, Peter Murray-Rust wrote:
>
>
> On Mon, Apr 2, 2012 at 2:51 PM, Andrey Kuznetsov wrote:
>
>> Peter, you have to pass your own Graphics2D object (with some overridden
>> methods) to pdfbox.
>>
>>
I am making good progress in ca
>
>
> -Original Message-
> From: peter.murray.r...@googlemail.com
> [mailto:peter.murray.r...@googlemail.com] On behalf of Peter Murray-Rust
> Sent: Monday, 2 April 2012 15:27
> To: users@pdfbox.apache.org
> Subject: Re: Extracting vector graphics
which I
was able to interpret, but it was disjoint from the stream. Is it possible
to examine the Graphics2D in the debugger?
When you say "Graphics2D" do you mean Java 2D or is there a PDFBox graphics
engine? If so what is it called :-)
P.
--
are there others in the PDF-hacking community who also want to extract
graphics? My own interest is scientific and technical graphs, tables,
diagrams, etc.
--