Re: Text extraction from a certain PDF does not seem to terminate

2024-04-03 Thread Peter Murray-Rust
il: users-unsubscr...@pdfbox.apache.org > > For additional commands, e-mail: users-h...@pdfbox.apache.org > > > > > - > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org > For additional commands, e-mail: users-h...@pdfbox.apach

Re: installing and running PDFBox within Python

2022-10-02 Thread Peter Murray-Rust
; then linking into a global knowledge graph. P. On Sun, Oct 2, 2022 at 10:07 AM Tilman Hausherr wrote: > On 26.09.2022 10:53, Peter Murray-Rust wrote: > > * Does PDFBox3 have more functionality than PDFBox2 that would help? > > I don't think so, the main thing is the on demand p

installing and running PDFBox within Python

2022-09-26 Thread Peter Murray-Rust
opyright in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same". Peter Murray-Rust Reader Emeritus in Molecular Informatics Yusuf Hamied Department of Chemistry University of Cambridge CB2 1EW, UK +44-1223-336432

Re: Table Extraction

2020-10-13 Thread Peter Murray-Rust
ame? > Any guidance would be much helpful. > > -- > Thanks & Regards > Kaushlendra Singh > Email: singh.kaushlendra...@gmail.com > Phone: +91 8377094564 > -- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

Re: Paragraph identification in apache pdf box

2020-08-12 Thread Peter Murray-Rust
> >>> Aravind Swarna > > >>> > > >> > > >> - > > >> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org > > >> For additional commands, e-mail: users-h...@pdfbox.apache.org > > >> > >

Re: Detection of chess figure characters

2020-04-15 Thread Peter Murray-Rust
org > > -- "I always retain copyright in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same". Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

PDFBox and COVID-19

2020-03-26 Thread Peter Murray-Rust
is an encouragement to all of us. P. -- "I always retain copyright in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same". Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry

Re: Extracting graphics primitives by subclassing PageDrawer

2020-01-01 Thread Peter Murray-Rust
s retain copyright in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same". Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Extracting graphics primitives by subclassing PageDrawer

2019-12-31 Thread Peter Murray-Rust
in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same". Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Extracting graphics primitives by subclassing PageDrawer

2019-12-30 Thread Peter Murray-Rust
copyright in my papers, and nothing in any contract I sign with any publisher will override that fact. You should do the same". Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: AW: Finding a Box containing text

2019-09-20 Thread Peter Murray-Rust
I do a lot of this and there is no generic way. The rect might be a rect, or 4 lines, or a polyline of 3 or 4 segments (or 5 for overlaps). It might be drawn twice for emphasis. I have some heuristics for creating probable rects in http://github.com/petermr/ami3 If you are serious and doing a *lot* I
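
To make the "4 lines" case concrete, here is a minimal Java sketch of one such heuristic. It is not the code from ami3; the class name `ProbableRect`, the method `fromFourLines` and the tolerance `EPS` are all assumptions made for illustration. It accepts four axis-aligned segments and reports a rectangle only if each segment runs along a different edge of their common bounding box. Polylines and doubly drawn boxes would need further rules.

```java
import java.awt.geom.Line2D;
import java.awt.geom.Rectangle2D;
import java.util.List;
import java.util.Optional;

final class ProbableRect {
    /** Tolerance in user-space units; an assumed value, tune per document. */
    static final double EPS = 0.5;

    static boolean near(double a, double b) {
        return Math.abs(a - b) <= EPS;
    }

    /** Returns the bounding box if the four segments trace its four edges. */
    static Optional<Rectangle2D> fromFourLines(List<Line2D> lines) {
        if (lines.size() != 4) return Optional.empty();
        double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
        double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (Line2D l : lines) {
            minX = Math.min(minX, Math.min(l.getX1(), l.getX2()));
            maxX = Math.max(maxX, Math.max(l.getX1(), l.getX2()));
            minY = Math.min(minY, Math.min(l.getY1(), l.getY2()));
            maxY = Math.max(maxY, Math.max(l.getY1(), l.getY2()));
        }
        // Reject degenerate boxes (all segments collinear or tiny).
        if (maxX - minX <= EPS || maxY - minY <= EPS) return Optional.empty();

        boolean top = false, bottom = false, left = false, right = false;
        for (Line2D l : lines) {
            double x1 = Math.min(l.getX1(), l.getX2());
            double x2 = Math.max(l.getX1(), l.getX2());
            double y1 = Math.min(l.getY1(), l.getY2());
            double y2 = Math.max(l.getY1(), l.getY2());
            // A full-width horizontal edge or a full-height vertical edge.
            boolean horizontal = near(y1, y2) && near(x1, minX) && near(x2, maxX);
            boolean vertical = near(x1, x2) && near(y1, minY) && near(y2, maxY);
            if (horizontal && near(y1, minY)) bottom = true;
            else if (horizontal && near(y1, maxY)) top = true;
            else if (vertical && near(x1, minX)) left = true;
            else if (vertical && near(x1, maxX)) right = true;
            else return Optional.empty();
        }
        if (!(top && bottom && left && right)) return Optional.empty();
        return Optional.of(new Rectangle2D.Double(minX, minY, maxX - minX, maxY - minY));
    }

    public static void main(String[] args) {
        List<Line2D> box = List.of(
            new Line2D.Double(0, 0, 10, 0),
            new Line2D.Double(10, 0, 10, 5),
            new Line2D.Double(10, 5, 0, 5),
            new Line2D.Double(0, 5, 0, 0));
        System.out.println(fromFourLines(box));
    }
}
```

The tolerance matters because strokes in real PDFs rarely meet exactly at the corners; a value that is too large will start merging unrelated rules (for example table borders) into false boxes.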

Re: No Unicode mapping for xx (xx) in font null

2019-04-04 Thread Peter Murray-Rust
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org > For additional commands, e-mail: users-h...@pdfbox.apache.org > > -- Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: No Unicode mapping for xx (xx) in font null

2019-04-01 Thread Peter Murray-Rust
DF should insist that a Unicode font is used. Better still avoid PDF. -- Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Extract bold text from a PDF file

2019-03-18 Thread Peter Murray-Rust
text.getFont().getFontDescriptor().getFontWeight() ); // returns 0.0. > > > > > 4. System.out.println( getGraphicsState().getLineWidth() ); // > > > > > returns > > > > > 1.0. > > > > > 5. System.out.println( > > > > > getGraphicsState().getTextState().getRenderingMode() ); // returns > > > > > FILL > > > >

Re: Extract embedded SVG image from PDF file

2019-03-05 Thread Peter Murray-Rust
pean Neuroscience Center < mnachev.nscenter...@gmail.com> wrote: > Hi, > > What is the way to extract an embedded image, which is in SVG format from > an PDF file using PDFBox? > > If there is no such option, how to determine from where the embedded SVG > image starts a

Re: Fwd: Apache in 2018 - By The Digits

2019-01-09 Thread Peter Murray-Rust
ted Platinum: DLA Piper, Microsoft, Oath, OSU Open Source Labs, > >> and Sonatype. - Targeted Gold: Atlassian, The CrytpoFund, Datadog, > >> PhoenixNAP, and Quenda. - Targeted Silver: Amazon Web Services, > >> HotWax Systems, and Rackspace. - Targeted Bronze: Bint

Re: Extracting rotated text

2017-09-25 Thread Peter Murray-Rust
; k > > As > > se > > ss > > m > > en > > > > Thank you! > > > > > - > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org > For additional commands, e-mail: users-h...@pdfbox.apache.org > > -- Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: drawing arrow in content stream

2017-08-17 Thread Peter Murray-Rust
> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org > For additional commands, e-mail: users-h...@pdfbox.apache.org > > -- Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Tabular Data Extracting

2017-05-14 Thread Peter Murray-Rust
chnology/ > > You can use PDFBox if you know the positions in advance, then search in > the source code examples for ExtractTextByArea. > > Tilman > > > - > To unsubscribe, e-mail: users-unsubscr...@pdfbox.a

Re: What it feels like to be an open-source maintainer

2017-03-06 Thread Peter Murray-Rust
> For additional commands, e-mail: users-h...@pdfbox.apache.org > > -- Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

What is a "bead" and how is it created/used?

2016-12-29 Thread Peter Murray-Rust
software? What is the abstract model of a bead in a reader? * are there other ways of transmitting "chunks" other than beads? TIA P. -- Peter Murray-Rust Reader Emeritus in Molecular Informatics Unilever Centre, Dept. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Identify not visible characters - Overlapped characters

2016-12-29 Thread Peter Murray-Rust
page, for that reason Tabula is far from being a good tool for text > > extraction with correct positioning. > > We always welcome bug reports (and patches!) :) [1] > > Thanks! > > [1] https://github.com/tabulapdf/tabula-java/issues > > > — > Manuel Aristarán

Re: Getting the PDF box to work

2016-03-07 Thread Peter Murray-Rust
@gmail.com>> wrote: > > > > hello > > Can someone point me to a step by step guide to using this please? > > I have made it available under a project in Eclipse - but can't see any > > code. > > Regards > > Gopi > > > > > -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Strip Data out of PDF and save only skeleton.

2015-10-31 Thread Peter Murray-Rust
ut all the Data in the PDF and just save the skeleton alone ? >>>> Please share any custom solutions or ideas if any !! >>>> >>>> Thanks >>>> >>> >>> - >>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org >>> For additional commands, e-mail: users-h...@pdfbox.apache.org >>> >>> >>> > > - > To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org > For additional commands, e-mail: users-h...@pdfbox.apache.org > > -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Can XML file convert into pdf using pdfbox?

2015-06-05 Thread Peter Murray-Rust
Mrunal -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Problem reading PDF: encrypted document and unknown compression method

2015-04-19 Thread Peter Murray-Rust
Thank you On Sun, Apr 19, 2015 at 12:03 PM, Tilman Hausherr thaush...@t-online.de wrote: On 19.04.2015 at 12:29, Peter Murray-Rust wrote: Did you decrypt the file? Did you either load the file with loadNonSeq(), or with load() and then call openProtection()? I thought I had used

Re: pdfbox warnings

2015-04-06 Thread Peter Murray-Rust
-unsubscr...@pdfbox.apache.org For additional commands, e-mail: users-h...@pdfbox.apache.org -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Interpreting vector and pixel glyphs for characters

2015-03-24 Thread Peter Murray-Rust
early days, but if people are interested in collaborating or have better solutions we'd be interested (we aren't able to help with casual problems). P. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Re: Looking for some guidance on using PDFBox to analyze page content

2015-03-23 Thread Peter Murray-Rust
DATE: 2015-03-20 10:08 FROM: Peter Murray-Rust pm...@cam.ac.uk TO: users@pdfbox.apache.org users@pdfbox.apache.org REPLY-TO: We do a great deal of this and have created two

Re: Looking for some guidance on using PDFBox to analyze page content

2015-03-20 Thread Peter Murray-Rust
Canada Inc. 1755 Woodward Drive, Suite 101, Ottawa, Ontario K2C 0P9 APXConsult.com [1] Links: -- [1] http://apxconsult.com -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: PDF extraction

2015-02-02 Thread Peter Murray-Rust
, e-mail: users-unsubscr...@pdfbox.apache.org For additional commands, e-mail: users-h...@pdfbox.apache.org -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Non-unicode characters

2015-01-27 Thread Peter Murray-Rust
is finite if there is only one font. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Character widths in fonts

2014-11-20 Thread Peter Murray-Rust
...) (c) How does PDFTextStripper calculate spaces? From the Font, or by some other heuristics? (d) is there a debugging tool on PDFBox I could be using for this sort of problem? Many thanks. P. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University

Re: Character widths in fonts

2014-11-20 Thread Peter Murray-Rust
for figuring out thorny issues, but I know that’s not an option for everyone. Yes, I deliberately avoid it :-( -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Is sub-heading extraction possible?

2014-11-08 Thread Peter Murray-Rust
...@googlemail.com [mailto: peter.murray.r...@googlemail.com] On Behalf Of Peter Murray-Rust Sent: Thursday 6 November 2014 7:38 PM To: users@pdfbox.apache.org Subject: Re: Is sub-heading extraction possible? Greetings, In general there is NO automatic way - it depends on how the paper

Re: Is sub-heading extraction possible?

2014-11-06 Thread Peter Murray-Rust
-headings? More specifically, is it possible to extract only introduction Part or conclusion part? Thanks in advance. Best, Mehmet -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Extracting text into paragraphs

2014-10-31 Thread Peter Murray-Rust
: You may want to get in contact with Peter Murray-Rust( http://www.ch.cam.ac.uk/person/pm286) at the University of Cambridge. He seems to have been working on molecular informatics involving extraction of information from PDFs, and probably has faced many of your issues. —Ken Bowen

Re: Extracting text into paragraphs

2014-10-29 Thread Peter Murray-Rust
it may be worth investing in creating some of these heuristics. But it will still be work, unfortunately. We are gradually building up this sort of approach in http://bitbucket.org/petermr PDF2SVG and SVG2XML, based on PDFBOX but it's alpha at best P. -- Peter Murray-Rust Reader

Re: Regarding Table in PdfBox

2014-10-14 Thread Peter Murray-Rust
...@gmail.com wrote: Hi , How to identify table using PDFBOX . And extract text from it . Please help me with the idea . Thanks Borris -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: problem with pdf eof

2014-10-10 Thread Peter Murray-Rust
and document structuring... BR P -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: How to define regions in PDFTextStripperByArea?

2014-05-04 Thread Peter Murray-Rust
- multicolumns - BR Andreas Lehmkühler -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

OCR and PDFBox/PDF2SVG

2014-04-22 Thread Peter Murray-Rust
would save us going further) or to see if anyone is interested in using such a facility [Note that this is feasible mainly because the source is born-digital and binarized (0/1) and so does not suffer from scanning artefacts such as skewing, contrast, noise, etc.] P. -- Peter Murray-Rust Reader

Re: OCR and PDFBox/PDF2SVG

2014-04-22 Thread Peter Murray-Rust
to PDFBox as part of a GSoC engagement. Maybe that’s what you are looking for? BR Maruan On 22.04.2014 at 15:39, Peter Murray-Rust pm...@cam.ac.uk wrote: We have a need to carry out limited OCR in the PDF extraction process and are thinking of adding it to PDF2SVG ( https://bitbucket.org

Re: Eliminating super scripts while extracting text from pdf

2014-03-31 Thread Peter Murray-Rust
. As suggested, I have gone through the links provided, but unfortunately could not get to the heuristics to detect the subsuperscripts. If possible, please attach or provide a link that can publicly be accessed. Appreciate your help. On Sat, Mar 29, 2014 at 4:19 AM, Peter Murray-Rust pm

Re: Eliminating super scripts while extracting text from pdf

2014-03-29 Thread Peter Murray-Rust
been placed after the period(.) ex: Word: test with superscript super When it appeared at the end of a sentence, has been extracted as - test.super Is there any way I can get rid of superscripts? -- Br, Siva. -- Peter Murray-Rust Reader in Molecular Informatics Unilever

Re: 2 questions

2014-03-08 Thread Peter Murray-Rust
of words. I do this in two stages - translate all chars to SVG (PDF2SVG) and in a separate project (SVG2XML) do the character concatenation - I have to deal with subscripts, etc. Most PDF2Text tools don't deal with subscripts Thanks all ! Julien -- Peter Murray-Rust Reader in Molecular
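
The character concatenation step can be pictured with a small Java sketch. This is not the SVG2XML code; the `Glyph` class, the `join` method, the `_` marker for subscripts and both thresholds are assumptions made for illustration. It assumes the glyphs of one text line arrive sorted left to right, with PDF-style coordinates where a lower baseline means a smaller y value. A horizontal gap starts a new word, and a glyph sitting below the baseline of its word's first glyph is flagged as a subscript.

```java
import java.util.ArrayList;
import java.util.List;

final class WordJoiner {
    /** A positioned character; a hypothetical type for this sketch. */
    static final class Glyph {
        final String text;
        final double x;        // left edge
        final double width;    // advance width
        final double baseline; // y of the baseline, y axis pointing up

        Glyph(String text, double x, double width, double baseline) {
            this.text = text;
            this.x = x;
            this.width = width;
            this.baseline = baseline;
        }
    }

    /** Gap that separates two words; an assumed value. */
    static final double GAP = 1.0;
    /** Baseline drop that counts as a subscript; an assumed value. */
    static final double BASELINE_TOL = 0.5;

    /** Joins glyphs of a single line into words, prefixing subscripts with '_'. */
    static List<String> join(List<Glyph> glyphs) {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        Glyph prev = null;
        double wordBaseline = 0;
        for (Glyph g : glyphs) {
            boolean gap = prev != null && g.x - (prev.x + prev.width) > GAP;
            if (gap) {
                words.add(word.toString());
                word.setLength(0);
            }
            if (word.length() == 0) {
                wordBaseline = g.baseline; // first glyph defines the word's baseline
            }
            if (wordBaseline - g.baseline > BASELINE_TOL) {
                word.append('_'); // glyph sits below the word's baseline
            }
            word.append(g.text);
            prev = g;
        }
        if (word.length() > 0) {
            words.add(word.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        List<Glyph> line = List.of(
            new Glyph("H", 0, 7, 100),
            new Glyph("2", 7, 4, 97),
            new Glyph("O", 11, 7, 100),
            new Glyph("i", 25, 3, 100),
            new Glyph("s", 28, 5, 100));
        System.out.println(join(line));
    }
}
```

Fixed thresholds are the weak point of a sketch like this: in practice both would probably need to scale with font size, and a word that begins with a subscript would be misread because its first glyph sets the reference baseline.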

Re: 2 questions

2014-03-07 Thread Peter Murray-Rust
-language science and European personal names. The worst case is probably slanted text in diagrams. As for the technical part (overloading the processText), it's ok, thanks for the advice. Best regards Julien -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep

Re: 2 questions

2014-03-06 Thread Peter Murray-Rust
to 1.7 ? Thanks and regards Julien -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Query regarding pdfbox and android compatibility

2014-02-06 Thread Peter Murray-Rust
by now? thanks and regards, -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Parsing a pdf file takes 3minutes

2013-12-23 Thread Peter Murray-Rust
improve on this? Any help appreciated Clemens -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: How to convert a text to curve?

2013-09-25 Thread Peter Murray-Rust
16360...@qq.com: Dears, I want to convert a text in a pdf to a curve. i.e. convert a text such as hello to a pen curve hello(change to token c) so that the text can not be copied. Thank you in advance! d.j. -- Peter Murray-Rust Reader in Molecular Informatics Unilever

Re: HTML to PDF

2013-09-24 Thread Peter Murray-Rust
either so can't comment further. On 24 September 2013 16:02, Henry, Chad chad.he...@laspbs.state.fl.us wrote: Is it possible to convert an HTML document to a PDF using pdfbox? Thanks Chad Henry System Design and Devlopment 717-9450 -- Peter Murray-Rust Reader in Molecular

Error somewhere in jbig2/PDPixelMap

2013-05-23 Thread Peter Murray-Rust
) at org.xmlcml.pdf2svg.PDFPage2SVGConverter.convertPageToSVG(PDFPage2SVGConverter.java:176) -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Determining whether character/font is bold

2013-05-13 Thread Peter Murray-Rust
and report. However these fonts are so non-conformant I am only semi-optimistic. Assuming it doesn't work, is there still a way of determining bold? P. BR Maruan On 13.05.2013 at 10:55, Peter Murray-Rust pm...@cam.ac.uk wrote: I am dealing with a number of (highly non-standard) fonts and wish

Re: Determining whether character/font is bold

2013-05-13 Thread Peter Murray-Rust
On Mon, May 13, 2013 at 8:52 PM, Maruan Sahyoun sahy...@fileaffairs.de wrote: Hi Peter, On 13.05.2013 at 19:44, Peter Murray-Rust pm...@cam.ac.uk wrote: Thanks for answering - any help is valuable. On Mon, May 13, 2013 at 6:02 PM, Maruan Sahyoun sahy...@fileaffairs.de wrote: I

Re: Subscript/Superscripts

2013-05-10 Thread Peter Murray-Rust
of PDFBox and the details of fonts so feel free to start at the beginning and drive slowly. :) Thanks, Buzzy -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: know the approach to develop program

2013-04-19 Thread Peter Murray-Rust
are interested in PDFBox as a reader then our http://bitbucket.org/petermr/pdf2svg-dev may be a useful example. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Fwd: Junk Characters while Extracting text from pdf file.

2013-02-05 Thread Peter Murray-Rust
But as Andreas says, ultimately these are probably non-conformant. A mixture of heuristics and glyph analysis (OCR and/or heuristics) is required. Again PDF2SVG is addressing these - any community involvement is valued. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep

Re: retrieving graphical coordinates

2013-01-30 Thread Peter Murray-Rust
://bitbucket.org/petermr/pdf2svg and other sibling projects. We aim to extract high-level graphical objects from these paths. It's all F/OSS - you are welcome to use what we have so far. I am not aware of others doing the same at least in Open projects. P. Florian -- Peter Murray-Rust Reader

Re: Text for Ebook Readers

2013-01-27 Thread Peter Murray-Rust
libraries didn't do it. But that's out of scope here... -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: How to ensure a PDF is valid

2013-01-22 Thread Peter Murray-Rust
feel free to fork and develop it. FWIW the next phase (SVGPlus) uses heuristics to recreate paragraphs and other objects (super/subscripts, maths equations, tables, semantic graphs). The third phase turns these into semantic chemistry, biology, etc. - all from the PDF. P. -- Peter Murray-Rust Reader

Re: Font properties

2012-12-28 Thread Peter Murray-Rust
. Can PDFBOX get the font properties? Is there another way to do it? Thanks in advance Fernando Almeida I'll report on our own efforts later today. All our material is Open Source. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University

Re: ANN: AMI2-PDF2SVG conversion of PDF to semantic characters and graphics

2012-11-17 Thread Peter Murray-Rust
Neo4J -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

ANN: AMI2-PDF2SVG conversion of PDF to semantic characters and graphics

2012-11-16 Thread Peter Murray-Rust
through external files. Enjoy -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: extracting text from image using pdfbox

2012-10-15 Thread Peter Murray-Rust
a chance. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: extracting text from image using pdfbox

2012-10-12 Thread Peter Murray-Rust
. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Extracting vector graphics from PDF

2012-05-08 Thread Peter Murray-Rust
? That's a different and political issue. If anyone is interested in helping liberate the scientific literature legally then hacking PDFs is a major strategy. Volunteers welcome!] P. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge

Re: Extracting vector graphics from PDF

2012-05-07 Thread Peter Murray-Rust
with the glyphs as there are some documents where, I think, only glyph information is provided so I have to do some analysis there. Peter -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Extracting vector graphics from PDF

2012-05-07 Thread Peter Murray-Rust
is used in drawString so that might explain why there is no font information Is it worth changing to 1.7.0?? -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Extracting vector graphics from PDF

2012-05-07 Thread Peter Murray-Rust
and note that the final SVG contains ONLY paths and no text. many thanks.. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069

Re: Extracting vector graphics from PDF

2012-05-07 Thread Peter Murray-Rust
*Peter Murray-Rust *Sent:* Monday, 7 May 2012 15:24 *To:* Andrey Kuznetsov *Cc:* users@pdfbox.apache.org *Subject:* Re: Extracting vector graphics from PDF On Mon, May 7, 2012 at 1:31 PM, Andrey Kuznetsov imag...@gmx.de wrote: Peter, The COS output

Re: Extracting vector graphics from PDF

2012-04-26 Thread Peter Murray-Rust
On Mon, Apr 2, 2012 at 2:58 PM, Peter Murray-Rust pm...@cam.ac.uk wrote: On Mon, Apr 2, 2012 at 2:51 PM, Andrey Kuznetsov imag...@gmx.de wrote: Peter, you have to pass your own Graphics2D object (with some overridden methods) to pdfbox. I am making good progress in capturing graphics

Extracting vector graphics from PDF

2012-04-02 Thread Peter Murray-Rust
are there others in the PDF-hacking community who also want to extract graphics? My own interest is scientific and technical graphs, tables, diagrams, etc. -- Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069