Re: reading text out of ps/pdf
On Sun, 14 Jan 2001, Jan Goebel wrote:

> you can maybe use scanner/OCR software like GOCR (open source)
> take a look at:
> http://altmark.nat.uni-magdeburg.de/~jschulen/ocr/index.html

Sure, you can try it, but don't expect too much. When I last tested all the free OCR programs for Linux (maybe half a year ago; there were several of them), they were all very poor. I think that OCR programs are one of the weakest points of Linux :(

There is still Wine...
Re: reading text out of ps/pdf
Tuukka Toivonen wrote:

> On Sun, 14 Jan 2001, Jan Goebel wrote:
>
> > you can maybe use scanner/OCR software like GOCR (open source)
> > take a look at:
> > http://altmark.nat.uni-magdeburg.de/~jschulen/ocr/index.html
>
> Sure, you can try it, but don't expect too much. When I last tested all
> the free OCR programs for Linux (maybe half a year ago; there were
> several of them), they were all very poor. I think that OCR programs are
> one of the weakest points of Linux :(

I don't think so, because I use OCRShop from www.vividata.com. Sure, it's not open source, but $99 for a personal edition is okay, and it works wonderfully for me ;-)

Herbert

--
[EMAIL PROTECTED]
http://perce.de/lyx/
Re: reading text out of ps/pdf
Hello,

you can maybe use scanner/OCR software like GOCR (open source).
Take a look at:
http://altmark.nat.uni-magdeburg.de/~jschulen/ocr/index.html

Good luck,
jan

PS: @christopher: if you are successful, could you send me a reply? I may need it sometime, too.

On Sat, 13 Jan 2001, Matej Cepl wrote:

> Christopher Jones wrote:
> > So my question is: is there any software out there which attempts to look at
> > bitmaps and guess what the ascii would be -- something like those programs which
> > read books through a scanner and try to match font characters to the image. And
> > I say this question is a reach, because I know that those programs which I have
> > heard about are either very expensive or very inaccurate.
>
> I am afraid that you do not have much choice other than to try some of
> these programs. Some of them are now much better than they used to be.
> Try to find anybody with a scanner; most of these programs should
> be able to process documents from an external file. I am afraid
> that there is nothing better to offer you.
>
> Matej

--
+---
Jan Goebel (mailto:[EMAIL PROTECTED])
DIW Berlin
Longitudinal Data and Microanalysis
Königin-Luise-Str. 5
D-14195 Berlin -- Germany --
phone: 49 30 89789-377
+---
Re: reading text out of ps/pdf
Christopher Jones wrote:

> So my question is: is there any software out there which attempts to look at
> bitmaps and guess what the ascii would be -- something like those programs which
> read books through a scanner and try to match font characters to the image. And
> I say this question is a reach, because I know that those programs which I have
> heard about are either very expensive or very inaccurate.

I am afraid that you do not have much choice other than to try some of these programs. Some of them are now much better than they used to be. Try to find anybody with a scanner; most of these programs should be able to process documents from an external file. I am afraid that there is nothing better to offer you.

Matej
Re: reading text out of ps/pdf
yes

there is a tool called ps2ascii; it extracts plain text from *.ps files

[]s
lima-lopes

R.E. de Lima-Lopes
[EMAIL PROTECTED]
GNU/Linux Registered User # 182240

On Sat, 13 Jan 2001, Christopher Jones wrote:

> Date: Sat, 13 Jan 2001 11:34:48 -0600
> From: Christopher Jones <[EMAIL PROTECTED]>
> To: LyX <[EMAIL PROTECTED]>
> Subject: reading text out of ps/pdf
>
> This is a reach, I know. But in the hopes that there is something out there for
> me, I'll ask the question: is there anything which reads text out of a bitmapped
> pdf or ps file?
Re: reading text out of ps/pdf
I have that tool. But some pdf or ps files consist not of coded text but of a bitmapped image. For instance, the pdf and ps files which I download from journal databases are scanned images of journal pages. ps2ascii and pdftotext will not extract text from these files, since there is no ascii content to extract.

Anyway, that is the best explanation I have been able to figure out, by examining the contents of pdf and ps files and seeing that the post-preamble stuff is sometimes text, sometimes not, and seeing that ps2ascii poops out on the latter, though not on the former.

So my question is: is there any software out there which attempts to look at bitmaps and guess what the ascii would be -- something like those programs which read books through a scanner and try to match font characters to the image. And I say this question is a reach, because I know that those programs which I have heard about are either very expensive or very inaccurate.

Thanks very much for the response.

On Sun, 14 Jan 2001, you wrote:

> yes
>
> there is a tool called ps2ascii, it extracts plain text from *.ps files
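
The check described in this message (does a file contain coded text, or only page images?) could be roughed out as follows. This is only a sketch under assumptions: it relies on pdftotext writing to standard output when the output file is given as "-", the function names are made up for illustration, and the 20-character threshold is an arbitrary guess, not something from the thread.

```python
import subprocess


def count_text_chars(text: str) -> int:
    """Count the non-whitespace characters in a piece of extracted text."""
    return sum(1 for ch in text if not ch.isspace())


def looks_scanned(text: str, min_chars: int = 20) -> bool:
    """Guess that a file is image-only when almost no text came out of it.

    min_chars is a hypothetical threshold; tune it for your own files.
    """
    return count_text_chars(text) < min_chars


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return what it printed ("-" asks for stdout)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        errors="replace",  # do not fail on bytes that are not valid text
        check=True,
    )
    return result.stdout
```

With these helpers, `looks_scanned(extract_text("file.pdf"))` would return True for the journal scans described above, where the extractor finds little more than page breaks. A file that mixes scanned pages with real text would need a per-page check, which this sketch does not attempt.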
Re: reading text out of ps/pdf
Christopher Jones wrote:

> I have that tool. But some pdf or ps files consist not of coded text but of a
> bitmapped image. For instance, the pdf and ps files which I download from journal
> databases are scanned images of journal pages. ps2ascii and pdftotext will not
> extract text from these files, since there is no ascii content to extract.
>
> So my question is: is there any software out there which attempts to look at
> bitmaps and guess what the ascii would be -- something like those programs which
> read books through a scanner and try to match font characters to the image. And
> I say this question is a reach, because I know that those programs which I have
> heard about are either very expensive or very inaccurate.

With

    pdfimages -f 1 file.pdf DirForTheImages

you can extract all images in the pdf file. With the option -j you can save them as JPEGs; otherwise they are saved by default in ppm or pbm format (a good choice). With

    pdftotext file.pdf file.txt

you can convert all the text. When the pdf file has some scanned text that is saved as images, you can convert these images from pbm to tiff and then run an OCR program on them.

Herbert

--
[EMAIL PROTECTED]
http://perce.de/lyx/
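
The two steps above (pull the page images out, then run OCR on each one) might be scripted roughly like this. It is a sketch, not a tested recipe. It assumes the GOCR program mentioned elsewhere in this thread is installed as `gocr`, that it accepts `-i` for the input image and `-o` for the output file, and that it can read pbm/ppm files directly, so the conversion to tiff is skipped. The function names are hypothetical. Note that pdfimages uses its last argument as a filename prefix, so the images end up as prefix-000.pbm, prefix-001.pbm, and so on.

```python
import glob
import subprocess


def pdfimages_cmd(pdf_path, image_root, first_page=1, jpeg=False):
    """Build the pdfimages command line from the message above."""
    cmd = ["pdfimages", "-f", str(first_page)]
    if jpeg:
        cmd.append("-j")  # JPEG output; gocr is assumed to need PNM instead
    cmd += [pdf_path, image_root]
    return cmd


def gocr_cmd(image_path, text_path):
    """Build a GOCR command line (the -i/-o usage is an assumption)."""
    return ["gocr", "-i", image_path, "-o", text_path]


def ocr_pdf(pdf_path, image_root):
    """Extract the images from a PDF and run OCR on each of them.

    Returns the list of text files that were written.
    """
    subprocess.run(pdfimages_cmd(pdf_path, image_root), check=True)
    outputs = []
    # pdfimages names its output image_root-NNN.pbm, .pgm or .ppm
    pattern = glob.escape(image_root) + "-*.p[bgp]m"
    for image in sorted(glob.glob(pattern)):
        text_path = image[:-4] + ".txt"  # swap the 4-character extension
        subprocess.run(gocr_cmd(image, text_path), check=True)
        outputs.append(text_path)
    return outputs
```

Calling `ocr_pdf("file.pdf", "page")` would then leave page-000.txt, page-001.txt and so on next to the extracted images. How good the text is depends entirely on the OCR program, which is the point the other replies in this thread argue about.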