Re: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture a solution?

2007-07-04 Thread Cory Snavely
Another way to get experience with the quality of Acrobat OCR is to use Acrobat 
Pro, which can do functionally the same thing, with a less batch-oriented 
interface. We ended up using this at a fairly large scale to meet a similar 
need.

We have documentation on preparing PDFs, which we supply to submitters and 
which you may find useful, at

http://deepblue.lib.umich.edu/html/2027.42/40244/PDF-Best_Practice.html

The section toward the bottom provides instructions on making image PDF files 
searchable.

Cory Snavely
University of Michigan Library IT Core Services
  ----- Original Message -----
  From: Jennifer Ash 
  To: dspace-tech@lists.sourceforge.net 
  Sent: Wednesday, July 04, 2007 6:55 AM
  Subject: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture a solution?


  Dear Community Members



  The Water Research Commission (WRC, South Africa) is currently assessing a 
pilot installation of DSpace.

  We want to use DSpace to store, search and retrieve all our WRC research 
reports and Water SA (a scientific publication, 4 issues pa) issues (this is 
the primary goal; other collections will most likely be added over time).

  We are faced with a problem in that most of our older publications are not in 
electronic format and will have to be scanned.

  Scanning and saving as PDF does not provide a full text searchable document 
in DSpace; I've tried it.



  A product, Adobe Capture, is advertised as a 'tool that teams with your 
scanner to convert volumes of paper documents into searchable Adobe Portable 
Document Format (PDF) files'.

  We are keen to investigate this product but there are no trial downloads 
offered by Adobe.

  Do you have any knowledge of this product? Can you advise on a suitable 
technology solution for our problem? Our backlog is vast and spans many years, 
so there are loads of documents that need to be scanned.



  I do hope someone can give me advice.



  Kind regards





  Jennifer Ash 
  ..
  Business Systems Manager
  Water Research Commission 
  Private Bag X03 
  GEZINA (Pretoria) 
  0031 
  Tel: (012) 330-9036 / 330-0340 
  Fax: (012) 330-9010 / 331-2565 
  E-mail: [EMAIL PROTECTED] 




  DISCLAIMER AND CONFIDENTIALITY NOTE: All factual and other information within 
this e-mail, including any attachments relating to the official business of the 
Water Research Commission (WRC), is the property of the WRC. It is 
confidential, legally privileged and protected against unauthorized use. The 
WRC neither owns nor endorses any other content. Views and opinions are those 
of the senders unless clearly stated as being that of the WRC. The addressee in 
the e-mail is the intended recipient. Please notify the sender immediately if 
it has unintentionally reached you and do not read, disclose or use the content 
in any way whatsoever. The WRC cannot assure that the integrity of this 
communication has been maintained nor that it is free of errors, viruses, 
interception or interferences. 

   






  -
  This SF.net email is sponsored by DB2 Express
  Download DB2 Express C - the FREE version of DB2 express and take
  control of your XML. No limits. Just data. Click to get it now.
  http://sourceforge.net/powerbar/db2/


  ___
  DSpace-tech mailing list
  DSpace-tech@lists.sourceforge.net
  https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Searching PDF-scanned documents: Adobe Capture a solution?

2007-07-04 Thread Graham Triggs
Hi,

The problem with your scanning attempts is that you are just capturing
an image of the page. To have searchable content, you need to perform
optical character recognition on the images.
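
For a large backlog it may help to sort out which PDFs are image-only before
deciding what needs OCR. The following is a rough sketch in Python, not a
DSpace feature: the function names are made up for illustration, and the test
(image objects present, no font resources declared) is only a heuristic. It
reads the raw bytes, so it can misjudge files that keep their object
dictionaries inside compressed object streams.

```python
import re
from pathlib import Path

# Assumption: a scanned, un-OCRed PDF embeds page images but declares no
# fonts, whereas a PDF with a text layer (even a hidden OCR layer) does.
FONT_MARKER = re.compile(rb"/Font\b")
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Return True if the raw PDF bytes show images but no font resources."""
    has_font = FONT_MARKER.search(pdf_bytes) is not None
    has_image = IMAGE_MARKER.search(pdf_bytes) is not None
    return has_image and not has_font


def find_image_only_pdfs(folder: str) -> list:
    """List the *.pdf files in a folder that look like they still need OCR."""
    return sorted(
        str(path)
        for path in Path(folder).glob("*.pdf")
        if looks_image_only(path.read_bytes())
    )
```

Calling find_image_only_pdfs() on a folder of scans (the folder path is up to
you) would give a candidate list for OCR. Anything it flags is worth
confirming by opening the file and trying to select some text.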

According to Adobe's product page:
http://www.adobe.com/uk/products/acrcapture/

yes, Capture will create PDFs that contain searchable words. As with all
OCR solutions, though, there is the question of accuracy, and for that
you would need the opinion of someone with experience of using the
product.
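
One way to put a number on accuracy, short of asking around, is to retype a
sample page by hand and compare it with the OCR output. The sketch below
computes a character error rate (edit distance divided by the length of the
hand-typed reference). It is a generic measure and not something Capture or
DSpace reports; the function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete char_a
                            curr[j - 1] + 1,      # insert char_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_text: str) -> float:
    """Fraction of reference characters the OCR output got wrong."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_text) / len(reference)


print(character_error_rate("water", "wafer"))  # 0.2 (one wrong character in five)
```

Running this over a few representative pages (old typewritten reports as well
as recent print) would give a rough idea of how much of the full text a
search is likely to miss.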

G

 
 
This e-mail is confidential and should not be used by anyone who is not the 
original intended recipient. BioMed Central Limited does not accept liability 
for any statements made which are clearly the sender's own and not expressly 
made on behalf of BioMed Central Limited. No contracts may be concluded on 
behalf of BioMed Central Limited by means of e-mail communication. BioMed 
Central Limited Registered in England and Wales with registered number 3680030 
Registered Office Middlesex House, 34-42 Cleveland Street, London W1T 4LB
