Re: editing pdf files

2012-10-13 Thread Gary Kline
On Sat, Oct 13, 2012 at 01:19:07PM +0200, Polytropon wrote:
> On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
>
> The disassembling can be done with
>
>   % pdfimages source.pdf .
>
> Then the files can be edited with whatever tool you like, e.g. Gimp.
> They often come out in PBM format.


A question I should have asked last time: this book is a history or
bio of Richland County, Ohio; in type, it's like 650 or more
pages. So: is pdfimages going to spit out 650 files? As noted
in my last email, only a couple of these images are of any interest.

-- 
 Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
  Twenty-six years of service to the Unix community.

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: editing pdf files

2012-10-13 Thread Polytropon
On Sat, 13 Oct 2012 13:47:01 -0700, Gary Kline wrote:
> On Sat, Oct 13, 2012 at 01:19:07PM +0200, Polytropon wrote:
> > On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
> >
> > The disassembling can be done with
> >
> >   % pdfimages source.pdf .
> >
> > Then the files can be edited with whatever tool you like, e.g. Gimp.
> > They often come out in PBM format.
>
> A question I should have asked last time: this book is a history or
> bio of Richland County, Ohio; in type, it's like 650 or more
> pages. So: is pdfimages going to spit out 650 files? As noted
> in my last email, only a couple of these images are of any interest.

Depends on what actually _is_ in the PDF file. If every page is
represented as a picture, 650 pictures will be created. If it
contains text _and_ images, it will _only_ output the images,
with no real relation to where they have been placed in the
text. As suggested by the name, pdfimages takes the images from
the PDF file. :-)
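
If only a couple of pages matter, the extraction can probably be
limited to a page range instead of writing out all 650 images. What
follows is a minimal sketch, not a tested recipe: it assumes the xpdf
pdfimages with its -f (first page) and -l (last page) options, and the
helper names pdfimages_command and extract_range are made up for
illustration. The numbers in the output file names may not match the
page numbers, so check what actually gets written.

```python
import shutil
import subprocess


def pdfimages_command(pdf, first_page, last_page, root="."):
    """Build a pdfimages call restricted to a page range.

    -f and -l are pdfimages' first-page and last-page options; root is
    the prefix of the output files, "." as in the command above.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("need 1 <= first_page <= last_page")
    return ["pdfimages", "-f", str(first_page), "-l", str(last_page),
            pdf, root]


def extract_range(pdf, first_page, last_page, root="."):
    """Run the command built above, if pdfimages is installed."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found in PATH")
    subprocess.run(pdfimages_command(pdf, first_page, last_page, root),
                   check=True)
```

Calling extract_range("source.pdf", 530, 531) would then only touch
those two pages.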

The easiest way to check for possible text is to install xpdf
which brings the binary pdftotext (if I remember correctly that
this tool is in _that_ package). You can then use it like this:

% pdftotext source.pdf

It will create source.txt with all actual text (but of course
without _any_ formatting except line breaks and ^L page breaks),
including page numbers. But hey, it's pure ASCII text suitable
for further processing. :-)
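
Because the pages are separated by ^L (form feed), the output can be
checked page by page for anything beyond a bare page number. A small
sketch of that check follows; the function name pages_with_text is
hypothetical, and it assumes page numbers are plain digits (roman
numerals would be counted as text).

```python
def pages_with_text(text):
    """Return 1-based page numbers holding more than a bare page number."""
    pages = text.split("\f")
    # pdftotext ends each page with a form feed, so a trailing empty
    # chunk is not a real page.
    if pages and pages[-1].strip() == "":
        pages.pop()
    found = []
    for number, page in enumerate(pages, start=1):
        # A page counts as "text" if any word is not purely digits.
        if any(not word.isdigit() for word in page.split()):
            found.append(number)
    return found
```

Reading source.txt with open("source.txt", errors="replace").read()
and passing the result in gives the list of pages worth looking at.
An empty list suggests the PDF holds scanned images only.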

Run pdftotext without parameters for a short summary of its
parameters; man pdftotext is also provided.


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: editing pdf files

2012-10-13 Thread Polytropon
On Sat, 13 Oct 2012 13:38:16 -0700, Gary Kline wrote:
> On Sat, Oct 13, 2012 at 01:19:07PM +0200, Polytropon wrote:
> > On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
> > > I've got a question that fits in here, hopefully.
> > >
> > > Last week I found a book from 1901 that Google had scanned and listed
> > > as a PDF file. It was text plus photos of the rich/famous of the
> > > 1800s. Somehow, Google found the exact string that matched my great
> > > grandfather [from the Civil War]. I downloaded the file (maybe 2 Mbytes)
> > > and searched using acroread. Nada. I used the pdftotext utility.
> > > Same: nothing but some 600 page numbers.
> > >
> > > My guess is that Google just took photos of the book and used other
> > > tools to create a PDF file. I am not =that= serious about genealogy,
> > > but I would like to know if there are any tools to edit this kind of
> > > PDF file.
> >
> > In case the PDF is nothing more than a compilation of images,
> > there's a way to deal with it for editing:
>
> The images in this book aren't what I am interested in,
> just text.

In case the text is in images (i. e. the images contain text),
postprocessing those images will be the only way to obtain the
text information (if there is no actual text in the PDF).



> What format works best with the OCR suites? Or are they about the
> same? For the section I got in that 1901 book on my g-grandfather,
> it was only about 1.5 pages. There was no photo, just his name
> and some bio. Still, things I had no knowledge of. I'm sure
> that my father didn't know either!

It should work with any lossless (!) format, especially one that
holds only two colors (any bilevel black-and-white format such as
PBM, GIF or PNG can do that, JPEG can't). In case tesseract OCR does
not operate on PBM files directly, convert them into something it can
handle better, like TIFF or maybe PNG; you can use

% convert .-530.pbm 530.png
% convert .-531.pbm 531.png

manually (as you will only process two files) and then run the OCR
process on them.
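
For more than a handful of pages, the two steps can be generated
instead of typed. This is only a sketch under assumptions: it assumes
ImageMagick's convert and the tesseract command line in its usual
"tesseract image outputbase" form, it assumes the image numbers match
the file names written by pdfimages (as with .-530.pbm above), and the
helper name ocr_commands is invented for the example.

```python
def ocr_commands(image_numbers, root="."):
    """Build convert and tesseract command lines for pdfimages output.

    For image number 530 and root "." this expects the file .-530.pbm,
    converts it to 530.png and asks tesseract to write 530.txt.
    """
    commands = []
    for n in image_numbers:
        pbm = f"{root}-{n:03d}.pbm"
        png = f"{n:03d}.png"
        commands.append(["convert", pbm, png])
        # tesseract appends .txt to the output base name itself.
        commands.append(["tesseract", png, f"{n:03d}"])
    return commands
```

Each returned list can be handed to subprocess.run(cmd, check=True)
in order, so a failed conversion stops the run before OCR starts.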

Note that pdfimages can also output color images (if they are color
images in the source), e.g. I found .-000.ppm (PPM format) with
a diagram in "Good Ideas, Through the Looking Glass" by N. Wirth.
I'm not sure if there could also directly be PNG or EPS files
in a PDF file...



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


send-pr Submission Times

2012-10-13 Thread Doug Hardie
I sent a PR using send-pr earlier today.  However, after having sent it and 
received a line that said it was submitted, I realized I didn't include my 
email address.  Somehow I completely overlooked that. I have been waiting for 
it to show up in the on-line indexes, but it hasn't so far.  How long does that 
process normally take?  I am wondering if it was just dropped because of the 
lack of the email address.




Re: send-pr Submission Times

2012-10-13 Thread Greg Larkin
On Sat Oct 13 20:04:28 2012, Doug Hardie wrote:
> I sent a PR using send-pr earlier today.  However, after having sent it and
> received a line that said it was submitted, I realized I didn't include my
> email address.  Somehow I completely overlooked that. I have been waiting for
> it to show up in the on-line indexes, but it hasn't so far.  How long does
> that process normally take?  I am wondering if it was just dropped because of
> the lack of the email address.

Hi Doug,

Check your outbound mail queue, and perhaps it is stuck in there.
Also, look at the send-pr man page and the use of the PR_FORM
environment variable.  You can create a default send-pr template, save
it as a file and put the filename in PR_FORM.  The next time you start
send-pr, your PR will be populated from the template.

Hope that helps,
Greg

-- 
Greg Larkin

http://www.FreeBSD.org/   - The Power To Serve
http://www.sourcehosting.net/ - Ready. Set. Code.
http://twitter.com/cpucycle/  - Follow you, follow me


Re: editing pdf files

2012-10-13 Thread Gary Kline
On Sat, Oct 13, 2012 at 11:15:36PM +0200, Polytropon wrote:
> On Sat, 13 Oct 2012 13:47:01 -0700, Gary Kline wrote:
> > On Sat, Oct 13, 2012 at 01:19:07PM +0200, Polytropon wrote:
> > > On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
> > >
> > > The disassembling can be done with
> > >
> > >   % pdfimages source.pdf .
> > >
> > > Then the files can be edited with whatever tool you like, e.g. Gimp.
> > > They often come out in PBM format.
> >
> > A question I should have asked last time: this book is a history or
> > bio of Richland County, Ohio; in type, it's like 650 or more
> > pages. So: is pdfimages going to spit out 650 files? As noted
> > in my last email, only a couple of these images are of any interest.
>
> Depends on what actually _is_ in the PDF file. If every page is
> represented as a picture, 650 pictures will be created. If it
> contains text _and_ images, it will _only_ output the images,
> with no real relation to where they have been placed in the
> text. As suggested by the name, pdfimages takes the images from
> the PDF file. :-)
>
> The easiest way to check for possible text is to install xpdf
> which brings the binary pdftotext (if I remember correctly that
> this tool is in _that_ package). You can then use it like this:
>
>   % pdftotext source.pdf
>
> It will create source.txt with all actual text (but of course
> without _any_ formatting except line breaks and ^L page breaks),
> including page numbers. But hey, it's pure ASCII text suitable
> for further processing. :-)
>
> Run pdftotext without parameters for a short summary of its
> parameters; man pdftotext is also provided.


Well, then my original instincts were right.  I ran
pdftotext file.pdf and nothing but the page numbers were
there.  Rats.  Oh well, at least I can type in by hand what
I want :)


 
-- 
 Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
  Twenty-six years of service to the Unix community.



poudriere amassing fetch errors

2012-10-13 Thread Christopher J. Ruwe
Hello,

for some time I have noticed that poudriere bulk builds amass fetch
errors, i.e., the corresponding distfile(s) cannot be fetched by the
build jail and I have to fetch these manually.

Does anybody know a fix to this unnerving condition?

Cheers,
-- 
Christopher J. Ruwe
TZ: GMT + 2h
GnuPG/GPG:  0xE8DE2C14



Re: FreeBSD9 - Fresh install (2)

2012-10-13 Thread Denise H. G.

On 2012/10/14 at 01:59, Jos Chrispijn j...@webrz.net wrote:
 
> When setting up my 1TB harddisk for FreeBSD 9.0, I have some questions
> about partitioning:
> I think of creating two partitions of 5Gb; one for the standard
> FreeBSD file layout and a second one with a /backup slice on it.
> Does this make sense?
>
> BR,
> Jos Chrispijn
  

If you intend to use ZFS, then backup would not be very difficult. I've
just tried backing up my ZFS filesystem onto an external USB harddrive
with just a few steps. 

-- 
10 PRINT HELLO, WORLD