Re: can i split a pdf file?

2009-01-26 Thread Gary Kline
On Sun, Jan 25, 2009 at 08:20:51PM -0500, Chuck Robey wrote:
 Charlie Kester wrote:
  On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
 
  Is there a way to split a large pdf file into smaller [ say 1MB ]
  chunks?  Or are there open source tools out there that i can build?  
  
  pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
  files, but it doesn't seem to be in the FreeBSD ports system.
  
  There is a pdfmerge in /usr/ports/print, but no pdfsplit.
  
 
 It's a very junky way to do it (but the only way I know), use pdf2ps to
 convert the pdf to postscript, then you stand at least a good chance of
 doing the split, which many utilities allow.  You could even do it
 graphically via gv.  The problem with this (and the reason it might well
 fail anyhow) is because some things that pdfs do aren't implemented in any
 standard postscript level I ever heard of.  It depends how many of the
 more recent extensions to pdf are being used.  I've done this, *sometimes*.
 
 Because the pdf spec is fully published, it might one day allow someone to
 write a splitter, but because the spec is SO enormous, maybe they won't,
 either.  Actually, that's a really good notion ... I need to give it some
 thought.
 
  -- Charlie
  
  

Thanks, Gents,

But according to one smallish pdf file that I sent to a web based
tool, it was not a real pdf.  Or, more accurately, it (the pdf to
speech program) couldn't decode it.  I'll play around with this
more tomorrow.  The problem with a lot of this electronic paper
is that the lines are squeezed together.  Makes scanning them
that much more difficult.  Last month I read a book [book-book,
from the library!] with more like ~1.5 spaces between lines, and even
tho the font was small, no problem in reading the entire text.

((FWIW: I'll find the URL of a piece on Hegelian ethics --PDF--
and see if the firefox speech site can grok that!))

gary



-- 
 Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: can i split a pdf file?

2009-01-26 Thread Polytropon
On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline kl...@thought.org wrote:
   Thanks, Gents,
 
   But according to one smallish pdf file that I send to a web based
   tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
   speech program) couldn't decode it.

This is a typical problem with poorly engineered PDFs where the
author puts in the text as images (you'll see this stupidity across
the Web, too).

A good way to check whether the PDF file can be (audibly) read is to
run it through pdftotext from the xpdf port:

% pdftotext bla.pdf && less bla.txt

Then, even the FF speech plugin should work correctly - as long as
the PDF file contains decodable text. If it's just a bunch of images,
well, what are we expecting, hm? FF-speech: You see a pretty image of
some text... :-)
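
A minimal sketch of that check as a shell function, assuming pdftotext
from the xpdf port is on the PATH. The name has_text is made up for
illustration; giving - as the output file asks pdftotext to write to
standard output instead of a .txt file.

```shell
#!/bin/sh
# has_text FILE (hypothetical helper): succeed if pdftotext finds any
# non-whitespace text in FILE.  Form feeds (^L) count as whitespace, so a
# PDF that yields nothing but page breaks is reported as having no text.
has_text() {
    pdftotext "$1" - 2>/dev/null | tr -d '[:space:]' | grep -q .
}
```

Usage would be something like
has_text bla.pdf && echo "text found" || echo "images only, OCR needed".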



-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: can i split a pdf file?

2009-01-26 Thread Roland Smith
On Sun, Jan 25, 2009 at 04:18:26PM -0800, Gary Kline wrote:
   Folks,
 
   Is there a way to split a large pdf file into smaller [ say 1MB ]
   chunks?  Or are there open source tools out there that i can
   build?  

Ghostscript (when built with the pdfwrite driver) will copy pages from a PDF:

gs -DNOPAUSE -sDEVICE=pdfwrite -dFirstPage=N -dLastPage=M \
-sOutputFile=outfile.pdf infile.pdf -c quit >/dev/null 2>&1

Where N and M are page numbers, and outfile.pdf and infile.pdf
are the output and original filename respectively.
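
Since the original question asked for several chunks, the same idea can
be wrapped in a loop. This is only a sketch under assumptions:
extract_pages and split_every are hypothetical helper names, the total
page count has to be supplied by the caller, and -dBATCH is used here
(instead of -c quit) to make gs exit when it is done. It splits by page
count, not by size, so the pieces will not come out at exactly 1MB.

```shell
#!/bin/sh
# extract_pages FIRST LAST INFILE OUTFILE
# Copy pages FIRST..LAST of INFILE into OUTFILE using the pdfwrite device.
extract_pages() {
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -dFirstPage="$1" -dLastPage="$2" \
       -sOutputFile="$4" "$3"
}

# split_every TOTAL PER INFILE PREFIX
# Write PREFIX-0001.pdf, PREFIX-0002.pdf, ... with at most PER pages each.
# TOTAL is the number of pages in INFILE, given by the caller.
split_every() {
    total=$1 per=$2 infile=$3 prefix=$4
    first=1 n=1
    while [ "$first" -le "$total" ]; do
        last=$((first + per - 1))
        [ "$last" -gt "$total" ] && last=$total
        extract_pages "$first" "$last" "$infile" \
            "$(printf '%s-%04d.pdf' "$prefix" "$n")" || return 1
        first=$((last + 1))
        n=$((n + 1))
    done
}
```

For a 10-page file, split_every 10 4 infile.pdf part would ask gs for
pages 1-4, 5-8 and 9-10.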

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



Re: can i split a pdf file?

2009-01-26 Thread Charlie Kester

On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:

On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline kl...@thought.org wrote:

Thanks, Gents,

But according to one smallish pdf file that I send to a web based
	tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
	speech program) couldn't decode it.


This is a typical problem with poorly engineered PDFs where the
author puts in the text as images (you'll see this stupidity across
the Web, too).


In most cases where I've seen this, it's because they had scanned an
actual printed document.  Many old, out-of-print books are being made
newly available this way, so I'm not inclined to complain.

Unfortunately, OCR software still isn't reliable enough (or, if
reliable, cheap enough) to convert these scanned images to actual text.


Re: can i split a pdf file?

2009-01-26 Thread Gary Kline
On Mon, Jan 26, 2009 at 09:16:23AM +0100, Polytropon wrote:
 On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline kl...@thought.org wrote:
  Thanks, Gents,
  
  But according to one smallish pdf file that I send to a web based
  tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
  speech program) couldn't decode it.
 
 This is a typical problem with poorly engineered PDFs where the
 author puts in the text as images (you'll see this stupidity across
 the Web, too).


So what kind of moron is going to photograph pages --or maybe just
get-screenshot-of-this-page and upload it?   Or a Real question:
I read an online pdf of The Art of War from the 1880's [?], and
it was in an old-English or olden-Deutsch type font.  In PDF.  I
have other p.d. texts in pdf and am wondering if there is some
sort of scanner that can take a book-length script and create a
pdf file.  Anybody know?


 
 A good tool to check if the PDF file can be (audibly) read is the
 use of the tool pdftotext from the port xpdf.
 
   % pdftotext bla.pdf && less bla.txt
 
 Then, even the FF speech plugin should work correctly - as long as
 the PDF file contains decodable text. If it's just a bunch of images,
 well, what are we expecting, hm? FF-speech: You see a pretty image of
 some text... :-)
 

Yeah, that's about right!  I got a bunch of ^L bytes and nothing
else.  Now I'm looking at the file with od -c and, yup, it's an
image. The parts in between pages are in ASCII.  Do you know what
MediaBox is?

At least the web article was not an image!  Google had it both in
PDF and HTML.

gary

 
 
 -- 
 Polytropon
 From Magdeburg, Germany
 Happy FreeBSD user since 4.0
 Andra moi ennepe, Mousa, ...

-- 
 Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php



Re: can i split a pdf file?

2009-01-26 Thread cpghost
On Mon, Jan 26, 2009 at 02:06:23PM -0800, Gary Kline wrote:
 On Mon, Jan 26, 2009 at 09:16:23AM +0100, Polytropon wrote:
  On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline kl...@thought.org wrote:
 Thanks, Gents,
   
 But according to one smallish pdf file that I send to a web based
 tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
 speech program) couldn't decode it.
  
  This is a typical problem with poorly engineered PDFs where the
  author puts in the text as images (you'll see this stupidity across
  the Web, too).
 
 
   So what kind of moron is going to photograph pages --or maybe just
   get-screenshot-of-this-page and upload it?

It happens quite frequently nowadays. Those PDFs are usually scanned,
and the scanner software (usually on Windows) assembles all screenshots
into a PDF of images. That's what you find on the Net.

This is not such a bad idea, esp. when it comes to technical textbooks,
which usually contain a lot of diagrams, formulae, tables etc...; since
an OCR software that would be able to reverse all this into LaTeX and
EPS figures has yet to be programmed (that's a difficult task).

   Or a Real question:
   I read an online pdf of The Art of War from the 1880's [?], and
   it was in an old-English or olden-Deutsch type font.  In PDF.  i
   have other p.d. texts in pdf and am wondering in there is some
   sort of scanner than can take a book-length script and create a
   pdf file.  Anybody know?  

It all depends how the PDF is created. Some PDFs encode the fonts
in a special section, and then use text (sometimes compressed
or encrypted), which refers to those fonts. In such a case, you
could extract the pure text from the PDF.

Other PDFs simply encode the book as a set of bitmaps (see above);
and then your only chance is to find an OCR software that would not
only be able to recognize the characters in the bitmaps, but also
to cope with those Fraktur- or other exotic fonts. Some OCR programs
are interactive and trainable, so that you can say: this is an 'S',
and that is a 'T'..., but AFAIK, there's no free and open source
OCR program with this capability (yet).

  A good tool to check if the PDF file can be (audibly) read is the
  use of the tool pdftotext from the port xpdf.
  
  % pdftotext bla.pdf && less bla.txt
  
  Then, even the FF speech plugin should work correctly - as long as
  the PDF file contains decodable text. If it's just a bunch of images,
  well, what are we expecting, hm? FF-speech: You see a pretty image of
  some text... :-)
 
   Yeah, that's about right!  I got a bunch of ^L bytes and nothing
   else.  Now I'm looking at the file with od -c and, yup, it's and
   image. The parts inbetween pages are in ASCII.  Do you know what
   MediaBox is?

So it's a set of images. There's not much you could do about it.

Oh, you can still try to extract the images from the PDF by using the
program 'pdfimages' (part of the graphics/xpdf port); and look at them
individually with an image processor (Gimp etc...). Then run an OCR
program on those images. Try graphics/gocr for example. But it would
still be tedious, to say the least.
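
A rough sketch of that pipeline, assuming pdfimages (from xpdf) and gocr
are both installed. The function name ocr_pdf and the "page" file prefix
are made up for illustration, and the quality of what gocr returns on
scanned book pages is not something this sketch can promise.

```shell
#!/bin/sh
# ocr_pdf INFILE WORKDIR (hypothetical helper)
# Extract the embedded images with pdfimages, then run gocr on each one
# and print the recognised text of all images on stdout, in page order.
ocr_pdf() {
    infile=$1 workdir=$2
    mkdir -p "$workdir" || return 1
    # pdfimages writes WORKDIR/page-000.pbm, WORKDIR/page-001.ppm, ...
    pdfimages "$infile" "$workdir/page" || return 1
    for img in "$workdir"/page-*; do
        [ -f "$img" ] || continue      # nothing was extracted
        gocr -i "$img"
    done
}
```

Something like ocr_pdf book.pdf /tmp/book-images > book.txt would then
collect whatever text gocr manages to recognise.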

   At least the web article was not an image!  Google had it both in
   PDF and HTML.
 
   gary

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


Re: can i split a pdf file?

2009-01-26 Thread Gary Kline
On Mon, Jan 26, 2009 at 01:36:48PM -0800, Charlie Kester wrote:
 On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:
 On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline kl...@thought.org wrote:
 Thanks, Gents,
 
 But according to one smallish pdf file that I send to a web based
 tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
 speech program) couldn't decode it.
 
 This is a typical problem with poorly engineered PDFs where the
 author puts in the text as images (you'll see this stupidity across
 the Web, too).
 
 In most cases where I've seen this, it's because they had scanned an
 actual printed document.  Many old, out-of-print books are being made
 newly available this way, so I'm not inclined to complain.
 
 Unfortunately, OCR software still isn't reliable enough (or, if
 reliable, cheap enough) to convert these scanned images to actual text.


You're probably right about the cost/performance idea.  Still,
before I get back to the Last few pages of my thesis, maybe I'll
try feeding parts of my most vanilla image-PDF file to an
opensource OCR program.  I'm pretty sure there are a couple in
ports.  IIRC, though, the images have to be jpegs or tiffs or the
like.  If anybody knows, please give me a shout out!

gary

-- 
 Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php



Re: can i split a pdf file?

2009-01-26 Thread Polytropon
On Mon, 26 Jan 2009 14:06:23 -0800, Gary Kline kl...@thought.org wrote:
   So what kind of moron is going to photograph pages --or maybe just
   get-screenshot-of-this-page and upload it? 

The PDF serves as a container for page images in this context.
Another idea would be to have separate image files, one file per
page, that you could view with your favourite image viewer.

The advantage of the PDF container is that you can easily print
a bunch of pages (or, a book).



  Or a Real question:
   I read an online pdf of The Art of War from the 1880's [?], and
   it was in an old-English or olden-Deutsch type font.  In PDF.  i
   have other p.d. texts in pdf and am wondering in there is some
   sort of scanner than can take a book-length script and create a
   pdf file.  Anybody know?  

It's very complicated to handle old fonts using OCR techniques.
It's even quite complicated with today's standard fonts. Although
there are (usually expensive) OCR programs with good algorithms,
most documents need some work afterwards. It's not only about
correcting mis-recognized characters, you have to handle hyphenation
and paragraph typesetting as well.

I know that there are scanners that can process a stack of paper
sheets through an automatic feeder, then scan them and
finally have a PDF file ready for FTP download. But there's no
OCR involved, of course.


 I got a bunch of ^L bytes and nothing
   else. 

The Ctrl-L (^L) is the page break character (FF = form feed). The
rest of the file then contains images that are not transformable
into characters.



 Now I'm looking at the file with od -c and, yup, it's and
   image. The parts inbetween pages are in ASCII.  Do you know what
   MediaBox is?

An image container maybe? So every page consists of a MediaBox
container holding one image.
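
For reference: in the PDF format, /MediaBox is not an image container.
It is a rectangle of four numbers (lower-left x, lower-left y,
upper-right x, upper-right y, in points of 1/72 inch) giving the size of
the page's medium. A rough way to list them is sketched below. The
function name is hypothetical, and it only sees entries that are stored
uncompressed, as they evidently were in the file inspected with od -c.

```shell
#!/bin/sh
# list_mediaboxes FILE (hypothetical helper): print every
# "/MediaBox [llx lly urx ury]" entry found in the plain-text parts of
# FILE, one per line.  -a makes grep treat the binary PDF as text.
list_mediaboxes() {
    grep -a -o '/MediaBox *\[[^]]*]' "$1"
}
```

A line such as /MediaBox [0 0 612 792] describes a US Letter page
(8.5 x 11 inches at 72 points per inch).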



   At least the web article was not an image! 

Don't mind, I know important web pages where the text content
actually IS an image, and of course there's no alt= or longdesc=
attribute because they're for weenies. :-)





-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: can i split a pdf file?

2009-01-26 Thread Polytropon
On Mon, 26 Jan 2009 14:51:14 -0800, Gary Kline kl...@thought.org wrote:
 Still,
   before I get back to the Last few pages of my thesis, maybe I'll
   try feeding parts of my most vanilla image-PDF file to an
   opensource OCR program.  I'm pretty sure there are a couple in
   ports.  IIRC, though, the images have to be jpegs of tiffs or the
   like.  If anybody knows, please give me a shout out!

The best idea is to use a format that does not have artifacts
due to image compression through DCT or similar algorithms,
read: real black-white pictures (1 bit color). JPEG is not
such a format, you can see this by magnifying the surrounding
of text: it is gray and looks dusty.

TIFF, GIF and PNG surely are better formats for feeding images
into an OCR processor.

(Background: Long time ago, I knew a man who did electronics
and printed circuit boards. In order to save hard disk space,
he converted his 1-bit BMP images of the schematics and the
PCB layout to JPEG format - instead of just zipping, raring
or arjing them. He was very unhappy to see them come out
of the printer so dirty and partially unreadable, although
it was a high quality office class laser printer. And when
he took the PCBs out of the acid bath, their previously
photochemically treated surface looked strange, had holes in
the copper, ready to be thrown away. This man was very upset
when he was told about DCT and artifacts. Later on, he used
GIF images and turned happy again.)




-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: can i split a pdf file?

2009-01-26 Thread Polytropon
On Mon, 26 Jan 2009 23:39:06 +0100, cpghost cpgh...@cordula.ws wrote:
 Those PDFs are usually scanned,
 and the scanner software (usually on Windows) assembles all screenshots
 into a PDF of images.

Handy for printing, but not for OCR postprocessing.



 That's what you find on the Net.

On the Web. :-)



 This is not such a bad idea, esp. when it comes to technical textbooks,
 which usually contain a lot of diagrams, formulae, tables etc...; since
 an OCR software that would be able to reverse all this into LaTeX and
 EPS figures has yet to be programmed (that's a difficult task).

As I've already mentioned, scanning the characters is only one part.
Your example of diagrams and formulas is good to illustrate this.
And because LaTeX is the only professional typesetting system
(and no, Word isn't such a tool), it would be really great to
have a tool pdf2tex which would get the characters of the text,
typeset them as in the original (paragraphing, hyphenation etc.),
input embedded pictures as pictures (of course), re-create
formulas so the result would run through pdf-LaTeX and
produce an improved version of the source PDF file.

But that's a task for the next generation of mankind. :-)


 Some PDFs encode the fonts
 in a special section, and then use text (sometimes compressed
 or encrypted), which refers to those fonts. In such a case, you
 could extract the pure text from the PDF.

It's worth mentioning that if the original text has characters
(represented in the additionally stored fonts) that have special
accents or orientations (usually in non-English languages), the
target system needs to support them, which it usually does by
means of UTF-8.



 Other PDFs simply encode the book as a set of bitmaps (see above);
 and then your only chance is to find an OCR software that would not
 only be able to recognize the characters in the bitmaps, but also
 to cope with those Fraktur- or other exotic fonts.

Yes, das Doytsh Uberfrucktoor makes everything unreadable. :-)

It gets even more complicated with hand-written books...



 Some OCR programs
 are interactive and trainable, so that you can say: this is an 'S',
 and that is a 'T'..., but AFAIK, there's no free and open source
 OCR program with this capability (yet).

Wow, never heard of this concept, but it's a really intelligent solution.
If this really works, it still has the disadvantage of needing
much time for training the program, and postprocessing.

It's easier to \usepackage[german]{uberfraktur} to make the text
unreadable again. :-)




-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: can i split a pdf file?

2009-01-26 Thread Andrew Robinson
Message: 2
 Date: Sun, 25 Jan 2009 20:20:51 -0500
 From: Chuck Robey chu...@telenix.org
 Subject: Re: can i split a pdf file?
 To: FreeBSD Mailing List freebsd-questions@FreeBSD.ORG
 Message-ID: 497d0ff3.6090...@telenix.org
 Content-Type: text/plain; charset=ISO-8859-1
 
 Charlie Kester wrote:
  On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
 
  Is there a way to split a large pdf file into smaller [ say 1MB ]
  chunks?  Or are there open source tools out there that i can build?  
  
  pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
  files, but it doesn't seem to be in the FreeBSD ports system.
  
  There is a pdfmerge in /usr/ports/print, but no pdfsplit.
  
 
 It's a very junky way to do it (but the only way I know), use pdf2ps
 to convert the pdf to postscript, then you stand at least a good
 chance of doing the split, which many utilities allow.  You could even
 do it graphically via gv.  The problem with this (and the reason it
 might well fail anyhow) is because some things that pdfs do aren't
 implemented in any standard postscript level I ever heard of.  It
 depends how many of the more recent extensions to pdf are being used.
 I've done this, *sometimes*.
 
 Because the pdf spec is fully published, it might one day allow
 someone to write a splitter, but because the spec is SO enormous,
 maybe they won't, either.  Actually, that's a really good notion ... I
 need to give it some thought.

It's not quite the same thing, but pdfnup from the

/usr/ports/print/pdfjam 

package allows page selections from the contributing pdfs.

http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/firth/software/pdfjam

Andrew
 
  -- Charlie
  
  


-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/ 


Re: can i split a pdf file?

2009-01-26 Thread Jonathan McKeown
On Monday 26 January 2009 09:17:05 Andrew Robinson wrote:
 Message: 2

  Date: Sun, 25 Jan 2009 20:20:51 -0500
  From: Chuck Robey chu...@telenix.org
  Subject: Re: can i split a pdf file?
  To: FreeBSD Mailing List freebsd-questions@FreeBSD.ORG
  Message-ID: 497d0ff3.6090...@telenix.org
  Content-Type: text/plain; charset=ISO-8859-1
 
  Charlie Kester wrote:
   On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
   Is there a way to split a large pdf file into smaller [ say 1MB ]
   chunks?  Or are there open source tools out there that i can
   build?
[various suggestions including pdfmerge and pdfnup from ports]

Alternatively, if you have a reasonably complete TeX installation (I'm still 
using teTeX), check whether you have texexec installed - which can extract 
pages from PDFs.

Jonathan


Re: can i split a pdf file?

2009-01-25 Thread cpghost
On Sun, Jan 25, 2009 at 04:18:26PM -0800, Gary Kline wrote:
   Folks,
 
   Is there a way to split a large pdf file into smaller [ say 1MB ]
   chunks?  Or are there open source tools out there that i can
   build?  
   
   thanks in advance,
 
   gary

To split the file, use split(1):

  $ split -b 1M file.pdf file-chunk

See man split.
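
A small sketch of the byte-wise split together with the reverse step,
for the case where the pieces only need to be moved around and glued
back together later. The function name split_and_verify is made up, and
the size is passed in plain bytes here to sidestep differences in how
split implementations spell size suffixes.

```shell
#!/bin/sh
# split_and_verify FILE SIZE PREFIX (hypothetical helper)
# Cut FILE into SIZE-byte pieces named PREFIXaa, PREFIXab, ... and check
# that concatenating the pieces in name order reproduces FILE exactly.
# PREFIX must not match any unrelated files, since a glob collects them.
split_and_verify() {
    split -b "$2" "$1" "$3" || return 1
    cat "$3"* | cmp -s - "$1"
}
```

split_and_verify file.pdf 1048576 file-chunk- would produce 1MB pieces;
cat file-chunk-* > file.pdf restores the original on the other side.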

But you won't be able to view the chunks separately,
because, well, they aren't PDF files anymore. ;-)
For this, you'd prefer to split the PDF file after
N pages. You may want to investigate print/pdftk:

From /usr/ports/print/pdftk/pkg-descr:

  If PDF is electronic paper, then pdftk is an electronic staple-remover,
  hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
  Pdftk is a simple tool for doing everyday things with PDF documents.
  Keep one in the top drawer of your desktop and use it to:
  
  Merge PDF Documents
  Split PDF Pages into a New Document
  Decrypt Input as Necessary (Password Required)
  Encrypt Output as Desired
  Burst a PDF Document into Single Pages
  Report on PDF Metrics, including Metadata and Bookmarks
  Uncompress and Re-Compress Page Streams
  Repair Corrupted PDF (Where Possible)
  
  Pdftk is also an example of how to use a library of Java classes
  in a stand-alone C++ program. Specifically, it demonstrates how GCJ and CNI
  allow C++ code to use iText's (itext-paulo) Java classes.
  
  WWW: http://www.accesspdf.com/pdftk/

There are also other less heavy-weight programs to extract
pages and page-ranges from a PDF and PostScript file...
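
For the pdftk route, the two operations from the description that matter
here are "cat" with a page range and "burst". A minimal sketch, with
hypothetical wrapper names and assuming pdftk is installed from the
port:

```shell
#!/bin/sh
# pdftk_range INFILE FIRST LAST OUTFILE (hypothetical helper)
# Copy pages FIRST..LAST of INFILE into OUTFILE.
pdftk_range() {
    pdftk "$1" cat "$2-$3" output "$4"
}

# pdftk_burst INFILE (hypothetical helper)
# Write one PDF per page into the current directory; by default pdftk
# names them pg_0001.pdf, pg_0002.pdf, and so on.
pdftk_burst() {
    pdftk "$1" burst
}
```

pdftk_range big.pdf 1 20 part1.pdf would give the first twenty pages as
a stand-alone, viewable PDF.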

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


Re: can i split a pdf file?

2009-01-25 Thread Wojciech Puchar

Folks,

Is there a way to split a large pdf file into smaller [ say 1MB ]
chunks?  Or are there open source tools out there that i can
build?

Like any other file: use split.

Or did you mean splitting into separate PDFs by page?

convert to ps (pdf2ps)
then use mpage to extract pages
then make pdf back
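
The three steps could be scripted roughly as follows. Note the
substitution: this sketch uses psselect (from psutils) for the
page-extraction step instead of mpage, because its -p page-range option
does exactly this job. The function name ps_extract is hypothetical, and
pdf2ps and ps2pdf are assumed to come with Ghostscript. As noted
elsewhere in the thread, the round trip through PostScript may lose
features of newer PDFs.

```shell
#!/bin/sh
# ps_extract INFILE FIRST LAST OUTFILE (hypothetical helper)
# PDF -> PostScript, keep pages FIRST..LAST, PostScript -> PDF.
ps_extract() {
    tmp=$(mktemp -d) || return 1
    pdf2ps "$1" "$tmp/all.ps" &&
    psselect -p"$2-$3" "$tmp/all.ps" "$tmp/sel.ps" &&
    ps2pdf "$tmp/sel.ps" "$4"
    status=$?
    rm -rf "$tmp"          # clean up the intermediate PostScript files
    return "$status"
}
```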


Re: can i split a pdf file?

2009-01-25 Thread Wojciech Puchar

because, well, they aren't PDF files anymore. ;-)
For this, you'd prefer to split the PDF file after
N pages. You may want to investigate print/pdftk:


From /usr/ports/print/pdftk/pkg-descr:


 If PDF is electronic paper, then pdftk is an electronic staple-remover,
 hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
 Pdftk is a simple tool for doing everyday things with PDF documents.
 Keep one in the top drawer of your desktop and use it to:


nice tool. thanks


Re: can i split a pdf file?

2009-01-25 Thread Charlie Kester

On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:


Is there a way to split a large pdf file into smaller [ say 1MB ]
	chunks?  Or are there open source tools out there that i can build?  


pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
files, but it doesn't seem to be in the FreeBSD ports system.

There is a pdfmerge in /usr/ports/print, but no pdfsplit.

-- Charlie




Re: can i split a pdf file?

2009-01-25 Thread cpghost
On Mon, Jan 26, 2009 at 01:37:08AM +0100, Wojciech Puchar wrote:
  because, well, they aren't PDF files anymore. ;-)
  For this, you'd prefer to split the PDF file after
  N pages. You may want to investigate print/pdftk:
 
  From /usr/ports/print/pdftk/pkg-descr:
 
   If PDF is electronic paper, then pdftk is an electronic staple-remover,
   hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
   Pdftk is a simple tool for doing everyday things with PDF documents.
   Keep one in the top drawer of your desktop and use it to:
 
 nice tool. thanks

Thanks. Though I prefer your solution (via mpage). pdftk
looks a bit too heavy for such a simple task. ;-)

Cheers,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/


Re: can i split a pdf file?

2009-01-25 Thread Charlie Kester

On Sun 25 Jan 2009 at 16:51:56 PST Charlie Kester wrote:

On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:


Is there a way to split a large pdf file into smaller [ say 1MB ]
	chunks?  Or are there open source tools out there that i can build?  


pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
files, but it doesn't seem to be in the FreeBSD ports system.



Here's a suite of commandline tools for manipulating pdf's, in case you
don't want a gui:

http://multivalent.sourceforge.net/Tools/

This one uses Java, like pdfsam and pdftk.  Like pdfsam, it doesn't seem
to be in the ports tree.

-- Charlie


Re: can i split a pdf file?

2009-01-25 Thread Chuck Robey
Charlie Kester wrote:
 On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:

 Is there a way to split a large pdf file into smaller [ say 1MB ]
 chunks?  Or are there open source tools out there that i can build?  
 
 pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
 files, but it doesn't seem to be in the FreeBSD ports system.
 
 There is a pdfmerge in /usr/ports/print, but no pdfsplit.
 

It's a very junky way to do it (but the only way I know), use pdf2ps to convert
the pdf to postscript, then you stand at least a good chance of doing the split,
which many utilities allow.  You could even do it graphically via gv.  The
problem with this (and the reason it might well fail anyhow) is because some
things that pdfs do aren't implemented in any standard postscript level I ever
heard of.  It depends how many of the more recent extensions to pdf are being
used.  I've done this, *sometimes*.

Because the pdf spec is fully published, it might one day allow someone to write
a splitter, but because the spec is SO enormous, maybe they won't, either.
Actually, that's a really good notion ... I need to give it some thought.

 -- Charlie
 
 