Re: can i split a pdf file?

2009-01-26 Thread Jonathan McKeown

On Monday 26 January 2009 09:17:05 Andrew Robinson wrote:
> Message: 2
>
> > Date: Sun, 25 Jan 2009 20:20:51 -0500
> > From: Chuck Robey 
> > Subject: Re: can i split a pdf file?
> > To: FreeBSD Mailing List 
> > Message-ID: <497d0ff3.6090...@telenix.org>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA1
> >
> > Charlie Kester wrote:
> > > On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
> > >> Is there a way to split a large pdf file into smaller [ say 1MB ]
> > >> chunks?  Or are there open source tools out there that i can
> > >> build?
[various suggestions including pdfmerge and psnup from ports]

Alternatively, if you have a reasonably complete TeX installation (I'm still 
using teTeX), check whether you have texexec installed - which can extract 
pages from PDFs.

Jonathan
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"

Re: can i split a pdf file?

2009-01-26 Thread Andrew Robinson

Message: 2
> Date: Sun, 25 Jan 2009 20:20:51 -0500
> From: Chuck Robey 
> Subject: Re: can i split a pdf file?
> To: FreeBSD Mailing List 
> Message-ID: <497d0ff3.6090...@telenix.org>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Charlie Kester wrote:
> > On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
> >>
> >> Is there a way to split a large pdf file into smaller [ say 1MB ]
> >> chunks?  Or are there open source tools out there that i can build?  
> > 
> > pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
> > files, but it doesn't seem to be in the FreeBSD ports system.
> > 
> > There is a pdfmerge in /usr/ports/print, but no pdfsplit.
> > 
> 
> It's a very junky way to do it (but the only way I know), use pdf2ps
> to convert the pdf to postscript, then you stand at least a good
> chance of doing the split, which many utilities allow.  You could even
> do it graphically via gv.  The problem with this (and the reason it
> might well fail anyhow) is because some things that pdfs do aren't
> implemented in any standard postscript level I ever heard of.  It
> depends how many of the more recent extensions to pdf are being used.
> I've done this, *sometimes*.
> 
> Because the pdf spec is fully published, it might one day allow
> someone to write a splitter, but because the spec is SO enormous,
> maybe they won't, either.  Actually, that's a really good notion ... I
> need to give it some thought.

It's not quite the same thing, but pdfnup from the

/usr/ports/print/pdfjam 

package allows page selections from the contributing pdfs.

http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/firth/software/pdfjam

Andrew
 
> > -- Charlie
> > 
> > 


-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/ 

Re: can i split a pdf file?

2009-01-26 Thread Polytropon

On Mon, 26 Jan 2009 23:39:06 +0100, cpghost  wrote:
> Those PDFs are usually scanned,
> and the scanner software (usually on Windows) assembles all screenshots
> into a PDF of images.

Handy for printing, but not for OCR postprocessing.

> That's what you find on the Net.

On the Web. :-)

> This is not such a bad idea, esp. when it comes to technical textbooks,
> which usually contain a lot of diagrams, formulae, tables etc...; since
> an OCR software that would be able to reverse all this into LaTeX and
> EPS figures has yet to be programmed (that's a difficult task).

As I've already mentioned, scanning the characters is only one part.
Your example of diagrams and formulas is good to illustrate this.
And because LaTeX is the only professional typesetting system
(and no, "Word" isn't such a tool), it would be really great to
have a tool pdf2tex which would get the characters of the text,
typeset them as in the original (paragraphing, hyphenation etc.),
input embedded pictures as pictures (of course), re-create
formulas so the result would run through pdf-LaTeX and
produce an improved version of the source PDF file.

But that's a task for the next generation of mankind. :-)

> Some PDFs encode the fonts
> in a special section, and then use text (sometimes compressed
> or encrypted), which refers to those fonts. In such a case, you
> could extract the pure text from the PDF.

It's worth mentioning that if the original text has characters
(represented in the additionally stored fonts) that have special
accents or orientations (non-english languages usually), the
target system needs to support them, which it usually does through
the means of UTF-8.

> Other PDFs simply encode the book as a set of bitmaps (see above);
> and then your only chance is to find an OCR software that would not
> only be able to recognize the characters in the bitmaps, but also
> to cope with those Fraktur- or other exotic fonts.

Yes, das Doytsh Uberfrucktoor makes everything unreadable. :-)

It gets even more complicated with hand-written books...

> Some OCR programs
> are interactive and trainable, so that you can say: this is an 'S',
> and that is a 'T'..., but AFAIK, there's no free and open source
> OCR program with this capability (yet).

Wow, never heard of this concept, but it's a really intelligent solution.
If this really works, it still has the "disadvantage" of needing
much time for training the program, and postprocessing.

It's easier to \usepackage[german]{uberfraktur} to make the text
unreadable again. :-)

-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: can i split a pdf file?

2009-01-26 Thread Polytropon

On Mon, 26 Jan 2009 14:51:14 -0800, Gary Kline  wrote:
> Still,
>   before I get back to the Last few pages of my thesis, maybe I'll
>   try feeding parts of my most vanilla image-PDF file to an
>   opensource OCR program.  I'm pretty sure there are a couple in
>   ports.  IIRC, though, the images have to be jpegs or tiffs or the
>   like.  If anybody knows, please give me a shout out!

The best idea is to use a format that does not have artifacts
due to image compression through DCT or similar algorithms,
read: "real black-white pictures" (1 bit color). JPEG is not
such a format; you can see this by magnifying the area around
the text: it is gray and looks "dusty".

TIFF, GIF and PNG surely are better formats for feeding images
into an OCR processor.

(Background: A long time ago, I knew a man who did electronics
and printed circuit boards. In order to save hard disk space,
he converted his 1-bit BMP images of the schematics and the
PCB layout to JPEG format - instead of just zipping, raring
or arjing them. He was very unhappy to see them coming out
of the printer "so dirty, partially unreadable", although
it was a high quality office class laser printer. And when
he took the PCBs out of the acid bath, their previously
photochemically treated surface looked strange and had holes in
the copper, ready to be thrown away. This man was very upset
when he was told about DCT and artifacts. Later on, he used
GIF images and turned happy again.)

-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: can i split a pdf file?

2009-01-26 Thread Polytropon

On Mon, 26 Jan 2009 14:06:23 -0800, Gary Kline  wrote:
>   So what kind of moron is going to photograph pages --or maybe just
>   "get-screenshot-of-this-page"-- and upload it?

The PDF serves as a container for pictorial images in this context.
Another idea would be to have separate image files, one file per
page, that you could view with your favourite image viewer.

The advantage of the PDF container is that you can easily print
a bunch of pages (or, a book).

>  Or a Real question:
>   I read an online pdf of "The Art of War" from the 1880's [?], and
>   it was in an old-English or olden-Deutsch type font.  In PDF.  i
>   have other p.d. texts in pdf and am wondering if there is some
>   sort of scanner that can take a book-length script and create a
>   pdf file.  Anybody know?  

It's very complicated to handle old fonts using OCR techniques.
It's even quite complicated with today's standard fonts. Although
there are (usually expensive) OCR programs with good algorithms,
most documents need some work afterwards. It's not only about
correcting mis-recognized characters, you have to handle hyphenation
and paragraph typesetting as well.

I know that there are scanners that can process a stack of paper
sheets through an automatic feeder, scan them, and finally
have a PDF file ready for FTP download. But there's no
OCR involved, of course.

> I got a bunch of ^L bytes and nothing
>   else. 

The Ctrl-L (^L) is the page break character (FF = form feed). The
rest of the file then contains images that are not transformable
into characters.

> Now I'm looking at the file with od -c and, yup, it's an
>   image. The parts in between pages are in ASCII.  Do you know what
>   "MediaBox" is?

An image container maybe? So every page consists of a "MediaBox"
container holding one image.

>   At least the web article was not an image! 

Never mind, I know "important" web pages where the text content
actually IS an image, and of course there's no alt= or longdesc=
parameter because they're for weenies. :-)

-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: can i split a pdf file?

2009-01-26 Thread Gary Kline

On Mon, Jan 26, 2009 at 01:36:48PM -0800, Charlie Kester wrote:
> On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:
> >On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline  wrote:
> >>Thanks, Gents,
> >>
> >>But according to one smallish pdf file that I send to a web based
> >>tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
> >>speech program) couldn't decode it.
> >
> >This is a typical problem with "poorly engineered" PDFs where the
> >author puts in the text as images (you'll see this stupidity across
> >the Web, too).
> 
> In most cases where I've seen this, it's because they had scanned an
> actual printed document.  Many old, out-of-print books are being made
> newly available this way, so I'm not inclined to complain.
> 
> Unfortunately, OCR software still isn't reliable enough (or, if
> reliable, cheap enough) to convert these scanned images to actual text.


You're probably right about the cost/performance idea.  Still,
before I get back to the Last few pages of my thesis, maybe I'll
try feeding parts of my most vanilla image-PDF file to an
opensource OCR program.  I'm pretty sure there are a couple in
ports.  IIRC, though, the images have to be jpegs or tiffs or the
like.  If anybody knows, please give me a shout out!

gary

-- 
 Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php


Re: can i split a pdf file?

2009-01-26 Thread cpghost

On Mon, Jan 26, 2009 at 02:06:23PM -0800, Gary Kline wrote:
> On Mon, Jan 26, 2009 at 09:16:23AM +0100, Polytropon wrote:
> > On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline  wrote:
> > >   Thanks, Gents,
> > > 
> > >   But according to one smallish pdf file that I send to a web based
> > >   tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
> > >   speech program) couldn't decode it.
> > 
> > This is a typical problem with "poorly engineered" PDFs where the
> > author puts in the text as images (you'll see this stupidity across
> > the Web, too).
> 
> 
>   So what kind of moron is going to photograph pages --or maybe just
>   "get-screenshot-of-this-page"-- and upload it?

It happens quite frequently nowadays. Those PDFs are usually scanned,
and the scanner software (usually on Windows) assembles all screenshots
into a PDF of images. That's what you find on the Net.

This is not such a bad idea, esp. when it comes to technical textbooks,
which usually contain a lot of diagrams, formulae, tables etc...; since
an OCR software that would be able to reverse all this into LaTeX and
EPS figures has yet to be programmed (that's a difficult task).

>   Or a Real question:
>   I read an online pdf of "The Art of War" from the 1880's [?], and
>   it was in an old-English or olden-Deutsch type font.  In PDF.  i
>   have other p.d. texts in pdf and am wondering if there is some
>   sort of scanner that can take a book-length script and create a
>   pdf file.  Anybody know?  

It all depends how the PDF is created. Some PDFs encode the fonts
in a special section, and then use text (sometimes compressed
or encrypted), which refers to those fonts. In such a case, you
could extract the pure text from the PDF.

Other PDFs simply encode the book as a set of bitmaps (see above);
and then your only chance is to find an OCR software that would not
only be able to recognize the characters in the bitmaps, but also
to cope with those Fraktur- or other exotic fonts. Some OCR programs
are interactive and trainable, so that you can say: this is an 'S',
and that is a 'T'..., but AFAIK, there's no free and open source
OCR program with this capability (yet).

> > A good tool to check if the PDF file can be (audibly) read is the
> > use of the tool pdftotext from the port xpdf.
> > 
> > % pdftotext bla.pdf && less bla.txt
> > 
> > Then, even the FF speech plugin should work correctly - as long as
> > the PDF file contains decodable text. If it's just a bunch of images,
> > well, what are we expecting, hm? FF-speech: "You see a pretty image of
> > some text..." :-)
> 
>   Yeah, that's about right!  I got a bunch of ^L bytes and nothing
>   else.  Now I'm looking at the file with od -c and, yup, it's an
>   image. The parts in between pages are in ASCII.  Do you know what
>   "MediaBox" is?

So it's a set of images. There's not much you could do about it.

Oh, you can still try to extract the images from the PDF by using the
program 'pdfimages' (part of the graphics/xpdf port); and look at them
individually with an image processor (Gimp etc...). Then run an OCR
program on those images. Try graphics/gocr for example. But it would
still be tedious, to say the least.
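
The pdfimages-then-OCR route could be scripted roughly as follows. This is only a dry-run sketch: the function name ocr_cmds, the "page" prefix and the temporary directory are made up for illustration, and it assumes that pdfimages has already written its usual prefix-NNN.pbm/.ppm files and that gocr takes -i for the input image and -o for the output file. It prints the commands instead of running them, so they can be checked first.

```shell
#!/bin/sh
# Dry run: print one gocr command per image that pdfimages left in DIR
# under the prefix ROOT (ROOT-000.pbm, ROOT-001.ppm, ...).
ocr_cmds() {
    dir=$1
    root=$2
    for img in "$dir/$root"-*.p[bgp]m; do
        [ -e "$img" ] || continue    # the glob matched nothing
        # ${img%.*} drops only the image extension
        printf 'gocr -i %s -o %s.txt\n' "$img" "${img%.*}"
    done
}

# Stand-in for the output of: pdfimages scanned.pdf "$workdir/page"
workdir=$(mktemp -d)
: > "$workdir/page-000.pbm"
: > "$workdir/page-001.ppm"

ocr_cmds "$workdir" page
```

Piping the printed lines into sh would run them once the images are real. The result still needs proofreading by hand.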

>   At least the web article was not an image!  Google had it both in
>   PDF and HTML.
> 
>   gary

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/

Re: can i split a pdf file?

2009-01-26 Thread Gary Kline

On Mon, Jan 26, 2009 at 09:16:23AM +0100, Polytropon wrote:
> On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline  wrote:
> > Thanks, Gents,
> > 
> > But according to one smallish pdf file that I send to a web based
> > tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
> > speech program) couldn't decode it.
> 
> This is a typical problem with "poorly engineered" PDFs where the
> author puts in the text as images (you'll see this stupidity across
> the Web, too).

So what kind of moron is going to photograph pages --or maybe just
"get-screenshot-of-this-page"-- and upload it?   Or a Real question:
I read an online pdf of "The Art of War" from the 1880's [?], and
it was in an old-English or olden-Deutsch type font.  In PDF.  i
have other p.d. texts in pdf and am wondering if there is some
sort of scanner that can take a book-length script and create a
pdf file.  Anybody know?

> 
> A good tool to check if the PDF file can be (audibly) read is the
> use of the tool pdftotext from the port xpdf.
> 
>   % pdftotext bla.pdf && less bla.txt
> 
> Then, even the FF speech plugin should work correctly - as long as
> the PDF file contains decodable text. If it's just a bunch of images,
> well, what are we expecting, hm? FF-speech: "You see a pretty image of
> some text..." :-)
> 

Yeah, that's about right!  I got a bunch of ^L bytes and nothing
else.  Now I'm looking at the file with od -c and, yup, it's an
image. The parts in between pages are in ASCII.  Do you know what
"MediaBox" is?

At least the web article was not an image!  Google had it both in
PDF and HTML.

gary

> 
> 
> -- 
> Polytropon
> From Magdeburg, Germany
> Happy FreeBSD user since 4.0
> Andra moi ennepe, Mousa, ...

-- 
 Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php


Re: can i split a pdf file?

2009-01-26 Thread Charlie Kester


On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:
>On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline  wrote:
>>Thanks, Gents,
>>
>>But according to one smallish pdf file that I send to a web based
>>tool, it was not a real pdf.  Or, more accurately, it (the pdf to
>>speech program) couldn't decode it.
>
>This is a typical problem with "poorly engineered" PDFs where the
>author puts in the text as images (you'll see this stupidity across
>the Web, too).

In most cases where I've seen this, it's because they had scanned an
actual printed document.  Many old, out-of-print books are being made
newly available this way, so I'm not inclined to complain.

Unfortunately, OCR software still isn't reliable enough (or, if
reliable, cheap enough) to convert these scanned images to actual text.

Re: can i split a pdf file?

2009-01-26 Thread Roland Smith

On Sun, Jan 25, 2009 at 04:18:26PM -0800, Gary Kline wrote:
>   Folks,
> 
>   Is there a way to split a large pdf file into smaller [ say 1MB ]
>   chunks?  Or are there open source tools out there that i can
>   build?  

Ghostscript (when built with the pdfwrite driver) will copy pages from a PDF:

gs -DNOPAUSE -sDEVICE=pdfwrite -dFirstPage=FIRST -dLastPage=LAST \
-sOutputFile=OUTFILE INFILE -c quit >/dev/null 2>&1

Where FIRST and LAST are page numbers, and OUTFILE and INFILE
are the output and original filename respectively.
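
The same page-range options can be used to cut a whole document into pieces of N pages each. A minimal sketch, not a tested recipe: the function name gs_chunk_cmds, the file name book.pdf and the -partN.pdf naming are invented here, and the total page count has to be supplied by hand (pdfinfo from the xpdf port should report it). It only prints the gs commands, one per chunk, and assumes file names without spaces.

```shell
#!/bin/sh
# Dry run: print one gs command per chunk of CHUNK pages.
# Arguments: input file, total page count, pages per chunk.
gs_chunk_cmds() {
    infile=$1
    total=$2
    chunk=$3
    first=1
    n=1
    while [ "$first" -le "$total" ]; do
        last=$((first + chunk - 1))
        # the final chunk may be shorter than CHUNK pages
        [ "$last" -gt "$total" ] && last=$total
        printf 'gs -DNOPAUSE -sDEVICE=pdfwrite -dFirstPage=%d -dLastPage=%d -sOutputFile=%s-part%d.pdf %s -c quit\n' \
            "$first" "$last" "${infile%.pdf}" "$n" "$infile"
        first=$((last + 1))
        n=$((n + 1))
    done
}

# A 25-page document in chunks of 10 pages: 1-10, 11-20, 21-25
gs_chunk_cmds book.pdf 25 10
```

Splitting by page count rather than by size is deliberate: the size of each piece depends on what the pages contain, so a 1MB target can only be approximated by adjusting the chunk length.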

Roland
-- 
R.F.Smith   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)



Re: can i split a pdf file?

2009-01-26 Thread Polytropon

On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline  wrote:
>   Thanks, Gents,
> 
>   But according to one smallish pdf file that I send to a web based
>   tool, it was not a real pdf.  Or, more accurately, it (the pdf to 
>   speech program) couldn't decode it.

This is a typical problem with "poorly engineered" PDFs where the
author puts in the text as images (you'll see this stupidity across
the Web, too).

A good tool to check if the PDF file can be (audibly) read is the
use of the tool pdftotext from the port xpdf.

% pdftotext bla.pdf && less bla.txt

Then, even the FF speech plugin should work correctly - as long as
the PDF file contains decodable text. If it's just a bunch of images,
well, what are we expecting, hm? FF-speech: "You see a pretty image of
some text..." :-)
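
Whether pdftotext found anything can also be checked mechanically. The sketch below is an illustration under assumptions: has_real_text and the two sample files are invented, and the samples merely imitate pdftotext output, which separates pages with form feeds (^L). A scanned PDF then yields little more than those page breaks.

```shell
#!/bin/sh
# Succeeds if FILE holds anything besides whitespace. Form feeds (^L)
# belong to tr's [:space:] class, so they are stripped as well.
has_real_text() {
    [ -n "$(tr -d '[:space:]' < "$1")" ]
}

# Stand-ins for pdftotext output, so the sketch runs without a PDF at hand.
demo=$(mktemp -d)
printf '\f\f\f' > "$demo/scanned.txt"
printf 'Chapter 1\n\fChapter 2\n\f' > "$demo/typeset.txt"

for f in "$demo/scanned.txt" "$demo/typeset.txt"; do
    if has_real_text "$f"; then
        echo "$(basename "$f"): contains text"
    else
        echo "$(basename "$f"): only page breaks, probably scanned images"
    fi
done
```

With a real document the call would follow the conversion, i.e. pdftotext bla.pdf and then has_real_text bla.txt.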

-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

Re: can i split a pdf file?

2009-01-26 Thread Gary Kline

On Sun, Jan 25, 2009 at 08:20:51PM -0500, Chuck Robey wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Charlie Kester wrote:
> > On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
> >>
> >> Is there a way to split a large pdf file into smaller [ say 1MB ]
> >> chunks?  Or are there open source tools out there that i can build?  
> > 
> > pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
> > files, but it doesn't seem to be in the FreeBSD ports system.
> > 
> > There is a pdfmerge in /usr/ports/print, but no pdfsplit.
> > 
> 
> It's a very junky way to do it (but the only way I know), use pdf2ps to
> convert the pdf to postscript, then you stand at least a good chance of
> doing the split, which many utilities allow.  You could even do it
> graphically via gv.  The problem with this (and the reason it might well
> fail anyhow) is because some things that pdfs do aren't implemented in
> any standard postscript level I ever heard of.  It depends how many of
> the more recent extensions to pdf are being used.  I've done this,
> *sometimes*.
>
> Because the pdf spec is fully published, it might one day allow someone
> to write a splitter, but because the spec is SO enormous, maybe they
> won't, either.  Actually, that's a really good notion ... I need to give
> it some thought.
> 
> > -- Charlie
> > 
> > 

Thanks, Gents,

But according to one smallish pdf file that I sent to a web based
tool, it was not a real pdf.  Or, more accurately, it (the pdf to
speech program) couldn't decode it.  I'll play around with this
more tomorrow.  The problem with a lot of this electronic paper
is that the lines are squeezed together.  Makes scanning them
that much more difficult.  Last month I read a book [book-book,
from the library!] with ~1.5 spaces between lines, and even
tho the font was small, no problem in reading the entire text.

((FWIW: I'll find the URL of a piece on Hegelian ethics --PDF--
and see if the firefox speech site can grok that!))

gary



-- 
 Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php


Re: can i split a pdf file?

2009-01-25 Thread Chuck Robey

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Charlie Kester wrote:
> On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
>>
>> Is there a way to split a large pdf file into smaller [ say 1MB ]
>> chunks?  Or are there open source tools out there that i can build?  
> 
> pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
> files, but it doesn't seem to be in the FreeBSD ports system.
> 
> There is a pdfmerge in /usr/ports/print, but no pdfsplit.
> 

It's a very junky way to do it (but the only way I know), use pdf2ps to convert
the pdf to postscript, then you stand at least a good chance of doing the split,
which many utilities allow.  You could even do it graphically via gv.  The
problem with this (and the reason it might well fail anyhow) is because some
things that pdfs do aren't implemented in any standard postscript level I ever
heard of.  It depends how many of the more recent extensions to pdf are being
used.  I've done this, *sometimes*.

Because the pdf spec is fully published, it might one day allow someone to write
a splitter, but because the spec is SO enormous, maybe they won't, either.
Actually, that's a really good notion ... I need to give it some thought.

> -- Charlie
> 
> 

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.9 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkl9D/MACgkQz62J6PPcoOnxIQCgg+Suf4NpK8TXTNbYZIW0BCrR
fKYAn3ljinZw9s1fPG39IMpblVNg0H+N
=mGhJ
-END PGP SIGNATURE-

Re: can i split a pdf file?

2009-01-25 Thread Charlie Kester


On Sun 25 Jan 2009 at 16:51:56 PST Charlie Kester wrote:
>On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
>>
>>Is there a way to split a large pdf file into smaller [ say 1MB ]
>>chunks?  Or are there open source tools out there that i can build?
>
>pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
>files, but it doesn't seem to be in the FreeBSD ports system.

Here's a suite of commandline tools for manipulating pdf's, in case you
don't want a gui:

http://multivalent.sourceforge.net/Tools/

This one uses Java, like pdfsam and pdftk.  Like pdfsam, it doesn't seem
to be in the ports tree.

-- Charlie

Re: can i split a pdf file?

2009-01-25 Thread cpghost

On Mon, Jan 26, 2009 at 01:37:08AM +0100, Wojciech Puchar wrote:
> > because, well, they aren't PDF files anymore. ;-)
> > For this, you'd prefer to split the PDF file after
> > N pages. You may want to investigate print/pdftk:
> >
> > From /usr/ports/print/pdftk/pkg-descr:
> >
> >  If PDF is electronic paper, then pdftk is an electronic staple-remover,
> >  hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
> >  Pdftk is a simple tool for doing everyday things with PDF documents.
> >  Keep one in the top drawer of your desktop and use it to:
> 
> nice tool. thanks

Thanks. Though I prefer your solution (via mpage). pdftk
looks a bit too heavy for such a simple task. ;-)

Cheers,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/

Re: can i split a pdf file?

2009-01-25 Thread Charlie Kester


On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
>
>Is there a way to split a large pdf file into smaller [ say 1MB ]
>chunks?  Or are there open source tools out there that i can build?

pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
files, but it doesn't seem to be in the FreeBSD ports system.

There is a pdfmerge in /usr/ports/print, but no pdfsplit.

-- Charlie



Re: can i split a pdf file?

2009-01-25 Thread Wojciech Puchar


> because, well, they aren't PDF files anymore. ;-)
> For this, you'd prefer to split the PDF file after
> N pages. You may want to investigate print/pdftk:
>
> From /usr/ports/print/pdftk/pkg-descr:
>
>  If PDF is electronic paper, then pdftk is an electronic staple-remover,
>  hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
>  Pdftk is a simple tool for doing everyday things with PDF documents.
>  Keep one in the top drawer of your desktop and use it to:

nice tool. thanks

Re: can i split a pdf file?

2009-01-25 Thread Wojciech Puchar


> Folks,
>
> Is there a way to split a large pdf file into smaller [ say 1MB ]
> chunks?  Or are there open source tools out there that i can
> build?

as with every other file: use split.

or did you mean splitting it into separate pdfs by pages?

convert to ps (pdf2ps)
then use mpage to extract pages
then convert back to pdf

Re: can i split a pdf file?

2009-01-25 Thread cpghost

On Sun, Jan 25, 2009 at 04:18:26PM -0800, Gary Kline wrote:
>   Folks,
> 
>   Is there a way to split a large pdf file into smaller [ say 1MB ]
>   chunks?  Or are there open source tools out there that i can
>   build?  
>   
>   thanks in advance,
> 
>   gary

To split the file, use split(1):

  $ split -b 1M file.pdf file-chunk

See "man split".

But you won't be able to view the chunks separately,
because, well, they aren't PDF files anymore. ;-)
For this, you'd prefer to split the PDF file after
N pages. You may want to investigate print/pdftk:

From /usr/ports/print/pdftk/pkg-descr:

  If PDF is electronic paper, then pdftk is an electronic staple-remover,
  hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
  Pdftk is a simple tool for doing everyday things with PDF documents.
  Keep one in the top drawer of your desktop and use it to:

  Merge PDF Documents
  Split PDF Pages into a New Document
  Decrypt Input as Necessary (Password Required)
  Encrypt Output as Desired
  Burst a PDF Document into Single Pages
  Report on PDF Metrics, including Metadata and Bookmarks
  Uncompress and Re-Compress Page Streams
  Repair Corrupted PDF (Where Possible)

  Pdftk is also an example of how to use a library of Java classes
  in a stand-alone C++ program. Specifically, it demonstrates how GCJ and CNI
  allow C++ code to use iText's (itext-paulo) Java classes.

  WWW: http://www.accesspdf.com/pdftk/

There are also other less heavy-weight programs to extract
pages and page-ranges from a PDF and PostScript file...
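
Why the byte-wise chunks from split(1) are no longer PDFs can be seen in a small experiment. This is a minimal sketch using a stand-in file rather than a real PDF; has_pdf_header and the file names are invented for the illustration. A PDF begins with a %PDF- header and keeps its trailer at the end of the file, so after a byte-wise split only the first chunk has the header and only the last one has the trailer.

```shell
#!/bin/sh
# Succeeds if FILE starts with the "%PDF-" magic bytes.
has_pdf_header() {
    [ "$(head -c 5 "$1")" = "%PDF-" ]
}

# Stand-in for a real PDF: the magic header followed by filler bytes.
tmpdir=$(mktemp -d)
{ printf '%%PDF-1.4\n'; head -c 3000 /dev/zero | tr '\0' 'x'; } > "$tmpdir/file.pdf"

# Byte-wise split into 1024-byte chunks: file-chunkaa, file-chunkab, ...
split -b 1024 "$tmpdir/file.pdf" "$tmpdir/file-chunk"

for f in "$tmpdir"/file-chunk*; do
    if has_pdf_header "$f"; then
        echo "$(basename "$f"): has PDF header"
    else
        echo "$(basename "$f"): no PDF header"
    fi
done
```

The chunks are still useful for transport: concatenating them in order with cat restores the original file.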

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
