Re: can i split a pdf file?
On Monday 26 January 2009 09:17:05 Andrew Robinson wrote:
> > On Sun, 25 Jan 2009 20:20:51 -0500, Chuck Robey wrote:
> > Charlie Kester wrote:
> > > On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
> > >> Is there a way to split a large pdf file into smaller [ say 1MB ]
> > >> chunks? Or are there open source tools out there that i can
> > >> build?

[various suggestions including pdfmerge and psnup from ports]

Alternatively, if you have a reasonably complete TeX installation (I'm
still using teTeX), check whether you have texexec installed, which can
extract pages from PDFs.

Jonathan
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
Re: can i split a pdf file?
On Sun, 25 Jan 2009 20:20:51 -0500, Chuck Robey wrote:
> Charlie Kester wrote:
> > On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
> >>
> >> Is there a way to split a large pdf file into smaller [ say 1MB ]
> >> chunks? Or are there open source tools out there that i can build?
> >
> > pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
> > files, but it doesn't seem to be in the FreeBSD ports system.
> >
> > There is a pdfmerge in /usr/ports/print, but no pdfsplit.
>
> It's a very junky way to do it (but the only way I know), use pdf2ps
> to convert the pdf to postscript, then you stand at least a good
> chance of doing the split, which many utilities allow. You could even
> do it graphically via gv. The problem with this (and the reason it
> might well fail anyhow) is because some things that pdfs do aren't
> implemented in any standard postscript level I ever heard of. It
> depends how many of the more recent extensions to pdf are being used.
> I've done this, *sometimes*.
>
> Because the pdf spec is fully published, it might one day allow
> someone to write a splitter, but because the spec is SO enormous,
> maybe they won't, either. Actually, that's a really good notion ... I
> need to give it some thought.

It's not quite the same thing, but pdfnup from the /usr/ports/print/pdfjam
package allows page selections from the contributing pdfs.

http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic/firth/software/pdfjam

Andrew

--
Andrew Robinson
Department of Mathematics and Statistics    Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/
Re: can i split a pdf file?
On Mon, 26 Jan 2009 23:39:06 +0100, cpghost wrote:
> Those PDFs are usually scanned,
> and the scanner software (usually on Windows) assembles all screenshots
> into a PDF of images. Handy for printing, but not for OCR postprocessing.
> That's what you find on the Net.

On the Web. :-)

> This is not such a bad idea, esp. when it comes to technical textbooks,
> which usually contain a lot of diagrams, formulae, tables etc...; since
> an OCR software that would be able to reverse all this into LaTeX and
> EPS figures has yet to be programmed (that's a difficult task).

As I've already mentioned, scanning the characters is only one part.
Your example of diagrams and formulas illustrates this well. And
because LaTeX is the only professional typesetting system (and no,
"Word" isn't such a tool), it would be really great to have a tool
pdf2tex which would get the characters of the text, typeset them as in
the original (paragraphing, hyphenation etc.), include embedded pictures
as pictures (of course), and re-create formulas, so that the result would
run through pdf-LaTeX and produce an improved version of the source PDF
file. But that's a task for the next generation of mankind. :-)

> Some PDFs encode the fonts
> in a special section, and then use text (sometimes compressed
> or encrypted), which refers to those fonts. In such a case, you
> could extract the pure text from the PDF.

It's worth mentioning that if the original text has characters
(represented in the additionally stored fonts) that have special
accents or orientations (usually in non-English languages), the target
system needs to support them, which it usually does by means of UTF-8.

> Other PDFs simply encode the book as a set of bitmaps (see above);
> and then your only chance is to find an OCR software that would not
> only be able to recognize the characters in the bitmaps, but also
> to cope with those Fraktur- or other exotic fonts.

Yes, das Doytsh Uberfrucktoor makes everything unreadable. :-)
It gets even more complicated with hand-written books...

> Some OCR programs
> are interactive and trainable, so that you can say: this is an 'S',
> and that is a 'T'..., but AFAIK, there's no free and open source
> OCR program with this capability (yet).

Wow, I had never heard of this concept, but it's a really intelligent
solution. If this really works, it still has the "disadvantage" of
needing much time for training the program, and for postprocessing.

It's easier to \usepackage[german]{uberfraktur} to make the text
unreadable again. :-)

--
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
Re: can i split a pdf file?
On Mon, 26 Jan 2009 14:51:14 -0800, Gary Kline wrote:
> Still,
> before I get back to the last few pages of my thesis, maybe I'll
> try feeding parts of my most vanilla image-PDF file to an
> opensource OCR program. I'm pretty sure there are a couple in
> ports. IIRC, though, the images have to be jpegs or tiffs or the
> like. If anybody knows, please give me a shout out!

The best idea is to use a format that does not have artifacts due
to image compression through DCT or similar algorithms, read:
"real black-white pictures" (1 bit color). JPEG is not such a format;
you can see this by magnifying the surroundings of text: the area is
gray and looks "dusty". TIFF, GIF and PNG surely are better formats
for feeding images into an OCR processor.

(Background: A long time ago, I knew a man who did electronics and
printed circuit boards. In order to save hard disk space, he converted
his 1-bit BMP images of the schematics and the PCB layout to JPEG
format, instead of just zipping, raring or arjing them. He was very
unhappy to see them coming out of the printer "so dirty, partially
unreadable", although it was a high quality office class laser
printer. And when he took the PCBs out of the acid bath, their
previously photochemically treated surface looked strange and had holes
in the copper; the boards were ready to be thrown away. This man was
very upset when he was told about DCT and artifacts. Later on, he used
GIF images and turned happy again.)

--
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
Re: can i split a pdf file?
On Mon, 26 Jan 2009 14:06:23 -0800, Gary Kline wrote:
> So what kind of moron is going to photograph pages --or maybe just
> "get-screenshot-of-this-page" and upload it?

The PDF serves as a container for pictorial images in this context.
Another idea would be to have separate image files, one file per
page, that you could view with your favourite image viewer. The
advantage of the PDF container is that you can easily print a bunch
of pages (or a book).

> Or a Real question:
> I read an online pdf of "The Art of War" from the 1880's [?], and
> it was in an old-English or olden-Deutsch type font. In PDF. i
> have other p.d. texts in pdf and am wondering if there is some
> sort of scanner that can take a book-length script and create a
> pdf file. Anybody know?

It's very complicated to handle old fonts using OCR techniques.
It's even quite complicated with today's standard fonts. Although
there are (usually expensive) OCR programs with good algorithms,
most documents need some work afterwards. It's not only about
correcting mis-recognized characters; you have to handle hyphenation
and paragraph typesetting as well.

I know that there are scanners that can process a bunch of paper
(sheets of paper) through an automatic feeder, then scan them and
finally have a PDF file ready for FTP download. But there's no OCR
involved, of course.

> I got a bunch of ^L bytes and nothing
> else.

The Ctrl-L (^L) is the page break character (FF = form feed). The
rest of the file then contains images that are not transformable
into characters.

> Now I'm looking at the file with od -c and, yup, it's an
> image. The parts in between pages are in ASCII. Do you know what
> "MediaBox" is?

An image container maybe? So every page consists of a "MediaBox"
container holding one image.

> At least the web article was not an image!

Don't mind, I know "important" web pages where the text content
actually IS an image, and of course there's no alt= or longdesc=
attribute because they're for weenies. :-)

--
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
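For what it's worth, in the PDF specification /MediaBox is not an image
container: it is a page attribute holding a rectangle (four numbers, in
units of 1/72 inch) that gives the size of the page's medium. Since most
page objects carry one, counting the occurrences gives a rough idea of how
many pages the file describes. The sketch below assumes an uncompressed
PDF whose page objects are readable as plain text, as in the od -c output
described above; the helper name count_mediabox is made up for this
example, and the count is only a hint because the entry can also be
inherited from a parent object or hidden inside a compressed stream.

```shell
#!/bin/sh
# count_mediabox FILE: print how many "/MediaBox" entries FILE contains.
# Hypothetical helper; -a treats the binary PDF as text, -o prints one
# line per match so that wc -l counts occurrences rather than lines.
count_mediabox() {
    grep -a -o '/MediaBox' "$1" | wc -l | tr -d ' '
}

# Typical use (bla.pdf is the example file name used in this thread):
#   count_mediabox bla.pdf
```

A count close to the number of pages, combined with empty pdftotext output,
fits the picture of one scanned image per page.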
Re: can i split a pdf file?
On Mon, Jan 26, 2009 at 01:36:48PM -0800, Charlie Kester wrote:
> On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:
> >On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline wrote:
> >>Thanks, Gents,
> >>
> >>But according to one smallish pdf file that I sent to a web based
> >>tool, it was not a real pdf. Or, more accurately, it (the pdf to
> >>speech program) couldn't decode it.
> >
> >This is a typical problem with "poorly engineered" PDFs where the
> >author puts in the text as images (you'll see this stupidity across
> >the Web, too).
>
> In most cases where I've seen this, it's because they had scanned an
> actual printed document. Many old, out-of-print books are being made
> newly available this way, so I'm not inclined to complain.
>
> Unfortunately, OCR software still isn't reliable enough (or, if
> reliable, cheap enough) to convert these scanned images to actual text.

You're probably right about the cost/performance idea. Still,
before I get back to the last few pages of my thesis, maybe I'll
try feeding parts of my most vanilla image-PDF file to an
opensource OCR program. I'm pretty sure there are a couple in
ports. IIRC, though, the images have to be jpegs or tiffs or the
like. If anybody knows, please give me a shout out!

gary

--
Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php
Re: can i split a pdf file?
On Mon, Jan 26, 2009 at 02:06:23PM -0800, Gary Kline wrote:
> On Mon, Jan 26, 2009 at 09:16:23AM +0100, Polytropon wrote:
> > On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline wrote:
> > > Thanks, Gents,
> > >
> > > But according to one smallish pdf file that I sent to a web based
> > > tool, it was not a real pdf. Or, more accurately, it (the pdf to
> > > speech program) couldn't decode it.
> >
> > This is a typical problem with "poorly engineered" PDFs where the
> > author puts in the text as images (you'll see this stupidity across
> > the Web, too).
>
> So what kind of moron is going to photograph pages --or maybe just
> "get-screenshot-of-this-page" and upload it?

It happens quite frequently nowadays. Those PDFs are usually scanned,
and the scanner software (usually on Windows) assembles all screenshots
into a PDF of images. That's what you find on the Net.

This is not such a bad idea, esp. when it comes to technical textbooks,
which usually contain a lot of diagrams, formulae, tables etc., since
OCR software that would be able to reverse all this into LaTeX and
EPS figures has yet to be programmed (that's a difficult task).

> Or a Real question:
> I read an online pdf of "The Art of War" from the 1880's [?], and
> it was in an old-English or olden-Deutsch type font. In PDF. i
> have other p.d. texts in pdf and am wondering if there is some
> sort of scanner that can take a book-length script and create a
> pdf file. Anybody know?

It all depends on how the PDF was created. Some PDFs encode the fonts
in a special section, and then use text (sometimes compressed
or encrypted), which refers to those fonts. In such a case, you
could extract the pure text from the PDF.

Other PDFs simply encode the book as a set of bitmaps (see above);
and then your only chance is to find OCR software that would not
only be able to recognize the characters in the bitmaps, but also
to cope with Fraktur or other exotic fonts. Some OCR programs
are interactive and trainable, so that you can say: this is an 'S',
and that is a 'T'..., but AFAIK, there's no free and open source
OCR program with this capability (yet).

> > A good tool to check if the PDF file can be (audibly) read is the
> > use of the tool pdftotext from the port xpdf.
> >
> > 	% pdftotext bla.pdf && less bla.txt
> >
> > Then, even the FF speech plugin should work correctly - as long as
> > the PDF file contains decodable text. If it's just a bunch of images,
> > well, what are we expecting, hm? FF-speech: "You see a pretty image of
> > some text..." :-)
>
> Yeah, that's about right! I got a bunch of ^L bytes and nothing
> else. Now I'm looking at the file with od -c and, yup, it's an
> image. The parts in between pages are in ASCII. Do you know what
> "MediaBox" is?

So it's a set of images. There's not much you could do about it.

Oh, you can still try to extract the images from the PDF by using
the program 'pdfimages' (part of the graphics/xpdf port), and look
at them individually with an image processor (Gimp etc.). Then
run an OCR program on those images. Try graphics/gocr for example.
But it would still be tedious, to say the least.

> At least the web article was not an image! Google had it both in
> PDF and HTML.
>
> gary

-cpghost.

--
Cordula's Web. http://www.cordula.ws/
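The pdfimages-then-gocr route suggested above can be strung together
roughly as follows. This is only a sketch: the function name ocr_pdf and
the temporary directory layout are invented for the example, it assumes
pdfimages writes its default PBM/PPM files under the given name prefix and
that gocr prints the recognized text to standard output, and it makes no
attempt to clean up or to judge recognition quality.

```shell
#!/bin/sh
# ocr_pdf FILE.pdf: extract every embedded image, then OCR each one.
# Hypothetical helper; needs pdfimages (graphics/xpdf) and gocr
# (graphics/gocr) to be installed.
ocr_pdf() {
    pdf=$1
    img_dir=$(mktemp -d) || return 1

    # pdfimages writes page-000.pbm, page-001.ppm, ... below img_dir.
    pdfimages "$pdf" "$img_dir/page" || return 1

    # Run the OCR program on each extracted image, in page order.
    for img in "$img_dir"/page-*; do
        [ -f "$img" ] || continue
        gocr "$img"
    done
}

# Typical use (bla.pdf is the example file name used in this thread):
#   ocr_pdf bla.pdf > bla-ocr.txt
```

Expect to proofread the result by hand, as the messages above point out.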
Re: can i split a pdf file?
On Mon, Jan 26, 2009 at 09:16:23AM +0100, Polytropon wrote:
> On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline wrote:
> > Thanks, Gents,
> >
> > But according to one smallish pdf file that I sent to a web based
> > tool, it was not a real pdf. Or, more accurately, it (the pdf to
> > speech program) couldn't decode it.
>
> This is a typical problem with "poorly engineered" PDFs where the
> author puts in the text as images (you'll see this stupidity across
> the Web, too).

So what kind of moron is going to photograph pages --or maybe just
"get-screenshot-of-this-page" and upload it? Or a Real question:
I read an online pdf of "The Art of War" from the 1880's [?], and
it was in an old-English or olden-Deutsch type font. In PDF. i
have other p.d. texts in pdf and am wondering if there is some
sort of scanner that can take a book-length script and create a
pdf file. Anybody know?

> A good tool to check if the PDF file can be (audibly) read is the
> use of the tool pdftotext from the port xpdf.
>
> 	% pdftotext bla.pdf && less bla.txt
>
> Then, even the FF speech plugin should work correctly - as long as
> the PDF file contains decodable text. If it's just a bunch of images,
> well, what are we expecting, hm? FF-speech: "You see a pretty image of
> some text..." :-)

Yeah, that's about right! I got a bunch of ^L bytes and nothing
else. Now I'm looking at the file with od -c and, yup, it's an
image. The parts in between pages are in ASCII. Do you know what
"MediaBox" is?

At least the web article was not an image! Google had it both in
PDF and HTML.

gary

--
Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php
Re: can i split a pdf file?
On Mon 26 Jan 2009 at 00:16:23 PST Polytropon wrote:
> On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline wrote:
> > Thanks, Gents,
> >
> > But according to one smallish pdf file that I send to a web based
> > tool, it was not a real pdf. Or, more accurately, it (the pdf to
> > speech program) couldn't decode it.
>
> This is a typical problem with "poorly engineered" PDFs where the
> author puts in the text as images (you'll see this stupidity across
> the Web, too).

In most cases where I've seen this, it's because they had scanned an
actual printed document. Many old, out-of-print books are being made
newly available this way, so I'm not inclined to complain.

Unfortunately, OCR software still isn't reliable enough (or, if
reliable, cheap enough) to convert these scanned images to actual text.
Re: can i split a pdf file?
On Sun, Jan 25, 2009 at 04:18:26PM -0800, Gary Kline wrote:
> Folks,
>
> Is there a way to split a large pdf file into smaller [ say 1MB ]
> chunks? Or are there open source tools out there that i can
> build?

Ghostscript (when built with the pdfwrite driver) will copy pages from a PDF:

	gs -DNOPAUSE -sDEVICE=pdfwrite -dFirstPage=<first> -dLastPage=<last> \
	   -sOutputFile=<outfile> <infile> -c quit >/dev/null 2>&1

Where <first> and <last> are page numbers, and <outfile> and <infile> are
the output and original filename respectively.

Roland
--
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
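The Ghostscript invocation above can be wrapped in a small shell function
so that the page range and file names become ordinary arguments. This is a
sketch under a few assumptions: the function name extract_pages and its
argument order are invented here, and it swaps the "-c quit" plus output
redirection of the original for Ghostscript's -dBATCH and -q switches,
which also make gs exit quietly once the input has been processed.

```shell
#!/bin/sh
# extract_pages FIRST LAST INFILE OUTFILE
# Hypothetical wrapper: copy pages FIRST..LAST of INFILE into OUTFILE
# using Ghostscript's pdfwrite device.
extract_pages() {
    first=$1 last=$2 infile=$3 outfile=$4
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
       -dFirstPage="$first" -dLastPage="$last" \
       -sOutputFile="$outfile" "$infile"
}

# Typical use (file names are placeholders):
#   extract_pages 1 20 big.pdf part1.pdf
#   extract_pages 21 40 big.pdf part2.pdf
```

Note that this splits by page count, not by size; hitting a target such as
1MB per part would still take some trial and error with the page ranges.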
Re: can i split a pdf file?
On Mon, 26 Jan 2009 00:06:18 -0800, Gary Kline wrote:
> Thanks, Gents,
>
> But according to one smallish pdf file that I sent to a web based
> tool, it was not a real pdf. Or, more accurately, it (the pdf to
> speech program) couldn't decode it.

This is a typical problem with "poorly engineered" PDFs where the
author puts in the text as images (you'll see this stupidity across
the Web, too).

A good way to check whether the PDF file can be (audibly) read is
the tool pdftotext from the port xpdf.

	% pdftotext bla.pdf && less bla.txt

Then, even the FF speech plugin should work correctly - as long as
the PDF file contains decodable text. If it's just a bunch of images,
well, what are we expecting, hm? FF-speech: "You see a pretty image of
some text..." :-)

--
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
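The "does it contain decodable text" check can also be done without
reading the output by eye. The sketch below is one possible approach, not
part of xpdf: the helper name has_text is invented, and it simply tests
whether the file pdftotext produced contains any letter or digit at all.
An image-only PDF typically yields nothing but form feed characters, so
the test fails for it.

```shell
#!/bin/sh
# has_text FILE: succeed if FILE contains at least one letter or digit.
# Hypothetical helper for judging pdftotext output; a file holding only
# form feeds and whitespace makes it fail.
has_text() {
    grep -q '[[:alnum:]]' "$1"
}

# Typical use (bla.pdf is the example file name from the message above):
#   pdftotext bla.pdf
#   has_text bla.txt || echo "no text layer found; OCR would be needed"
```

With a text in a non-Latin script the character class may not match under
every locale, so treat a negative result as a hint rather than proof.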
Re: can i split a pdf file?
On Sun, Jan 25, 2009 at 08:20:51PM -0500, Chuck Robey wrote:
> Charlie Kester wrote:
> > On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
> >>
> >> Is there a way to split a large pdf file into smaller [ say 1MB ]
> >> chunks? Or are there open source tools out there that i can build?
> >
> > pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
> > files, but it doesn't seem to be in the FreeBSD ports system.
> >
> > There is a pdfmerge in /usr/ports/print, but no pdfsplit.
>
> It's a very junky way to do it (but the only way I know), use pdf2ps to
> convert the pdf to postscript, then you stand at least a good chance of
> doing the split, which many utilities allow. You could even do it
> graphically via gv. The problem with this (and the reason it might well
> fail anyhow) is because some things that pdfs do aren't implemented in
> any standard postscript level I ever heard of. It depends how many of the
> more recent extensions to pdf are being used. I've done this, *sometimes*.
>
> Because the pdf spec is fully published, it might one day allow someone to
> write a splitter, but because the spec is SO enormous, maybe they won't,
> either. Actually, that's a really good notion ... I need to give it some
> thought.

Thanks, Gents,

But according to one smallish pdf file that I sent to a web based
tool, it was not a real pdf. Or, more accurately, it (the pdf to
speech program) couldn't decode it.

I'll play around with this more tomorrow. The problem with a lot
of this electronic paper is that the lines are squeezed together.
Makes scanning them that much more difficult. Last month I read
a book [book-book, from the library!] with roughly 1.5 spaces
between lines, and even tho the font was small, no problem in
reading the entire text.

((FWIW: I'll find the URL of a piece on Hegelian ethics --PDF--
and see if the firefox speech site can grok that!))

gary

--
Gary Kline  kl...@thought.org  http://www.thought.org  Public Service Unix
http://jottings.thought.org   http://transfinite.thought.org
The 2.23a release of Jottings: http://jottings.thought.org/index.php
Re: can i split a pdf file?
Charlie Kester wrote:
> On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
>>
>> Is there a way to split a large pdf file into smaller [ say 1MB ]
>> chunks? Or are there open source tools out there that i can build?
>
> pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
> files, but it doesn't seem to be in the FreeBSD ports system.
>
> There is a pdfmerge in /usr/ports/print, but no pdfsplit.

It's a very junky way to do it (but the only way I know): use pdf2ps to
convert the pdf to postscript, then you stand at least a good chance of
doing the split, which many utilities allow. You could even do it
graphically via gv. The problem with this (and the reason it might well
fail anyhow) is that some things that pdfs do aren't implemented in any
standard postscript level I ever heard of. It depends on how many of the
more recent extensions to pdf are being used. I've done this, *sometimes*.

Because the pdf spec is fully published, it might one day allow someone to
write a splitter, but because the spec is SO enormous, maybe they won't,
either. Actually, that's a really good notion ... I need to give it some
thought.
Re: can i split a pdf file?
On Sun 25 Jan 2009 at 16:51:56 PST Charlie Kester wrote:
> On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
> > Is there a way to split a large pdf file into smaller [ say 1MB ]
> > chunks? Or are there open source tools out there that i can build?
>
> pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
> files, but it doesn't seem to be in the FreeBSD ports system.

Here's a suite of commandline tools for manipulating pdf's, in case
you don't want a gui:

http://multivalent.sourceforge.net/Tools/

This one uses Java, like pdfsam and pdftk. Like pdfsam, it doesn't
seem to be in the ports tree.

-- Charlie
Re: can i split a pdf file?
On Mon, Jan 26, 2009 at 01:37:08AM +0100, Wojciech Puchar wrote:
> > because, well, they aren't PDF files anymore. ;-)
> > For this, you'd prefer to split the PDF file after
> > N pages. You may want to investigate print/pdftk:
> >
> > From /usr/ports/print/pdftk/pkg-descr:
> >
> > If PDF is electronic paper, then pdftk is an electronic staple-remover,
> > hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
> > Pdftk is a simple tool for doing everyday things with PDF documents.
> > Keep one in the top drawer of your desktop and use it to:
>
> nice tool. thanks

Thanks. Though I prefer your solution (via mpage).
pdftk looks a bit too heavy for such a simple task. ;-)

Cheers,
-cpghost.

--
Cordula's Web. http://www.cordula.ws/
Re: can i split a pdf file?
On Sun 25 Jan 2009 at 16:18:26 PST Gary Kline wrote:
> Is there a way to split a large pdf file into smaller [ say 1MB ]
> chunks? Or are there open source tools out there that i can build?

pdfsam ( http://www.pdfsam.org/ ) does both splits and merges of pdf
files, but it doesn't seem to be in the FreeBSD ports system.

There is a pdfmerge in /usr/ports/print, but no pdfsplit.

-- Charlie
Re: can i split a pdf file?
> because, well, they aren't PDF files anymore. ;-)
> For this, you'd prefer to split the PDF file after
> N pages. You may want to investigate print/pdftk:
>
> From /usr/ports/print/pdftk/pkg-descr:
>
> If PDF is electronic paper, then pdftk is an electronic staple-remover,
> hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
> Pdftk is a simple tool for doing everyday things with PDF documents.
> Keep one in the top drawer of your desktop and use it to:

nice tool. thanks
Re: can i split a pdf file?
> Folks,
>
> Is there a way to split a large pdf file into smaller [ say 1MB ]
> chunks? Or are there open source tools out there that i can
> build?

Like any other file: use split.

Or did you mean splitting it into separate PDFs by pages? Convert it to
PS (pdf2ps), then use mpage to extract the pages, then convert the result
back to PDF.
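To illustrate what plain split does to a file: the pieces are raw byte
ranges, so only the first one still begins with the PDF header and none of
them can be viewed on its own, but concatenating them restores the
original exactly. The sketch below uses random bytes as a stand-in for a
real PDF, and every file name in it is made up for the demonstration.

```shell
#!/bin/sh
# Sketch: split(1) cuts a file into raw byte ranges; cat(1) restores it.
# All file names are hypothetical; random bytes stand in for a real PDF.
demo_dir=$(mktemp -d) || exit 1

# Build a 2500-byte stand-in for file.pdf.
dd if=/dev/urandom of="$demo_dir/file.pdf" bs=500 count=5 2>/dev/null

# Cut it into 1000-byte pieces: file-chunkaa, file-chunkab, file-chunkac.
split -b 1000 "$demo_dir/file.pdf" "$demo_dir/file-chunk"

# Joining the pieces in name order gives back the original, byte for byte.
cat "$demo_dir"/file-chunk* > "$demo_dir/rejoined.pdf"
cmp "$demo_dir/file.pdf" "$demo_dir/rejoined.pdf" && echo "rejoined copy is identical"
```

This is fine for getting a big file through a size limit, as long as the
receiving side reassembles the pieces before opening the result.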
Re: can i split a pdf file?
On Sun, Jan 25, 2009 at 04:18:26PM -0800, Gary Kline wrote:
> Folks,
>
> Is there a way to split a large pdf file into smaller [ say 1MB ]
> chunks? Or are there open source tools out there that i can
> build?
>
> thanks in advance,
>
> gary

To split the file, use split(1):

	$ split -b 1M file.pdf file-chunk

See "man split". But you won't be able to view the chunks separately,
because, well, they aren't PDF files anymore. ;-)

For this, you'd prefer to split the PDF file after
N pages. You may want to investigate print/pdftk:

From /usr/ports/print/pdftk/pkg-descr:

If PDF is electronic paper, then pdftk is an electronic staple-remover,
hole-punch, binder, secret-decoder-ring, and X-Ray-glasses.
Pdftk is a simple tool for doing everyday things with PDF documents.
Keep one in the top drawer of your desktop and use it to:

  Merge PDF Documents
  Split PDF Pages into a New Document
  Decrypt Input as Necessary (Password Required)
  Encrypt Output as Desired
  Burst a PDF Document into Single Pages
  Report on PDF Metrics, including Metadata and Bookmarks
  Uncompress and Re-Compress Page Streams
  Repair Corrupted PDF (Where Possible)

Pdftk is also an example of how to use a library of Java classes in a
stand-alone C++ program. Specifically, it demonstrates how GCJ and CNI
allow C++ code to use iText's (itext-paulo) Java classes.

WWW: http://www.accesspdf.com/pdftk/

There are also other less heavy-weight programs to extract pages and
page-ranges from a PDF and PostScript file...

-cpghost.

--
Cordula's Web. http://www.cordula.ws/
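Splitting "after N pages" with pdftk might look like the sketch below. It
relies on two pdftk operations: dump_data, whose report includes a
NumberOfPages line, and cat with a page range followed by an output file.
The function name split_every, its argument order, and the naming of the
output parts are all invented for this example.

```shell
#!/bin/sh
# split_every INFILE N PREFIX
# Hypothetical helper: write PREFIX-1.pdf, PREFIX-2.pdf, ... where each
# part holds at most N consecutive pages of INFILE. Requires pdftk.
split_every() {
    infile=$1 pages_per_part=$2 prefix=$3

    # Ask pdftk for the total page count.
    total=$(pdftk "$infile" dump_data | awk '/^NumberOfPages:/ { print $2 }')
    [ -n "$total" ] || return 1

    start=1
    part=1
    while [ "$start" -le "$total" ]; do
        end=$((start + pages_per_part - 1))
        if [ "$end" -gt "$total" ]; then end=$total; fi
        # Use a plain page number when the part holds a single page.
        if [ "$end" -gt "$start" ]; then range="$start-$end"; else range=$start; fi
        pdftk "$infile" cat "$range" output "$prefix-$part.pdf" || return 1
        start=$((end + 1))
        part=$((part + 1))
    done
}

# Typical use (file names are placeholders):
#   split_every big.pdf 50 big-part
```

Because the parts are cut by page count, their sizes will vary with the
content of the pages; a scanned book will give much larger parts than a
text-only document of the same length.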