Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
Hi All,

Thank you to those who sent suggestions. Much appreciated. We now have a list of options to ponder and investigate. Please do not hesitate to add more ideas or suggestions!

thanks,
ranti.

On Wed, Aug 3, 2011 at 7:36 PM, Ranti Junus ranti.ju...@gmail.com wrote: [original message snipped; see the full post later in the thread]

--
Bulk mail. Postage paid.
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
One method is to dispense with PDF and just view the scanned pages online as images or OCR'd text, or point the user to a directory with the scans for the document. The user then only needs an image viewer, using a lot less of his machine's memory.

Large PDFs also cause problems on the viewing computer. I was reviewing someone's 25MB PDF the other day and it peaked at 3.3GB of memory use, which on a box with 2.5GB of RAM meant it went into swap and slowed to a crawl. The viewer used there was Evince.

I scan to JPG and only produce a PDF if nagged:
http://www.collection.archivist.info/archive/manuals/IS44_Tektronix_602_display_unit/

As I serve from home and the upload is on the slow side, individual pages help there too. And when in a good mood I finish off a document thus:
http://www.collection.archivist.info/searchv13.php?searchstr=lucas+tp1
where all pages are web viewable. Been too lazy to write a page-to-page link on the page view so far (need a round tuit).

Dave Caroline
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
Why not let someone else, such as the Google, do the heavy lifting for you:

https://docs.google.com/viewer

~Richard.

On 4 August 2011 07:39, Dave Caroline dave.thearchiv...@gmail.com wrote: [quoted message snipped; see Dave's post above]

--
Richard Wallis
Technology Evangelist, Talis
Tel: +44 (0)7767 886 005
Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com
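A minimal sketch of wiring that up: the viewer historically took the document location in a "url" query parameter, with an optional embedded=true for iframe embedding, and Google must be able to fetch the PDF publicly. The example document URL below is made up.

# Build a Google Docs Viewer link so Google renders the large PDF
# page by page and the user never downloads the whole file.
from urllib.parse import urlencode

def viewer_link(pdf_url, embedded=False):
    params = {"url": pdf_url}
    if embedded:
        params["embedded"] = "true"  # bare viewer, suitable for an iframe
    return "https://docs.google.com/viewer?" + urlencode(params)

print(viewer_link("http://example.org/scans/big-document.pdf"))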
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
My naive thought would be to run your files through a batch PDF compressor, something along the lines of:

http://www.cvisiontech.com/products/general/pdfcompressor.html

If the files are still too large, PDF splitters exist, giving folks the option of downloading a page/section at a time.

Best of luck,

Cab Vinton, Director
Sanbornton Public Library
Sanbornton, NH

"Politeness and consideration for others is like investing pennies and getting dollars back." - Thomas Sowell
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
I've thought about using JPEG page images instead of PDFs to serve our scanned newspapers, which also have sizes ranging upwards of 100MB, with a link to download the PDF as a fallback for people who really want that. The downside is having to do the bulk conversion, manage the extra files, etc.

Another option would be a Flash frontend. Someone already mentioned Google, and I've also seen some use of issuu.com (our campus newspaper currently uses them). There are also options you could integrate into your own site, such as FlexPaper (http://flexpaper.devaldi.com/). You still have to upload and/or convert your files, but you retain a PDF-like display in the browser.

-Esme

--
Esme Cowles escow...@ucsd.edu

"A person, who is nice to you, but rude to the waiter, is not a nice person. (This is very important. Pay attention. It never fails.)" -- Dave Barry

On 08/3/2011, at 7:36 PM, Ranti Junus wrote: [original message snipped; see the full post later in the thread]
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
No one has mentioned accessibility issues for those using screenreaders. JPEG would not work for them.

Bill Drew

-----Original Message-----
From: Cowles, Esme
Sent: Thursday, August 04, 2011 8:45 AM
[quoted message snipped; see Esme's post above]
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
Hello,

For a project I just finished several scripts to generate PDFs from piles of TIFFs. The process was:

- In a .htaccess file, have not-found URLs rewritten to a script that is passed the desired filename.
- The script then builds the PDF and 'prints' it to the requester along with the correct MIME type. It also saves it to disk in the place the original request was made to, such that a second request for the same file just serves the file with no script processing.
- A simple batch script deletes all files older than X days in the PDF folder.

You could do something similar and, by using different URLs and some Unix PDF-to-PDF command processing, make a bunch of URLs that would serve different-DPI versions of the same file, like:

http://foo.bar/pdf/big/101010.pdf
http://foo.bar/pdf/small/101010.pdf
http://foo.bar/pdf/tiny/101010.pdf
http://foo.bar/pdf/verytiny/101010.pdf

If you need any code samples or additional info I welcome your query.

Aaron

--
Aaron Addison
Unix Administrator
W. E. B. Du Bois Library
UMass Amherst
413 577 2104

On Wed, 2011-08-03 at 22:51 -0400, Joe Hourcle wrote: [quoted message snipped; see Joe's full post later in the thread]
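Aaron offered code samples on request; absent those, here is a minimal sketch of the pattern he describes, as a Python CGI script. The Apache rewrite rule, file paths, and script name are illustrative assumptions, not details from his post.

#!/usr/bin/env python3
# Sketch of the .htaccess-plus-script pattern. Assumes Apache routes
# missing files under /pdf/ here, e.g.:
#   RewriteCond %{REQUEST_FILENAME} !-f
#   RewriteRule ^pdf/(.+\.pdf)$ /cgi-bin/makepdf.py?name=$1 [L]
# A cron job such as `find /data/www/pdf -mtime +30 -delete` keeps the
# cache of generated files bounded, per Aaron's cleanup step.
import cgi, glob, os, subprocess, sys

TIFF_ROOT = "/data/tiffs"    # one directory of scanned TIFFs per item
PDF_ROOT = "/data/www/pdf"   # generated PDFs cached here for reuse

form = cgi.FieldStorage()
name = os.path.basename(form.getfirst("name", ""))   # e.g. "101010.pdf"
item, ext = os.path.splitext(name)
pages = sorted(glob.glob(os.path.join(TIFF_ROOT, item, "*.tif")))

if ext != ".pdf" or not pages:
    sys.stdout.write("Status: 404 Not Found\r\n\r\n")
    sys.exit(0)

# Build the PDF once with ImageMagick, save it where the original URL
# pointed, then stream it; the next request is served as a plain file.
target = os.path.join(PDF_ROOT, name)
subprocess.check_call(["convert"] + pages + [target])
sys.stdout.write("Content-Type: application/pdf\r\n\r\n")
sys.stdout.flush()
with open(target, "rb") as f:
    sys.stdout.buffer.write(f.read())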
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
Disclaimer: I helped write this software.

You may want to look at our just-released Diva.js script. It can handle document images up to many gigabytes in size, in many different resolutions. The big advantage, though, is that the user only ever downloads the portion of the document that they are looking at, so viewing is almost instant. We specifically designed it to work on slower network connections. It's in Javascript, so it runs in any modern web browser with no Flash or PDF plugin needed. It does require some server-side infrastructure to set up; we have a recent Code4Lib Journal article that describes how it works and what is needed.

Code4Lib article: http://journal.code4lib.org/articles/5418
Here is a very basic demo: http://ddmal.music.mcgill.ca/diva/demo/ (this book is ~4GB of images)
And the code is available here: https://github.com/ddmaL/diva.js

We have a new version coming out very soon that fixes some bugs and adds a contact sheet view for quickly scrolling through all the images. If you need any more info, please let me know and I would be happy to help.

Cheers,
-Andrew

On 2011-08-04, at 8:45 AM, Cowles, Esme wrote: [quoted message snipped; see Esme's post above]
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
So I take it, this would need a fast connection between Google and your server, but would tolerate a slow connection between the user and Google?

Cindy Harper, Systems Librarian
Colgate University Libraries
char...@colgate.edu
315-228-7363

On Thu, Aug 4, 2011 at 6:03 AM, Richard Wallis richard.wal...@talis.com wrote: [quoted message snipped; see Richard's post above]
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
An option to consider is using the Internet Archive's Book Reader:

http://openlibrary.org/dev/docs/bookreader
http://raj.blog.archive.org/2011/03/17/how-to-serve-ia-style-books-from-your-own-cluster/

You can use it locally with your own images - JPEG 2000, JPEG, TIFF and PNG are supported formats. You would probably need to extract the images from the PDFs if you don't have these already. You could use something like ImageMagick to do this.

David Nind

On 5 August 2011 00:45, Cowles, Esme escow...@ucsd.edu wrote: [quoted message snipped; see Esme's post above]

--
David Nind
__
david.n...@gmail.com
PO Box 12367 Thorndon
Wellington 6144
+64 4 9720600 (home)
+64 210537847 (cellphone)
+64 4 8906098 (work)
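For that extraction step, a minimal sketch using ImageMagick's convert (which rasterizes PDF pages via Ghostscript); the paths, 300 dpi density, and quality setting are illustrative assumptions. Poppler's pdfimages is an alternative that pulls out the embedded scans without re-rasterizing.

# Burst a scanned PDF into per-page JPEGs for use with the IA BookReader.
import subprocess

def pdf_to_jpegs(pdf_path, out_prefix, dpi=300):
    subprocess.check_call([
        "convert",
        "-density", str(dpi),         # rasterize each page at this resolution
        pdf_path,
        "-quality", "85",             # JPEG quality: size vs. legibility
        "%s-%%04d.jpg" % out_prefix,  # item-0000.jpg, item-0001.jpg, ...
    ])

pdf_to_jpegs("item-101010.pdf", "pages/item-101010")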
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
snip An option to consider is using the Internet Archive's Book Reader ... You could use something like ImageMagick to do this. /snip

I think the brasiliana/Corisco repository has done quite a bit on combining the OL bookreader with PDFs on the fly:

- https://github.com/brasiliana/Corisco/blob/master/docs/CoriscoDiagram.png
- https://github.com/brasiliana/Corisco

Eoghan

On 4 August 2011 21:43, David Nind david.n...@gmail.com wrote: [quoted message snipped; see David's post above]
[CODE4LIB] Apps to reduce large file on the fly when it's requested
Dear All,

My colleague came to me with this query, and I hope some of you could give us some ideas or suggestions:

Our Digital Multimedia Center (DMC) scanning project can produce very large PDF files. They will have PDFs that are about 25MB, and some may move into the 100MB range. If we provide a link to a PDF that large, a user may not want to try to download it even though she really needs to see the information. In the past, DMC has created lower-quality, smaller versions of the original file to reduce the size. Some thoughts have been tossed around to reduce the duplication of work (e.g. no more creating the lower-quality PDF manually). They are wondering if there is an application we could point the end user to, who might need it due to poor internet access, that if used will simplify the very large file transfer for the end user. Basically:

- client software that tells the server to manipulate and reduce the file on the fly
- a server app that would do the actual manipulation of the file and then deliver it to the end user

Personally, I'm not really sure about the client software part. It makes more sense to me (from the user's perspective) that we provide a "download the smaller size of this large file" link that would trigger the server-side apps to manipulate the big file. However, we're all ears for any suggestions you might have.

thanks,
ranti.

--
Bulk mail. Postage paid.
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
I agree that your client software should be nothing more than a link or button in the web browser.

As for the server, it sounds akin to image servers that resize on the fly. I would probably just proxy requests to a script or CGI that compresses/converts the files, especially if you're not planning to get a lot of hits per second. If that's not robust enough, there are a number of results from a search for "pdf server" that might work for you.

On Wed, Aug 3, 2011 at 7:36 PM, Ranti Junus ranti.ju...@gmail.com wrote: [original message snipped; see the full post above]
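A common tool for that compression step is Ghostscript's pdfwrite device, which re-encodes and downsamples the page images. A minimal sketch, assuming Ghostscript is installed and that one of its stock presets suits the scanned material; the file paths are made up.

# Shrink a scanned PDF with Ghostscript. The -dPDFSETTINGS presets
# (/screen, /ebook, /printer) trade image resolution for file size.
import subprocess

def shrink_pdf(src, dst, preset="/ebook"):
    subprocess.check_call([
        "gs", "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=" + preset,  # /screen gives the smallest output
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-sOutputFile=" + dst,
        src,
    ])

shrink_pdf("masters/volume12.pdf", "cache/volume12-small.pdf")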
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
On Aug 3, 2011, at 7:36 PM, Ranti Junus wrote: [original message snipped; see the full post above]

I've been dealing with related issues for a few years, and if you have the file locally, it's generally not too difficult to have a CGI or similar that you can call that will do some sort of transformation on the fly.

Unfortunately, what we've run into is that in some cases, in part because it tends to be used by people with slow connections and for very large files, they'll keep restarting the process, and because the result is generated on the fly, the webserver can't just pick up where it left off, so it has to re-start the process.

The alternative is to write it out to disk, and then let the webserver handle it as a normal file. Depending on how many of these you're dealing with, you may have to have something manage the scratch space and remove the generated files that haven't been viewed in some time.

What I've been hoping to do is:

1. Assign URLs to all of the processed forms, of the format http://server/processing/ID (where 'ID' includes some hashing in it, so it's not 10mil files in a directory).

2. Write a 404 handler for each processing type, so that should a file not exist in that directory, it will:
(a) verify that the ID is valid, otherwise return a 404;
(b) check to see if the ID is being processed, otherwise kick off a process for the file to be generated;
(c) return a 503 status.

Unfortunately, my initial testing (years ago) suggested that no clients at the time properly handled 503 responses (effectively "try back in (x) minutes", and you give 'em a time).

The alternative is to just basically sleep for a period of time, and then return the file once it's been generated ... but that's a problem for ones that take some time (some of my processing might take hours, as the files that it needs as input are stored near-line, and we're at the mercy of a tape robot). You might also be able to sleep and then use one of the various 30x status codes, but I don't know what a client might do if you returned the same URL (they might abort, to prevent looping).

-Joe
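Joe's 404-handler idea translates fairly directly into a small WSGI app. A minimal sketch: is_valid_id and start_job are hypothetical stand-ins for the real ID check and job queue, and the Retry-After interval is arbitrary.

# Serve the file if a background job already produced it; otherwise
# kick off generation and answer 503 with a Retry-After hint.
import os

CACHE = "/scratch/processed"   # generated files land here

def is_valid_id(doc_id):       # placeholder: consult the real ID registry
    return doc_id.isalnum()

def start_job(doc_id):         # placeholder: enqueue generation (idempotent)
    pass

def app(environ, start_response):
    doc_id = environ.get("PATH_INFO", "").rsplit("/", 1)[-1]
    path = os.path.join(CACHE, doc_id)
    if not is_valid_id(doc_id):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no such document\n"]
    if os.path.exists(path):   # already generated: serve as a normal file
        with open(path, "rb") as f:
            body = f.read()
        start_response("200 OK", [("Content-Type", "application/pdf")])
        return [body]
    start_job(doc_id)
    start_response("503 Service Unavailable",
                   [("Retry-After", "300"),   # "try back in 5 minutes"
                    ("Content-Type", "text/plain")])
    return [b"still being generated; try again shortly\n"]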