Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-05 Thread Ranti Junus
Hi All,

Thank you to those who sent suggestions. Much appreciated. We now
have a list of options to ponder and investigate.

Please do not hesitate to add more ideas or suggestions!


thanks,
ranti.

On Wed, Aug 3, 2011 at 7:36 PM, Ranti Junus ranti.ju...@gmail.com wrote:

 [snip]




--
Bulk mail.  Postage paid.


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Dave Caroline
One method is to dispense with PDF and just view the scanned pages online as
images or OCR'd text, or point the user to a directory with the scans
for the document.
The user then only needs an image viewer, which uses a lot less of his machine's memory.

Large PDFs also cause problems on the viewing computer. I was
reviewing someone's 25 MB PDF the other day and it peaked at 3.3 GB of memory
use, which on a box with 2.5 GB of memory meant it went into swap and
slowed to a crawl. The viewer used there was Evince.

I scan to JPG and only produce a PDF if nagged:
http://www.collection.archivist.info/archive/manuals/IS44_Tektronix_602_display_unit/

As I serve from home and my upload bandwidth is on the slow side, individual
pages help there too.
And when in a good mood I finish off a document thus:
http://www.collection.archivist.info/searchv13.php?searchstr=lucas+tp1
where all pages are web viewable. I've been too lazy to write a page-to-page
link on the page view so far (need a round tuit).

Dave Caroline


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Richard Wallis
Why not let someone else, such as Google, do the heavy lifting for you:
https://docs.google.com/viewer
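
For illustration, a viewer link can be generated from any publicly
accessible PDF URL; the url and embedded query parameters below follow the
commonly documented viewer pattern and are an assumption, not something
from this thread:

from urllib.parse import quote

def viewer_link(pdf_url: str) -> str:
    """Return a docs.google.com/viewer URL that renders pdf_url in-browser."""
    return ("https://docs.google.com/viewer?url="
            + quote(pdf_url, safe="") + "&embedded=true")

print(viewer_link("http://example.org/scans/big-document.pdf"))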

~Richard.

On 4 August 2011 07:39, Dave Caroline dave.thearchiv...@gmail.com wrote:

 [snip]




-- 
Richard Wallis
Technology Evangelist, Talis
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: rjw3...@hotmail.com


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Cab Vinton
My naive thought would be to run your files through a batch PDF
compressor, something along the lines of:

http://www.cvisiontech.com/products/general/pdfcompressor.html
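
If a commercial tool isn't an option, Ghostscript's pdfwrite device can do
a similar batch pass; a minimal sketch (the directory names and quality
preset are assumptions to tune per collection):

import pathlib
import subprocess

def compress(src: pathlib.Path, dst: pathlib.Path, quality: str = "/ebook") -> None:
    """Re-distill src into dst; /screen is smallest, /ebook a middle ground."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
         f"-dPDFSETTINGS={quality}", "-dNOPAUSE", "-dBATCH", "-dQUIET",
         f"-sOutputFile={dst}", str(src)],
        check=True,
    )

for pdf in pathlib.Path("originals").glob("*.pdf"):
    compress(pdf, pathlib.Path("compressed") / pdf.name)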

If the files are still too large, PDF splitters exist, giving folks
the option of downloading a page/section at a time.

Best of luck,

Cab Vinton, Director
Sanbornton Public Library
Sanbornton, NH

"Politeness and consideration for others is like investing pennies and
getting dollars back." -- Thomas Sowell


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Cowles, Esme
I've thought about using JPEG page images instead of PDFs to serve our scanned 
newspapers, which also have sizes ranging up to 100MB and beyond, with a link to 
download the PDF as a fallback for people who really want that.  The downside 
is having to do the bulk conversion, manage the extra files, etc.

Another option would be a Flash frontend.  Someone already mentioned Google, 
and I've also seen some use of issuu.com (our campus newspaper currently uses 
them).  There are also options you could integrate into your own site, such as 
FlexPaper (http://flexpaper.devaldi.com/).  You still have to upload and/or 
convert your files, but you retain a PDF-like display in the browser.

-Esme
--
Esme Cowles escow...@ucsd.edu

A person who is nice to you, but rude to the waiter, is not a nice person.
 (This is very important. Pay attention. It never fails.)  -- Dave Barry

On 08/3/2011, at 7:36 PM, Ranti Junus wrote:

 [snip]


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Wilfred Drew
No one has mentioned accessibility issues for those using screen readers.  JPEG 
page images would not work for them.

Bill Drew

-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cowles, 
Esme
Sent: Thursday, August 04, 2011 8:45 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

[snip]


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Aaron Addison
Hello,

   For a project, I just finished several scripts to generate PDFs from
piles of TIFFs.  The process was:

In a .htaccess file, have not-found URLs rewritten to a script, passing
the desired filename to it.

The script would then build the PDF and 'print' it to the requester
along with the correct MIME type. It would also save it to disk in
the place the original request was made to, such that a second request
for the same file would just serve the file with no script processing.

A simple batch script deletes all files older than X days in the PDF
folder.

You could do something similar and, by using different URLs and some
Unix pdf2pdf command processing, make a bunch of URLs that would serve
different DPI versions of the same file, like:

http://foo.bar/pdf/big/101010.pdf
http://foo.bar/pdf/small/101010.pdf
http://foo.bar/pdf/tiny/101010.pdf
http://foo.bar/pdf/verytiny/101010.pdf

If you need any code samples or additional info I welcome your query.
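
A minimal sketch of that flow, assuming Apache with mod_rewrite and a
Python CGI; the URL scheme, paths, size/DPI map and TIFF layout are all
illustrative stand-ins:

#!/usr/bin/env python3
# Cache-on-miss PDF builder. .htaccess sends requests for missing PDFs
# here, e.g.:
#   RewriteEngine On
#   RewriteCond %{REQUEST_FILENAME} !-f
#   RewriteRule ^pdf/(big|small|tiny)/(.+\.pdf)$ /cgi-bin/mkpdf.py?size=$1&name=$2 [L]
import os
import subprocess
import sys
from urllib.parse import parse_qs

CACHE = "/var/www/html/pdf"   # generated PDFs are written back here
TIFFS = "/var/scans"          # one directory of page TIFFs per document
DPI = {"big": 300, "small": 150, "tiny": 72}

def make_pdf(name: str, dpi: int, out: str) -> None:
    """Stitch the document's TIFF pages into a PDF at the requested DPI."""
    pages_dir = os.path.join(TIFFS, name[:-4])   # strip ".pdf"
    pages = sorted(os.path.join(pages_dir, p) for p in os.listdir(pages_dir))
    subprocess.run(["convert", "-density", str(dpi), *pages, out], check=True)

params = parse_qs(os.environ.get("QUERY_STRING", ""))
size = params["size"][0]
name = os.path.basename(params["name"][0])       # blocks path traversal
out = os.path.join(CACHE, size, name)

if not os.path.exists(out):                      # miss: build once, cache on disk
    make_pdf(name, DPI[size], out)

sys.stdout.write("Content-Type: application/pdf\r\n\r\n")
sys.stdout.flush()
with open(out, "rb") as f:
    sys.stdout.buffer.write(f.read())

The cleanup step is then a cron job along the lines of
find /var/www/html/pdf -type f -mtime +30 -delete.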

Aaron




-- 
Aaron Addison
Unix Administrator 
W. E. B. Du Bois Library UMass Amherst
413 577 2104



On Wed, 2011-08-03 at 22:51 -0400, Joe Hourcle wrote:
 [snip]


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Andrew Hankinson
Disclaimer: I helped write this software.

You may want to look at our just-released Diva.js script. It can handle 
document images up to many gigabytes in size, at many different resolutions. 
The big advantage, though, is that the user only ever downloads the portion of 
the document that they are looking at, so viewing is almost instant. We 
specifically designed it to work on slower network connections. It's written in 
JavaScript, so it runs in any modern web browser with no Flash or PDF plugin 
needed.

It does require some server-side infrastructure to set up. We have a recent 
Code4Lib Journal article that describes how it works and what is needed. 

Code4Lib article: http://journal.code4lib.org/articles/5418

Here is a very basic demo: http://ddmal.music.mcgill.ca/diva/demo/ (This book 
is ~4GB of images).

And the code is available here: https://github.com/DDMAL/diva.js

We have a new version coming out very soon that fixes some bugs and adds a 
contact sheet view for quickly scrolling through all the images.

If you need any more info, please let me know and I would be happy to help.

Cheers,
-Andrew

On 2011-08-04, at 8:45 AM, Cowles, Esme wrote:

 [snip]


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Cindy Harper
So I take it this would need a fast connection between Google and your
server, but would tolerate a slow connection between the user and Google?

Cindy Harper, Systems Librarian
Colgate University Libraries
char...@colgate.edu
315-228-7363



On Thu, Aug 4, 2011 at 6:03 AM, Richard Wallis richard.wal...@talis.comwrote:

 Why not let someone else, such as Google, do the heavy lifting for you:
 https://docs.google.com/viewer

 ~Richard.

 [snip]



Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread David Nind
An option to consider is using the Internet Archive's BookReader:
http://openlibrary.org/dev/docs/bookreader
http://raj.blog.archive.org/2011/03/17/how-to-serve-ia-style-books-from-your-own-cluster/

You can use it locally with your own images; JPEG 2000, JPEG, TIFF and PNG
are supported formats.  You would probably need to extract the images from
the PDFs if you don't have them already.  You could use something like
ImageMagick to do this.
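
A one-step sketch of that extraction using ImageMagick's convert (the
density, quality and output layout are assumptions to tune per collection):

import pathlib
import subprocess

def pdf_to_jpegs(pdf: str, out_dir: str, density: int = 150) -> None:
    """Rasterize each PDF page to a JPEG the BookReader can serve."""
    pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["convert", "-density", str(density), pdf,
         "-quality", "85", f"{out_dir}/page-%04d.jpg"],
        check=True,
    )

pdf_to_jpegs("scans/volume1.pdf", "bookreader/volume1")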

David Nind

On 5 August 2011 00:45, Cowles, Esme escow...@ucsd.edu wrote:

 [snip]




-- 
David Nind
__

david.n...@gmail.com

PO Box 12367
Thorndon
Wellington 6144
+64 4 9720600 (home)
+64 210537847 (cellphone)
+64 4 8906098 (work)


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-04 Thread Eoghan Ó Carragáin
snip
An option to consider is using the Internet Archive's BookReader:
http://openlibrary.org/dev/docs/bookreader
/snip

I think the brasiliana/Corisco repository has done quite a bit on combining
the OL BookReader with PDFs on the fly:

   - https://github.com/brasiliana/Corisco/blob/master/docs/CoriscoDiagram.png
   - https://github.com/brasiliana/Corisco

Eoghan

On 4 August 2011 21:43, David Nind david.n...@gmail.com wrote:

 [snip]



[CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-03 Thread Ranti Junus
Dear All,

My colleague came to me with this query, and I hope some of you could give us
some ideas or suggestions:

Our Digital Multimedia Center (DMC) scanning project can produce very large
PDF files. They will have PDFs that are about 25MB, and some may move into
the 100MB range. If we provide a link to a PDF that large, a user may not
want to try to download it even though she really needs to see the
information. In the past, DMC has created lower-quality, smaller versions
of the original file to reduce the size. Some thoughts have been tossed
around to reduce the duplication of work (e.g. no more creating the
lower-quality PDF manually).

They are wondering if there is an application that we could point the end
user to (someone who might need it due to poor internet access) that would
simplify the very large file transfer. Basically:
- client software that tells the server to manipulate and reduce the file
on the fly
- a server app that would do the actual manipulation of the file and then
deliver it to the end user.

Personally, I'm not really sure about the client software part. It makes
more sense to me (from the user's perspective) that we provide a "download
a smaller version of this large file" link that would trigger the server-side
app to manipulate the big file. However, we're all ears for any suggestions
you might have.


thanks,
ranti.


-- 
Bulk mail.  Postage paid.


Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-03 Thread Gabriel Farrell
I agree that your client software should be nothing more than a link
or button in the web browser. As for the server, it sounds akin to
image servers that resize on the fly. I would probably just proxy
requests to a script or CGI that compresses/converts the files,
especially if you're not planning to get a lot of hits per second. If
that's not robust enough, there are a number of results from a search
for "pdf server" that might work for you.
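
A minimal sketch of that server side, with Ghostscript doing the reduction
and a plain WSGI app standing in for the CGI (the paths and URL layout are
illustrative; put a cache or front-end proxy in front of it for real
traffic):

import os
import subprocess
from wsgiref.simple_server import make_server

ORIGINALS = "/data/pdf-originals"   # where the full-size PDFs live

def app(environ, start_response):
    # Map /some-document.pdf onto the originals directory, safely.
    name = os.path.basename(environ.get("PATH_INFO", "").lstrip("/"))
    src = os.path.join(ORIGINALS, name)
    if not name.endswith(".pdf") or not os.path.isfile(src):
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no such document\n"]
    # Re-distill to the smallest preset and stream the result back.
    gs = subprocess.run(
        ["gs", "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/screen",
         "-dNOPAUSE", "-dBATCH", "-dQUIET", "-sOutputFile=-", src],
        capture_output=True, check=True,
    )
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [gs.stdout]

make_server("", 8000, app).serve_forever()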


On Wed, Aug 3, 2011 at 7:36 PM, Ranti Junus ranti.ju...@gmail.com wrote:
 [snip]



Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-03 Thread Joe Hourcle
On Aug 3, 2011, at 7:36 PM, Ranti Junus wrote:

 [snip]


I've been dealing with related issues for a few years, and if you have
the file locally, it's generally not too difficult to have a CGI or similar
that you can call to do some sort of transformation on the fly.

Unfortunately, what we've run into is that in some cases, in part because
the service tends to be used by people with slow connections and for very
large files, they'll keep restarting the process; and because the file is
generated on the fly, the webserver can't just pick up where it left off,
so it has to restart the process from scratch.

The alternative is to write it out to disk, and then let the webserver
handle it as a normal file.  Depending on how many of these you're
dealing with, you may need something to manage the scratch space and
remove generated files that haven't been viewed in some time.

What I've been hoping to do is:

1. Assign URLs to all of the processed forms, of the format
   http://server/processing/ID
   (where 'ID' includes some hashing in it, so it's not 10 million files
   in one directory).

2. Write a 404 handler for each processing type, so that should a file
   not exist in that directory, it will:
   (a) verify that the ID is valid; otherwise, return a 404.
   (b) check whether the ID is already being processed; otherwise, kick
   off a process to generate the file.
   (c) return a 503 status.

Unfortunately, my initial testing (years ago) suggested that no
clients at the time properly handled 503 responses (effectively,
"try back in (x) minutes", where you give 'em a time).
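
For illustration, the 404-handler plan as a WSGI-style fragment; is_valid,
start_job and in_progress are hypothetical hooks, and Retry-After is the
header a compliant client would use for the "try back in (x) minutes" hint:

import os

def handle_miss(environ, start_response, is_valid, start_job, in_progress):
    """Called when the requested processed file does not exist on disk."""
    doc_id = os.path.basename(environ.get("PATH_INFO", ""))
    if not is_valid(doc_id):                    # (a) unknown ID -> real 404
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown id\n"]
    if not in_progress(doc_id):                 # (b) first request -> start the job
        start_job(doc_id)
    start_response("503 Service Unavailable",   # (c) ask the client to retry
                   [("Retry-After", "120"),     # seconds; many clients ignore it
                    ("Content-Type", "text/plain")])
    return [b"file is being generated; please retry shortly\n"]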

The alternative is to just sleep for a period of time, and then return
the file once it's been generated ... but that's a problem for requests
that take some time (some of my processing might take hours, as the
files it needs as input are stored near-line, and we're at the mercy
of a tape robot).

...

You might also be able to sleep and then use one of the various
30x status codes, but I don't know what a client might do if you
returned the same URL.  (They might abort, to prevent looping.)

-Joe