Re: WC:>: Attachment "Box" (covers MIME)

Michael A. Stone Sun, 27 Sep 1998 18:30:55 -0400
> I would like to set up an email form to allow an attachment to also be
> attached.  Does anyone know how to do this?

basically, it's a MIME issue.

to upload files from a form, you need to use more or less the following:


    <form
        method="POST"
        action="http://www.foo.com/cgi-bin/upload.pl"
        enctype="multipart/form-data"
    >

        <b>Select a File:</b>
        &nbsp;  <input type="file" name="data_file" size="30">
        <input type="submit" value="Upload">

    </form>


the two items of note are the 'enctype="multipart/form-data"' attribute in
the <form> declaration, and the <input> item of type 'file'.   the first
tells the browser (and indirectly, the server) that the data sent will be
formatted differently than usual, in a way that is vastly more
convenient for handling files.   the second is the specific markup which
creates a file upload widget.
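on the server side, the encoding the browser used shows up in the CONTENT_TYPE environment variable that the web server hands to a CGI script.   for multipart/form-data it also carries the separator string (the "boundary") the browser picked.   here's a minimal perl sketch of checking for it and pulling it out, assuming a standard CGI environment; get_boundary is a hypothetical name:

```perl
use strict;
use warnings;

# Return the boundary from a Content-Type value, or undef if the
# request isn't multipart/form-data.  The boundary may be quoted.
sub get_boundary {
    my ($ctype) = @_;
    return undef
        unless defined $ctype && $ctype =~ m{^\s*multipart/form-data\b}i;
    return $1 if $ctype =~ /\bboundary="([^"]+)"/i;    # quoted form
    return $1 if $ctype =~ /\bboundary=([^\s;]+)/i;    # bare form
    return undef;
}

# in a CGI script: undef means it wasn't a file-upload form
my $boundary = get_boundary($ENV{CONTENT_TYPE});
```

each separator line in the body is then two dashes followed by that boundary.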

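the standard encoding for form data is URL-encoding
(application/x-www-form-urlencoded), which is familiar to anyone who
deals with forms regularly: name=value pairs joined by '&', with spaces
turned into '+' and other special characters into %XX hex escapes: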

    input_01=item+1&input_02=item+2
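for comparison with what comes next, a minimal perl decoder for that format might look like this (decode_form is a hypothetical name).   it keeps only the last value when a name repeats, which is fine for simple forms but not for multi-select lists:

```perl
use strict;
use warnings;

# Decode an application/x-www-form-urlencoded string into a hash:
# '+' means a space and %XX is a hex-escaped byte.  A repeated name
# keeps only its last value.
sub decode_form {
    my ($query) = @_;
    my %form;
    for my $pair (split /&/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {    # decode both halves in place
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        $form{$name} = $value;
    }
    return %form;
}

my %form = decode_form('input_01=item+1&input_02=item+2');
# $form{input_01} is now "item 1", $form{input_02} is "item 2"
```

in a CGI script, the string comes from $ENV{QUERY_STRING} for a GET, or from STDIN (CONTENT_LENGTH bytes of it) for a POST.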


the multipart/form-data encoding is a bit bulkier, transmitting each
input item as a separate 'entity', without encoding the items
themselves:

    -----------------------------19198123266180
    Content-Disposition: form-data; name="input_01"

    item 1
    -----------------------------19198123266180
    Content-Disposition: form-data; name="input_02"

    item 2
    -----------------------------19198123266180--


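each entity is set off from the next by a randomly generated separator
line, and the last separator gets two extra dashes at the end.   the
idea behind the separator is to generate a pattern of characters which
is, shall we say, unlikely to appear by accident as part of the data
being uploaded.   the browser announces the separator (the "boundary")
in the Content-Type header it sends with the request, and each
separator line in the body is two dashes followed by that boundary.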
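to see the format from the sending side, here's a minimal perl sketch that builds a multipart/form-data body from name/value pairs.   build_form_data is a hypothetical name, and the boundary style (27 dashes plus random digits) just imitates the example above; the point is that the boundary is random and gets re-rolled if it happens to turn up in the data:

```perl
use strict;
use warnings;

# Build a multipart/form-data body from (name => value) pairs.  The
# boundary imitates the browser style in the example: 27 dashes plus
# random digits, re-rolled if it happens to turn up in the data.
sub build_form_data {
    my @fields = @_;
    my $data = join '', @fields;
    my $boundary;
    do {
        $boundary = ('-' x 27) . join('', map { int rand 10 } 1 .. 14);
    } while (index($data, $boundary) >= 0);

    my $body = '';
    while (my ($name, $value) = splice(@fields, 0, 2)) {
        $body .= "--$boundary\r\n"
               . qq(Content-Disposition: form-data; name="$name"\r\n\r\n)
               . "$value\r\n";
    }
    $body .= "--$boundary--\r\n";    # two extra dashes close the body
    return ($boundary, $body);
}
```

the returned boundary is what would go in the request header, as "Content-Type: multipart/form-data; boundary=..." followed by the boundary.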


in my experience, it's best to be a bit stodgy when handling file
uploads on the server side.   it's tempting to read the whole input
stream directly into memory and split it up there, and this technique is
in fact given as an example in various CGI programming books.   it
happens to suck rocks when you're sucking down 30MB graphics files,
though, which is the first practical application i ever had for file
uploads.

in general, it's much easier on the system to write the entire input
stream to a file, then go back and process the pieces from there, since
memory use stays at one block no matter how big the upload is.   for
the sake of speed, it's best to write the data in blocks of uniform
size, generally a multiple of 4K, because that's a common buffer size
for file i/o on many OSes:


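    # unique name: current time plus this process's ID ($$)
    my $tmpfile = sprintf("/tmp/upload_%d.%d", time(), $$);
    open(my $tmp, '+>', $tmpfile) or die qq(can't write to "$tmpfile": $!);
    binmode(STDIN);      # uploads may be binary, so don't let the OS
    binmode($tmp);       # translate line endings on the way through

    my $block;
    while (read(STDIN, $block, 4096)) {    # 4K at a time
        print $tmp $block;
    }
    seek($tmp, 0, 0);    # rewind for the parsing pass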


in the code above, a unique filename is generated from the current
timestamp and the process ID of the script ($$).   unix systems
guarantee that no two processes running at the same time can have the
same PID, so that's a quick & easy way to keep multiple copies of the
script from stepping on each other's toes.
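the timestamp-plus-PID name won't collide with another copy of the script, but it is predictable, and /tmp is shared with every other user on the machine.   where the standard File::Temp module is available, a swapped-in alternative is to let it choose and create the file.   this is a sketch; the 'uploadXXXXXX' template is an arbitrary choice:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() picks a random name and creates the file exclusively, so it
# won't reuse a file someone else created first.  UNLINK => 1 removes
# the file when the script exits.
my ($tmp, $tmpfile) = tempfile('uploadXXXXXX', TMPDIR => 1, UNLINK => 1);
binmode($tmp);    # uploads may be binary

# ...then copy STDIN into $tmp in 4K blocks and rewind it, as above
```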

the read() function returns the number of bytes it read, zero at end of
file, and undef on error, so it makes an adequate control item for the
while() loop: the loop stops when the data runs out (or the read fails).

the seek() function takes you back to the beginning of the file so you
can step through it and process the data.
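here's a minimal perl sketch of that second pass, assuming the boundary has already been pulled out of CONTENT_TYPE.   it sets perl's input record separator ($/) to the separator line, so each read from the file returns one whole part.   that keeps one part in memory at a time rather than the whole upload, though a single 30MB file part would still come in one gulp.   parse_multipart is a hypothetical name, and only the name and filename parameters are picked out of each part's headers:

```perl
use strict;
use warnings;

# Second pass over an upload saved to a seekable filehandle: split it
# into parts.  Only one part is in memory at a time, and only the name
# and filename parameters of Content-Disposition are picked out.
sub parse_multipart {
    my ($fh, $boundary) = @_;
    my @parts;
    local $/ = "--$boundary";       # each read returns up to the next separator
    my $preamble = <$fh>;           # anything before the first separator (ignored)
    while (defined(my $chunk = <$fh>)) {
        last if $chunk =~ /^--/;    # "--" straight after a separator: end of data
        chomp $chunk;               # drop the separator that ended this read
        $chunk =~ s/^\r?\n//;       # line break after the previous separator
        $chunk =~ s/\r?\n\z//;      # line break before the next one
        my ($head, $body) = split /\r?\n\r?\n/, $chunk, 2;
        my ($name) = $head =~ /\bname="([^"]*)"/;
        my ($file) = $head =~ /\bfilename="([^"]*)"/;
        push @parts, { name => $name, filename => $file, data => $body };
    }
    return @parts;
}
```

it takes any rewound filehandle; a bareword handle like TMP can be passed as \*TMP.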

it's a whole lot easier to chop up the data this way, in the long run.
reading multipart data from a stream is tricky because there's an
inherent chicken-and-egg scenario to it.   you don't know that you're
done with one part until you start reading the beginning of the next
part, and then you can't back up to start the next part properly.

there are buffering techniques you can use to solve that problem, but
they're subtle.   you need at least two buffers, because there's a
chance you'll get part of the separator string in one read(), and the
rest in the next.   then you have to rotate the buffers to make sure
you're keeping the correct "last chunk", paste the buffers together in
the right order to see if the separator really is in there somewhere,
deal with the possibility of multiple separators in the same 2-buffer
chunk before reading again (because most of the non-file data will be
very small), etc, etc, etc.
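the core of that buffering trick, spotting a separator that straddles two read()s, can be sketched in perl.   this is a simpler variant of the two-buffer rotation described above: instead of keeping two whole buffers, it carries only the last length(separator)-1 bytes of each block into the next search.   it only reports where the separators are (splitting out headers and bodies is left out), and find_separators is a hypothetical name:

```perl
use strict;
use warnings;

# Return the byte offsets of every separator in the stream, reading
# $size bytes at a time.  The tail of each block is carried into the
# next search, so a separator split across two reads is still found.
sub find_separators {
    my ($fh, $sep, $size) = @_;
    my @found;
    my $carry  = '';    # unsearched tail of the previous block
    my $offset = 0;     # stream offset of the first byte of $carry
    my $block;
    while (read($fh, $block, $size)) {
        my $buf = $carry . $block;
        my $pos = 0;
        while ((my $i = index($buf, $sep, $pos)) >= 0) {
            push @found, $offset + $i;
            $pos = $i + length $sep;
        }
        # keep at most length($sep)-1 bytes: too short to hold a whole
        # separator, long enough to hold the start of a split one.
        # never keep bytes that were already part of a match.
        my $start = length($buf) - (length($sep) - 1);
        $start = 0    if $start < 0;
        $start = $pos if $start < $pos;
        $carry  = substr($buf, $start);
        $offset += $start;
    }
    return @found;
}
```

a tiny $size, like 3, is handy for checking that split separators really are caught.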

all the problems are solvable, but the resulting system is quite
complex and has to be handled carefully.   dumping everything to a file
and making a second pass is less elegant, but a lot more robust.



if you want to write a script which *sends* files as attachments, you
just have to run the same process in reverse.   here's the full text of
a message with an attachment:

    From [EMAIL PROTECTED] Sun Sep 27 17:48:43 1998
    Received: from [208.229.121.27] ([208.229.121.27])
        by gw.yawp.com (8.8.8/8.8.8) with ESMTP id RAA12973
        for <[EMAIL PROTECTED]>; Sun, 27 Sep 1998 17:48:42 -0500 (CDT)
    X-Sender: [EMAIL PROTECTED]
    Message-Id: <l03130301b234714f403a@[208.229.121.27]>
    Mime-Version: 1.0
    Content-Type: multipart/mixed;
        boundary="============_-1305185946==_============"
    Date: Sun, 27 Sep 1998 17:49:07 -0500
    To: [EMAIL PROTECTED]
    From: "Michael A. Stone" <[EMAIL PROTECTED]>

    --============_-1305185946==_============
    Content-Type: text/plain; charset="us-ascii"

    test message, with attachments.

    --============_-1305185946==_============
    Content-Type: text/plain; name="file_01.txt"; charset="us-ascii"
    Content-Disposition: attachment; filename="file_01.txt"

    1 2 3 4 5 6 7 8 9 10

    --============_-1305185946==_============
    Content-Type: text/plain; name="file_02.txt"; charset="us-ascii"
    Content-Disposition: attachment; filename="file_02.txt"

    a b c d e f g h i j k l m n o p q r s t u v w x y z

    --============_-1305185946==_============
    Content-Type: text/plain; charset="us-ascii"




    mike stone  <[EMAIL PROTECTED]>   'net geek..
    been there, done that,  have network, will travel.

    --============_-1305185946==_============--


which has pretty much the same structure as the previous one.   the
separator is different in appearance, but does exactly the same thing.

the items of importance here are:

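 -  the "Content-Type" line, which tells the client that the message is
    made of several parts, and defines the separator.   it's folded onto
    two lines here, which is legal: a header can continue on the next
    line as long as the continuation line starts with whitespace.   the
    "Mime-Version: 1.0" header above it is required as well.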

 -  the headers for each attachment, showing the type and disposition of
    the entity's data.

 -  the two extra dashes at the end of the last separator.   those are
    the termination signal which says the message is done.   if you
    forget those, you'll really make the mail client unhappy.
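running the process in reverse can be sketched in perl as a function that assembles a message shaped like the one above.   build_message and its arguments are hypothetical, and it assumes the attachments are short us-ascii text, like the example; binary files would also need a Content-Transfer-Encoding such as base64, which this leaves out:

```perl
use strict;
use warnings;

# Build a multipart/mixed message from a text body and text attachments.
# Assumes us-ascii text only: binary files would also need a
# Content-Transfer-Encoding (e.g. base64), which is left out here.
# Lines end in "\n", as when piping to a local sendmail.
sub build_message {
    my %arg = @_;    # from, to, subject, body, attachments => [[name, text], ...]
    my @atts = @{ $arg{attachments} || [] };

    # random boundary, re-rolled if it occurs anywhere in the content
    my $all = join '', $arg{body}, map { @$_ } @atts;
    my $boundary;
    do {
        $boundary = '============_' . int(rand(1e9)) . '==_============';
    } while (index($all, $boundary) >= 0);

    my $msg = "From: $arg{from}\n"
            . "To: $arg{to}\n"
            . "Subject: $arg{subject}\n"
            . "Mime-Version: 1.0\n"
            . "Content-Type: multipart/mixed;\n"
            . qq(    boundary="$boundary"\n)    # folded header line
            . "\n"
            . "--$boundary\n"
            . qq(Content-Type: text/plain; charset="us-ascii"\n\n)
            . "$arg{body}\n";

    for my $att (@atts) {
        my ($name, $text) = @$att;
        $msg .= "\n--$boundary\n"
              . qq(Content-Type: text/plain; name="$name"; charset="us-ascii"\n)
              . qq(Content-Disposition: attachment; filename="$name"\n\n)
              . "$text\n";
    }
    $msg .= "\n--$boundary--\n";    # closing dashes: end of the message
    return $msg;
}
```

the result could then be piped to something like "sendmail -t", which takes the recipients from the To: header; where sendmail lives varies from system to system.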


note that the main body of the message and the sig line are in separate
parts, but neither has an explicit Content-Disposition.   that being the
case, most clients will display both as part of the message body.   the
two parts in the middle, which are explicitly marked as attachments,
will typically be saved to disk (or offered for saving) under the
filenames given on the disposition line.
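that rule of thumb can be sketched as a small perl function that looks at one part's headers.   it's a rough model, not what any particular client actually does: part_action is a hypothetical name, and the 'client decides' case stands for types like images, where clients differ:

```perl
use strict;
use warnings;

# Rough model of what a mail client does with one MIME part, given the
# part's header block.  Returns ('save', $filename), ('inline'), or
# ('client decides').
sub part_action {
    my ($headers) = @_;
    $headers =~ s/\r?\n[ \t]+/ /g;    # unfold continued header lines
    my ($disp, $params) = $headers =~ /^Content-Disposition:\s*([^;\s]+)(.*)$/mi;
    if (defined $disp && lc $disp eq 'attachment') {
        my ($file) = $params =~ /\bfilename="([^"]*)"/i;
        return ('save', $file);       # $file is undef if none was given
    }
    return ('inline') if defined $disp && lc $disp eq 'inline';
    # no usable disposition: text (the default type) is normally shown
    # inline; for other types, behavior varies between clients
    my ($type) = $headers =~ /^Content-Type:\s*([^;\s]+)/mi;
    return ('inline') if !defined $type || $type =~ m{^text/}i;
    return ('client decides');
}
```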

mike stone  <[EMAIL PROTECTED]>   'net geek..
been there, done that,  have network, will travel.



____________________________________________________________________
--------------------------------------------------------------------
 Join The Web Consultants Association :  Register on our web site Now
Web Consultants Web Site : http://just4u.com/webconsultants
If you lose the instructions All subscription/unsubscribing can be done
directly from our website for all our lists.
---------------------------------------------------------------------