So should I assume that Tika is always right, and that a document of a
particular MIME type should carry the corresponding extension? But what
about MIME types that map to multiple extensions? Should I always
take the first one?
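
To make the question concrete, here is a minimal sketch of one possible policy: keep the name if its extension is any of those registered for the detected type, otherwise append the first (preferred) one to the existing name, as suggested in the quoted reply. The class ExtensionFixer, the method fixName and the small lookup table are hypothetical and hand-written for illustration; in practice the table would come from whatever registry the detector provides.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Tika.
final class ExtensionFixer {

    // Hand-written sample table (an assumption). The first extension in each
    // list is treated as the preferred one.
    static final Map<String, List<String>> EXTENSIONS = Map.of(
            "text/html", List.of(".html", ".htm"),
            "application/pdf", List.of(".pdf"),
            "application/msword", List.of(".doc", ".dot"),
            "text/plain", List.of(".txt"));

    // Returns the file name unchanged if it already ends with any extension
    // registered for the detected type; otherwise appends the preferred one.
    static String fixName(String fileName, String detectedType) {
        List<String> known = EXTENSIONS.get(detectedType);
        if (known == null || known.isEmpty()) {
            return fileName; // unknown type: leave the name alone
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : known) {
            if (lower.endsWith(ext)) {
                return fileName; // any registered extension is acceptable
            }
        }
        return fileName + known.get(0); // append, do not replace
    }
}
```

With this table, fixName("report.htm", "text/html") leaves the name alone, while fixName("report.doc", "application/pdf") gives "report.doc.pdf". Appending rather than replacing keeps the original name visible to the user.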

On Mon, Jul 25, 2011 at 1:36 AM, Mark Kerzner <[email protected]> wrote:
> Attach the right extension at the end of the wrong one
>
> On Jul 24, 2011 6:33 PM, "Jakub Liska" <[email protected]> wrote:
>> Hey, I have a decision I can't make: what should one do when a user
>> uploads a document with file extension A but the file's detected MIME
>> type corresponds to extension B? In most cases that could cause
>> problems, right? I can't decide how to deal with this.
>>
>> Warn the user? No...
>> Change the file name extension to the MIME type's extension? Probably yes.
>> Not use any extension? Can't, the documents will be accessible to
>> different users right away.
>>
>> What are the proper steps to verify the integrity of these documents anyway:
>> html, doc, docx, odt, txt, rtf, srt, sub, pdf, odf, odp, xls, ppt? Or at least
>> for some of these types?
>>
>> I guess the InputStream is read properly from the multipart request
>> 99.99% of the time; otherwise an exception would be thrown and action taken.
>> But a user can upload an already corrupted file (MS Office docs, PDF or
>> OpenDocument). Do I use third-party libraries to check for that? I didn't
>> see anything like that in odftoolkit, itextpdf or pdfbox.
>>
>> I just get the media type:
>>
>> protected MediaType getContentType(InputStream is, String httpReqContentType)
>>         throws SystemException {
>>     // Type declared by the client in the HTTP request (may be null)
>>     MediaType httpReqMediaType = MediaType.parse(httpReqContentType);
>>     MediaType mediaType;
>>     try {
>>         // Type detected by Tika from the stream content
>>         mediaType = MediaType.parse(tika.detect(is));
>>     } catch (IOException ioe) {
>>         throw new SystemException(ioe.getMessage(), ioe);
>>     }
>>     // Fall back to the declared type only when detection is inconclusive
>>     if (mediaType.equals(MediaType.OCTET_STREAM) && httpReqMediaType != null
>>             && !httpReqMediaType.equals(MediaType.OCTET_STREAM)) {
>>         return httpReqMediaType;
>>     }
>>     return mediaType;
>> }
>>
>> Then I check whether it matches one of my supported MIME types, and
>> then the file is delivered to a third-party customer, which
>> is practically mission critical here.
>>
>> What do you guys do in addition to what I just described to make everything
>> rock solid? Could this produce a lot of emails from customers about
>> not getting what they expected?
>
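
On the integrity question in the quoted message: docx, odt and odp files are ZIP containers, so one cheap partial check for those types is to read every entry of the archive, which makes java.util.zip verify each entry's CRC-32. The sketch below uses only the JDK; the class ZipIntegrity and the method looksIntact are hypothetical names. It says nothing about whether the XML inside is a valid document, and it does not apply to doc, xls, ppt, pdf or the plain-text formats, which would need a format-specific parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of Tika or odftoolkit.
final class ZipIntegrity {

    // Returns true if the stream holds at least one ZIP entry and every
    // entry can be read to its end without an I/O or CRC error.
    // Note: the stream is consumed and closed by this method.
    static boolean looksIntact(InputStream in) {
        byte[] buf = new byte[8192];
        int entries = 0;
        try (ZipInputStream zip = new ZipInputStream(in)) {
            while (zip.getNextEntry() != null) {
                // Draining the entry triggers the CRC-32 comparison.
                while (zip.read(buf) != -1) {
                    // discard the data, only the read itself matters
                }
                entries++;
            }
        } catch (IOException e) {
            return false; // truncated or corrupted archive
        }
        return entries > 0; // no entries at all: not a usable ZIP container
    }
}
```

Because the method consumes the stream, it would have to run on a copy of the upload (a temporary file or a byte array) rather than on the stream that is later stored.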
