Unfortunately, in my case it's not a content repository but a brokerage
service, where delivering corrupted files to a customer is "my" fault...
I was looking for statistics and the experience of others ;-)



On Mon, Jul 25, 2011 at 2:00 AM, Mark Kerzner <[email protected]> wrote:
> That is a good first step which you can adjust later based on real stats
>
> On Jul 24, 2011 6:58 PM, "Jakub Liska" <[email protected]> wrote:
>> So that relies on the assumption that Tika is always right and that a
>> document of a particular MIME type should have the corresponding
>> extension? But what about multiple extensions per MIME type? Should I
>> always take the first one?
>>
>> On Mon, Jul 25, 2011 at 1:36 AM, Mark Kerzner <[email protected]>
>> wrote:
>>> Attach the right extension at the end of the wrong one
>>>
>>> On Jul 24, 2011 6:33 PM, "Jakub Liska" <[email protected]> wrote:
>>>> Hey, I have a decision I can't make: what should one do when a user
>>>> uploads a document with file extension A but the file's detected MIME
>>>> type corresponds to extension B? In most cases that could cause
>>>> problems, right? I can't decide how to deal with this.
>>>>
>>>> Warn user ? No...
>>>> Change file name ext to Mime type extension ? Probably yes
>>>> Do not use any extension ? Can't, documents will be accessible to
>>>> different users right away
>>>>
>>>> What are the proper steps to verify the integrity of these documents
>>>> anyway: html, doc, docx, odt, txt, rtf, srt, sub, pdf, odf, odp, xls,
>>>> ppt? Or at least for some types.
>>>>
>>>> I guess the InputStream is read properly from the MultiPart request
>>>> 99.99% of the time, otherwise an exception would be thrown and action
>>>> taken. But a user can upload an already corrupted file (MS docs, PDF
>>>> or OpenDocument). Do I use third-party libraries to check for that? I
>>>> didn't see anything like that in odftoolkit, itextpdf or pdfbox.
>>>>
>>>> I just get Media Type
>>>>
>>>> protected MediaType getContentType(InputStream is, String httpReqContentType)
>>>>         throws SystemException {
>>>>     MediaType httpReqMediaType = MediaType.parse(httpReqContentType);
>>>>     MediaType mediaType;
>>>>     try {
>>>>         mediaType = MediaType.parse(tika.detect(is));
>>>>     } catch (IOException ioe) {
>>>>         throw new SystemException(ioe.getMessage(), ioe);
>>>>     }
>>>>     if (mediaType.equals(MediaType.OCTET_STREAM) && httpReqMediaType != null
>>>>             && !httpReqMediaType.equals(MediaType.OCTET_STREAM)) {
>>>>         return httpReqMediaType;
>>>>     }
>>>>     return mediaType;
>>>> }
>>>>
>>>> Then I check whether it matches one of my supported MIME types, and
>>>> then the file is meant to be delivered to a third-party customer,
>>>> which is practically mission-critical here.
>>>>
>>>> What do you guys do in addition to what I just said for everything to
>>>> be rock solid ? Can it produce a lot of emails from customers about
>>>> not getting what they expected ?
>>>
>
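
The "attach the right extension at the end of the wrong one" suggestion from the thread could be sketched roughly as below. This is only an illustration under stated assumptions: the class name ExtensionFixer, the method fixExtension and the small MIME-to-extension table are hypothetical and not taken from Tika or any other library. The first extension listed for a type is treated as the preferred one, which is one possible answer to the "multiple extensions per MIME type" question, not the only one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ExtensionFixer {
    // Hypothetical table: MIME type -> accepted extensions, preferred one first.
    private static final Map<String, List<String>> EXTENSIONS = new HashMap<>();
    static {
        EXTENSIONS.put("application/pdf", Arrays.asList("pdf"));
        EXTENSIONS.put("text/html", Arrays.asList("html", "htm"));
        EXTENSIONS.put("text/plain", Arrays.asList("txt"));
        EXTENSIONS.put("application/msword", Arrays.asList("doc"));
    }

    // Returns the file name unchanged when its extension already fits the
    // detected type; otherwise appends the preferred extension after the
    // existing name, so the original (wrong) extension stays visible.
    static String fixExtension(String fileName, String detectedMimeType) {
        List<String> accepted = EXTENSIONS.get(detectedMimeType);
        if (accepted == null) {
            return fileName; // type not in the table: leave the name alone
        }
        int dot = fileName.lastIndexOf('.');
        String current = dot >= 0
                ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "";
        if (accepted.contains(current)) {
            return fileName;
        }
        return fileName + "." + accepted.get(0);
    }

    public static void main(String[] args) {
        System.out.println(fixExtension("report.doc", "application/pdf"));
        System.out.println(fixExtension("page.htm", "text/html"));
    }
}
```

Accepting any listed extension, rather than only the first, avoids renaming files such as page.htm that are already fine; the first entry only matters when something has to be appended.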

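On the integrity question, one cheap first-line check is to compare the leading bytes of the upload against well-known file signatures before delivery. The sketch below is a minimal illustration, not a substitute for actually parsing the file: a truncated or internally damaged PDF still passes it. The class and method names (SignatureCheck, looksLike) are hypothetical. The signatures themselves are the standard ones: "%PDF-" for PDF, "PK\003\004" for ZIP-based formats such as docx and odt, the OLE2 compound-file header for legacy doc/xls/ppt, and "{\rtf" for RTF.

```java
import java.util.Arrays;
import java.util.Locale;

class SignatureCheck {
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] RTF = {'{', '\\', 'r', 't', 'f'};

    // True when the first bytes of head are exactly the given signature.
    static boolean startsWith(byte[] head, byte[] magic) {
        return head.length >= magic.length
                && Arrays.equals(Arrays.copyOf(head, magic.length), magic);
    }

    // Strict check against offset 0. Returns true for extensions with no
    // signature in this sketch (e.g. txt, html), meaning "cannot judge",
    // not "verified".
    static boolean looksLike(String extension, byte[] head) {
        switch (extension.toLowerCase(Locale.ROOT)) {
            case "pdf":
                return startsWith(head, PDF);
            case "docx":
            case "odt":
            case "odp":
                return startsWith(head, ZIP);
            case "doc":
            case "xls":
            case "ppt":
                return startsWith(head, OLE2);
            case "rtf":
                return startsWith(head, RTF);
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        byte[] head = "%PDF-1.4".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.out.println(looksLike("pdf", head));
        System.out.println(looksLike("docx", head));
    }
}
```

A file that fails this check is almost certainly not what its type claims; a file that passes would still need to be opened by a real parser for that format to catch deeper corruption.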