Hi,
On 9/26/07, kbennett <[EMAIL PROTECTED]> wrote:
> 1) While you are modifying the Parser class, could we change getContents()
> to not swallow exceptions?
> [...]
> and modify the method declaration to throw a TikaException?
Sure, that makes sense.
> 2) In ParserFactory, we have:
>
> } catch (Exception e) {
>     logger.error("Unable to instantiate parser: " + className, e);
>     throw new TikaException(e.getMessage());
> }
>
> When we adapt an exception to a TikaException, would it make sense to wrap
> the entire exception, and not just its getMessage()?
+1
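For illustration, wrapping the whole exception could look roughly like the sketch below. It assumes TikaException gets a (String, Throwable) constructor; the TikaException class shown is a stand-in so the example compiles on its own, and ParserFactorySketch and instantiate() are hypothetical names rather than the actual ParserFactory code.

```java
// Stand-in for Tika's exception class. The (String, Throwable)
// constructor is an assumption, not the current Tika API.
class TikaException extends Exception {
    TikaException(String message) {
        super(message);
    }

    TikaException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical helper that mirrors the catch block quoted above.
class ParserFactorySketch {
    static Object instantiate(String className) throws TikaException {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            // Pass the original exception along as the cause so that its
            // type and stack trace are still available to the caller.
            throw new TikaException(
                    "Unable to instantiate parser: " + className, e);
        }
    }
}
```

Callers can then use getCause() to see what actually went wrong instead of only a message string.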
> 3) In Parser.getContents(), we could use Commons Lang StringUtils to make
> the code more null-safe and a bit more concise by replacing:
>
> int length = Math.min(contentStr.length(), 500);
> String summary = contentStr.substring(0, length);
>
> --- with: ---
>
> String summary = StringUtils.left(contentStr, 500);
-1. I'm not sure that's worth the extra dependency on commons-lang.
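If null safety is the main concern, a small local helper might be enough. The sketch below is only one possible shape: SummarySketch and left() are hypothetical names, and the null and negative-length handling is modelled on the documented behaviour of StringUtils.left.

```java
// Hypothetical helper class; not part of Tika.
class SummarySketch {

    // Returns at most maxLength leading characters of str.
    // A null input yields null and a negative length yields "".
    static String left(String str, int maxLength) {
        if (str == null) {
            return null;
        }
        if (maxLength < 0) {
            return "";
        }
        return str.substring(0, Math.min(str.length(), maxLength));
    }
}
```

With that in place the summary line would become `String summary = SummarySketch.left(contentStr, 500);` without any new dependency.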
> It's too bad we can't have a custom object...then we could have a
> getSummary() method that would do this so we don't run the risk of the
> summary getting out of sync with respect to the fulltext content.
I don't think we have any cases where the fulltext or summary would
change after parsing.
> Same for getValue() always being getValues().get(0).
Good idea, though I'd really like to replace the whole Content object
stuff with a different metadata mechanism.
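Just to make the derived-accessor idea concrete, here is a rough sketch of what such an object might look like. ContentSketch is a hypothetical class, not the existing Content class, and returning null from getValue() when there are no values is my assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value holder; not the actual Tika Content class.
class ContentSketch {
    private static final int SUMMARY_LENGTH = 500;

    private final String fulltext;
    private final List<String> values;

    ContentSketch(String fulltext, List<String> values) {
        this.fulltext = fulltext;
        // Defensive copy so the values cannot change after parsing.
        this.values = Collections.unmodifiableList(new ArrayList<String>(values));
    }

    String getFulltext() {
        return fulltext;
    }

    // Computed from the fulltext on each call, so the two cannot get out of sync.
    String getSummary() {
        if (fulltext == null) {
            return null;
        }
        return fulltext.substring(0, Math.min(fulltext.length(), SUMMARY_LENGTH));
    }

    List<String> getValues() {
        return values;
    }

    // Convenience accessor for the first value.
    // Assumption: null is returned when there are no values.
    String getValue() {
        return values.isEmpty() ? null : values.get(0);
    }
}
```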
BR,
Jukka Zitting