Hi,

I think this is a bug and should be logged. However, as it is a rather
specific use case (with an older version of Nutch), could you confirm
whether it also occurs with trunk? It would be great to log it against
1.7 (and/or 2.2) so we can work towards a solution.
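
In case it helps when reproducing this, here is a minimal sketch (not
Nutch code) of one way a shift like the one in the quoted message could
arise. It assumes the value is parsed with a java.text.SimpleDateFormat
whose time zone is left at the JVM default. Because the 'Z' in the
pattern is quoted, it is matched as a literal character and carries no
zone information. The class name ZuluDateSketch and the America/New_York
zone are purely illustrative; I have not verified that this is what
MoreIndexingFilter actually does.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Nutch or commons-lang.
public class ZuluDateSketch {
    // The 'Z' is quoted, so it is a literal letter, not a zone designator.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    // Parse the text, interpreting its wall-clock fields in the given zone.
    static Date parseIn(String text, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        fmt.setTimeZone(zone);
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    // Render an instant the way a UTC-based index field would show it.
    static String formatUtc(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        String header = "2012-12-10T21:47:00Z";
        // Simulates a JVM whose default zone is US Eastern (an assumption).
        Date asEastern = parseIn(header, TimeZone.getTimeZone("America/New_York"));
        // Simulates a parser that pins the zone to UTC before parsing.
        Date asUtc = parseIn(header, TimeZone.getTimeZone("UTC"));
        System.out.println("parsed as Eastern, shown in UTC: " + formatUtc(asEastern));
        System.out.println("parsed as UTC, shown in UTC:     " + formatUtc(asUtc));
    }
}
```

When the fields are read as Eastern time, the header value
2012-12-10T21:47:00Z comes back as 2012-12-11T02:47:00Z, i.e. shifted by
the five-hour offset. With the zone pinned to UTC before parsing, it
round-trips unchanged.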

Best

Lewis

On Tue, Dec 11, 2012 at 12:45 PM, webdev1977 <webdev1...@gmail.com> wrote:
> Using Nutch 1.4 and Solr 3.6.
>
> I see the bug that was submitted for the indexing filter
> <https://issues.apache.org/jira/browse/NUTCH-912> not recognizing dates in
> the format:
>
> yyyy-MM-dd'T'HH:mm:ss'Z' but I am still having issues with it. This only
> happens with Office documents whose extension ends in "x" (docx, xlsx,
> pptx, etc.).
>
> Once it makes its way through getTime, the result from parsedDate is not in
> Zulu (GMT) time but in the local timezone, and this is what gets stored in
> the Solr index for lastModified.
>
> For example, a docx file retrieved via the protocol-file plugin has the
> following times:
>    Actual date on share drive: December 10 2012 14:47 EST
>    Last-Modified HTTP header from protocol-file: 2012-12-10T21:47:00Z
>     * get error "Unparsable date: 2012-12-10T21:47:00Z" *
>    The final date after getTime(): Monday Dec 10 21:47:00 *EST* 2012
>  *  Solr lastModified date: 2012-12-11T02:27:00Z   *
>
> This is obviously not correct. I am not sure whether it is a problem with
> DateUtils in commons or something wonky in Nutch.
>
> Any ideas?



-- 
Lewis
