Hi,

I think this is a bug and should be logged. However, as it is a rather specific use case (with an older version of Nutch), could you confirm whether it still occurs with trunk? It would be great to log it against 1.7 (and/or 2.2) so we can work towards a solution.

Best,
Lewis

On Tue, Dec 11, 2012 at 12:45 PM, webdev1977 <webdev1...@gmail.com> wrote:
> Using Nutch 1.4 and Solr 3.6
>
> I see the bug that was submitted for the indexing filter
> <https://issues.apache.org/jira/browse/NUTCH-912> not recognizing dates in
> the format:
>
> yyyy-MM-dd'T'HH:mm:ss'Z'
>
> but I am still having issues with it. This only happens with office
> documents with the "x" format on the end (docx, xlsx, pptx, etc).
>
> Once it makes its way through getTime, the result from parsedDate is not in
> Zulu (GMT) time, but rather the local timezone date/time, and this is what
> gets stored in the Solr index for lastModified.
>
> For example, a docx file using the protocol-file plugin has the following
> times:
> Actual date on share drive: December 10 2012 14:47 EST
> Last-Modified HTTP header from protocol-file: 2012-12-10T21:47:00Z
> *get error "Unparsable date: 2012-12-10T21:47:00Z"*
> The final date after getTime(): Monday Dec 10 21:47:00 *EST* 2012
> *Solr lastModified date: 2012-12-11T02:27:00Z*
>
> This is obviously not correct. I am not sure if it is a problem with the
> DateUtils in commons, or something wonky with Nutch.
>
> Any ideas?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/MoreIndexingFilter-last-modified-time-from-protocol-file-docx-tp4025994.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Lewis
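
For anyone trying to reproduce this outside Nutch, here is a minimal sketch of one way a shift like the one in the quoted report could arise. It uses only java.text.SimpleDateFormat; whether MoreIndexingFilter or the commons DateUtils actually take this path is an assumption, not something confirmed in this thread. In the pattern yyyy-MM-dd'T'HH:mm:ss'Z' the quoted 'Z' is a literal character, so the parser does not read it as a timezone and falls back to the formatter's own zone. The class name ZuluDateSketch, its helper methods, and the America/New_York zone (standing in for EST) are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch; not code taken from Nutch or commons-lang.
public class ZuluDateSketch {
    // Same pattern as in the report. The quoted 'Z' is matched as a literal
    // character and carries no timezone information for the parser.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    // Parse the value, interpreting its fields in the given zone.
    static Date parseIn(String value, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setTimeZone(zone);
        try {
            return fmt.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }

    // Render an instant the way it would appear as a UTC timestamp.
    static String formatUtc(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        String header = "2012-12-10T21:47:00Z";

        // Assumed local zone of the crawler (EST in the report).
        Date asLocal = parseIn(header, TimeZone.getTimeZone("America/New_York"));
        // Explicitly treating the fields as UTC keeps the instant intact.
        Date asUtc = parseIn(header, TimeZone.getTimeZone("UTC"));

        System.out.println(formatUtc(asLocal)); // 2012-12-11T02:47:00Z
        System.out.println(formatUtc(asUtc));   // 2012-12-10T21:47:00Z
    }
}
```

If this is indeed the cause, setting the formatter's timezone to UTC before parsing (or using a pattern that parses the zone instead of quoting it) would keep the stored value in Zulu time.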