Thanks.
Sorry, I thought the message had not been sent, so I resent it. :(
I am also sorry that I did not describe the problem clearly.
The two items Doug mentioned are not the source of this problem,
because I have already changed MoreIndexingFilter.java as listed
below. So maybe there is something odd in the Sort-related code. I
will look more deeply and check SortComparatorSource and
HitCollector for more information.
BTW,
1. Using fo.getFetchDate() is more reasonable than using the current time,
and I will change it.
2. If any document does not have the "lastModified" field, the sort
results are completely wrong. Doug, do you know why this happens?
Now the results are only partly wrong.
:)
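Roughly what I have in mind for point 1 (a sketch, not tested inside Nutch): always produce a date, falling back to the fetch time when the header is missing or cannot be parsed. The class and method names are hypothetical, the fetchDate parameter stands in for fo.getFetchDate(), and the RFC 1123 SimpleDateFormat pattern is only my stand-in for Nutch's HttpDateFormat.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class LastModifiedFallback {

  // Stand-in for Nutch's HttpDateFormat (assumption): RFC 1123 dates such as
  // "Thu, 21 Apr 2005 10:00:00 GMT".
  private static final String HTTP_PATTERN = "EEE, dd MMM yyyy HH:mm:ss zzz";
  // The alternative format that addTime() also tries.
  private static final String ALT_PATTERN = "EEE MMM dd HH:mm:ss yyyy zzz";

  /**
   * Returns the Last-Modified time in milliseconds since the epoch, or
   * fetchDate (hypothetical stand-in for fo.getFetchDate()) when the header
   * is missing or matches neither pattern.
   */
  public static long resolve(String header, long fetchDate) {
    if (header != null) {
      for (String pattern : new String[] {HTTP_PATTERN, ALT_PATTERN}) {
        // SimpleDateFormat is not thread-safe, so create one per attempt.
        // Locale.US because HTTP dates use English day and month names.
        SimpleDateFormat df = new SimpleDateFormat(pattern, Locale.US);
        df.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
          return df.parse(header.trim()).getTime();
        } catch (ParseException e) {
          // not this format; try the next pattern
        }
      }
    }
    // missing or unparseable header: use the fetch time instead
    return fetchDate;
  }
}
```

With this, resolve(null, fetchDate) simply returns fetchDate, so every document would get a lastModified value.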
The code is listed below:
------
private Document addTime(Document doc, Properties metaData, String url) {
  String lastModified = metaData.getProperty("last-modified");
  // Note: pages without a last-modified header return here,
  // so they still get no "lastModified" field.
  if (lastModified == null)
    return doc;
  // index/store it as a long value (milliseconds since the epoch);
  // Locale.US because HTTP dates use English day and month names
  DateFormat df = new SimpleDateFormat("EEE MMM dd HH:mm:ss yyyy zzz",
                                       java.util.Locale.US);
  try {
    lastModified = Long.toString(HttpDateFormat.toLong(lastModified));
  } catch (ParseException e) {
    // try to parse it as a date in an alternative format
    try {
      Date d = df.parse(lastModified);
      lastModified = Long.toString(d.getTime());
    } catch (Exception e1) {
      // log the unparseable value before it is overwritten below
      LOG.fine(url + ": can't parse erroneous last-modified: " + lastModified);
      // fall back to the current time (fo.getFetchDate() would be better)
      lastModified = Long.toString(System.currentTimeMillis());
    }
  }
  // at this point lastModified always holds a value
  doc.add(Field.Keyword("lastModified", lastModified));
  return doc;
}
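Something I may also check (not verified yet): the values stored above are milliseconds since the epoch, about 1114077600000 for 21 Apr 2005 10:00 GMT, which is far above Integer.MAX_VALUE (2147483647). So sorting them with SortField.INT may not behave as intended. If the field is compared as text instead (for example with SortField.STRING), the numbers also need a fixed width, because "999999999999" sorts after "1114077600000" as a string. A small sketch of both checks; the class and method names are my own and hypothetical.

```java
public class SortKeyCheck {

  // True if a millisecond timestamp fits in a Java int. Any date after
  // late January 1970 does not.
  public static boolean fitsInInt(long millis) {
    return millis >= Integer.MIN_VALUE && millis <= Integer.MAX_VALUE;
  }

  // Hypothetical helper: left-pad with zeros to a fixed width so that
  // comparing the strings lexicographically gives the same order as
  // comparing the longs. Assumes non-negative values (dates from 1970 on).
  public static String toSortableString(long millis) {
    if (millis < 0) {
      throw new IllegalArgumentException("negative timestamp: " + millis);
    }
    // 19 digits is enough for any non-negative long
    return String.format("%019d", millis);
  }
}
```

For example, toSortableString(999999999999L) compares below toSortableString(1114077600000L), while the unpadded strings compare the other way round.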
On 4/21/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Alan Wang wrote:
> > I am trying to sort the search results by the "lastModified" field. I index
> > "lastModified" as an integer Keyword field and search with the
> > search(Query query, Filter filter, int n, Sort sort) method. The only
> > change is in net.nutch.searcher.LuceneQueryOptimizer.optimize:
> > return searcher.search(query, filter, numHits,
> >     new Sort(new SortField[] {
> >         new SortField("lastModified", SortField.INT, true)
> >     }));
> >
> > The results certainly changed and are largely sorted by time, but they are
> > not exactly sorted by lastModified. The results look ugly. :(
>
> I can see two sources of problems:
>
> 1. You should sort by the "date" field, not "lastModified", since that's
> not indexed, and sorting requires an indexed field.
>
> 2. Not all pages have a lastModified value. You should change
> MoreIndexingFilter to always add a date. If no last modified is
> specified, then use the fetch date, fo.getFetchDate().
>
> If you get this working, please send a patch. Even if it's a hack, it's
> a start for others.
>
> Thanks,
>
> Doug
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: New Crystal Reports XI.
> Version 11 adds new functionality designed to reduce time involved in
> creating, integrating, and deploying reporting solutions. Free runtime info,
> new features, or free trial, at: http://www.businessobjects.com/devxi/728
> _______________________________________________
> Nutch-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
>
--
Regards,
Alan Wang