That's good, thanks.

2005/4/21, Alan Wang <[EMAIL PROTECTED]>: 
> 
> Thanks.
> 
> Sorry, I thought the message had not been sent, so I resent it. :(
> I am also sorry that I did not describe the problem clearly.
> 
> The two items that Doug mentioned are not the source of this problem,
> because I have already changed MoreIndexingFilter.java as listed
> below. So maybe there is something odd in the sort-related code. I
> will dig deeper and look at SortComparatorSource and HitCollector
> for more information.
> 
> BTW,
> 1. fo.getFetchDate() is more reasonable than using the current time, and I
> will change it.
> 2. If any document did not have the "lastModified" field, the sort
> results were totally wrong. Doug, maybe you know why this happens.
> Now, it's only partly wrong.
> :)
> 
> code listed below:
> ------
> private Document addTime(Document doc, Properties metaData, String url) {
> 
>   String lastModified = metaData.getProperty("last-modified");
>   if (lastModified == null)
>     return doc;
> 
>   // index/store it as long value
>   DateFormat df = new SimpleDateFormat("EEE MMM dd HH:mm:ss yyyy zzz");
>   try {
>     lastModified = new Long(HttpDateFormat.toLong(lastModified)).toString();
>   } catch (ParseException e) {
>     // try to parse it as date in alternative format
>     try {
>       Date d = df.parse(lastModified);
>       lastModified = new Long(d.getTime()).toString();
>     } catch (Exception e1) {
>       try {
>         Date d = new Date();
>         lastModified = new Long(d.getTime()).toString();
>       } catch (Exception ex) {
>         LOG.fine(url + ": can't use current time as last-modified");
>       }
>       LOG.fine(url + ": can't parse erroneous last-modified: " + lastModified);
>     }
>   }
> 
>   if (lastModified != null)
>     doc.add(Field.Keyword("lastModified", lastModified));
> 
>   return doc;
> }
> 
> On 4/21/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> > Alan Wang wrote:
> > > I am trying to sort the search results by the "lastModified" field. So I
> > > index "lastModified" as Integer and Keyword and search with the
> > > search(Query query, Filter filter, int n, Sort sort) method. I only
> > > modified net.nutch.searcher.LuceneQueryOptimizer.optimize:
> > >
> > > return searcher.search(query, filter, numHits,
> > >     new Sort(
> > >         new SortField[]{
> > >             new SortField("lastModified", SortField.INT, true)
> > >         }
> > >     ));
> > >
> > > The results did change and are largely sorted by time, but they are not
> > > exactly sorted by lastModified. The results look ugly. :(
> >
> > I can see two sources of problems:
> >
> > 1. You should sort by the "date" field, not "lastModified", since that's
> > not indexed, and sorting requires an indexed field.
> >
> > 2. Not all pages have a lastModified value. You should change
> > MoreIndexingFilter to always add a date. If no last modified is
> > specified, then use the fetch date, fo.getFetchDate().
> >
> > If you get this working, please send a patch. Even if it's a hack, it's
> > a start for others.
> >
> > Thanks,
> >
> > Doug
> >
> > _______________________________________________
> > Nutch-developers mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/nutch-developers
> >
> 
> --
> Regards,
> Alan Wang
> 
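
For anyone following Doug's second point above, here is a minimal sketch of the "always add a date" fallback. It is only an illustration under stated assumptions: the class LastModifiedFallback, the method resolveTime and its fetchTimeMillis parameter are hypothetical names, not part of Nutch, and the real filter would take the fallback value from fo.getFetchDate(), whose type is not shown in this thread. The sketch also handles a header that is missing entirely, so every document would end up with a value to sort on.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class LastModifiedFallback {

    /**
     * Hypothetical helper (not Nutch API): returns the header time as
     * milliseconds since the epoch, or fetchTimeMillis when the header
     * is absent or cannot be parsed.
     */
    static long resolveTime(String lastModifiedHeader, long fetchTimeMillis) {
        if (lastModifiedHeader == null) {
            // No header at all: fall back instead of skipping the field.
            return fetchTimeMillis;
        }
        // Common HTTP date layout, e.g. "Thu, 21 Apr 2005 00:00:00 GMT".
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat httpFormat =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        httpFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            return httpFormat.parse(lastModifiedHeader).getTime();
        } catch (ParseException e) {
            // Unparseable header: use the fetch time rather than "now".
            return fetchTimeMillis;
        }
    }
}
```

With something like this, addTime could call resolveTime once and always add the field, rather than returning early when the header is null.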



-- 
TEL 0512-68251233-6966
MSN:[EMAIL PROTECTED]
Mail:[EMAIL PROTECTED]
QQ:58624951
BenQ.com
268 Shishan Road, New District, 
Suzhou, China
