Hi,

You need to look into the source to find out exactly what it does. As far as I know it does not add any new field to the index (that is done by the index-more plugin), but it allows you to query using type:, date: and site:, I think.
Lukas

On 8/9/06, Lourival Júnior <[EMAIL PROTECTED]> wrote:
> What exactly does the query-more plugin do? I tested it a few minutes ago and
> it doesn't add any field to the resulting index. Is it used in the webapp? Could
> you give me a clarification about it?
>
> Thanks!
>
> On 8/9/06, Lukas Vlcek <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > If my memory serves me correctly then query-more should work fine with
> > Nutch 0.7.2 too.
> > And you are right Matthew, you need to use the [type:] or [date:]
> > filters in combination with [url:], since you can get an empty result
> > set if they are used alone. I do queries like this: [url:http type:pdf]
> > and it gives me the result I need.
> >
> > Lukas
> >
> > On 8/9/06, Lourival Júnior <[EMAIL PROTECTED]> wrote:
> > > All right! I've done this already. I think you don't understand my
> > > question.
> > > What I want to do is to query my indexes using something like
> > > "filetype:pdf". Version 0.8 already has this feature. But I'm using
> > > version 0.7.2 and I want to add this feature manually. But I don't know
> > > where I have to edit. Do you know?
> > >
> > > Regards,
> > >
> > > Lourival Junior
> > >
> > > On 8/9/06, Lukas Vlcek <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > To allow more formats to be indexed you need to modify nutch-site.xml
> > > > and update/add the plugin.includes property (see nutch-default.xml for
> > > > default settings). The following is what I have in nutch-site.xml:
> > > >
> > > > <property>
> > > > <name>plugin.includes</name>
> > > > <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|rtf|html|js|msword|mspowerpoint|msexcel|pdf|zip|rss)|index-(basic|more)|query-(basic|site|url|more)|summary-basic|scoring-opic</value>
> > > > </property>
> > > >
> > > > [parse-*] is used to parse various formats, [query-more] allows you to
> > > > use the [type:] filter in nutch queries.
> > > >
> > > > Regards,
> > > > Lukas
> > > >
> > > > On 8/9/06, Lourival Júnior <[EMAIL PROTECTED]> wrote:
> > > > > Hi Lukas and everybody!
> > > > >
> > > > > Do you know which file in nutch 0.7.2 I should edit to add some
> > > > > field to my index (i.e. file type - PDF, Word or html)?
> > > > >
> > > > > On 8/8/06, Lukas Vlcek <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am not sure if I can give you any useful hint but the following is
> > > > > > what once worked for me.
> > > > > > Example of query: url:http date:20060801
> > > > > >
> > > > > > The date: and type: options can be used in combination with url:
> > > > > > The filter url:http should select all documents (unless you allowed the
> > > > > > file or ftp protocols). A plain date or type filter selects nothing if
> > > > > > it is used alone.
> > > > > >
> > > > > > And be sure you don't introduce any space between the filter name and its
> > > > > > value ([date: 20060801] is not the same as [date:20060801])
> > > > > >
> > > > > > Lukas
> > > > > >
> > > > > > On 8/8/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
> > > > > > > Howie,
> > > > > > > I inspected my index using Luke and 20060801 shows up several
> > > > > > > times in the index. I'm unable to query pretty much any field. Several
> > > > > > > people seem to be having the same problem. Does anyone know what's
> > > > > > > going on?
> > > > > > >
> > > > > > > This is one of the last things I have to resolve to have Nutch
> > > > > > > deployed successfully at my organization. Unfortunately, Friday is my
> > > > > > > last day.
> > > > > > > Can anyone offer any assistance??
> > > > > > > Thanks,
> > > > > > > Matt
> > > > > > >
> > > > > > > Howie Wang wrote:
> > > > > > > > I think that I have problems querying for numbers and
> > > > > > > > words with digits in them.
> > > > > > > > Now that I think of it, is it
> > > > > > > > possible it has something to do with the stemming in
> > > > > > > > either the query filter or indexing? In either case, I would
> > > > > > > > print out the text that is being indexed and the phrases
> > > > > > > > added to the query. You could also use Luke to inspect
> > > > > > > > your index and see whether 20060801 shows up anywhere.
> > > > > > > >
> > > > > > > > Howie
> > > > > > > >
> > > > > > > >> I tried looking for a page that had the date 20060801 and the
> > > > > > > >> text "test" in the page. I tried the following:
> > > > > > > >>
> > > > > > > >> date: 20060801 test
> > > > > > > >>
> > > > > > > >> and
> > > > > > > >>
> > > > > > > >> date 20060721-20060803 test
> > > > > > > >>
> > > > > > > >> Neither worked, any ideas??
> > > > > > > >>
> > > > > > > >> Matt
> > > > > > > >>
> > > > > > > >> Matthew Holt wrote:
> > > > > > > >>> Thanks Jake,
> > > > > > > >>> However, it seems to me that it makes most sense that a query
> > > > > > > >>> should return all pages that match the query, instead of acting as a
> > > > > > > >>> content filter. However, I know it's something easy to suggest when
> > > > > > > >>> you're not having to implement it, so just a suggestion.
> > > > > > > >>>
> > > > > > > >>> Matt
> > > > > > > >>>
> > > > > > > >>> Vanderdray, Jacob wrote:
> > > > > > > >>>> Try querying with both the date and something you'd expect to find
> > > > > > > >>>> in the content. The field query filter is just a filter. It only
> > > > > > > >>>> restricts your results to things that match the basic query and
> > > > > > > >>>> have the contents you require in the field. So if you query for
> > > > > > > >>>> "date:2006080 text" you'll be searching for documents that contain
> > > > > > > >>>> "text" in one of the default query fields and have the value 2006080
> > > > > > > >>>> in the date field.
> > > > > > > >>>> Leaving out text in that example would
> > > > > > > >>>> essentially be asking for nothing in the default fields and 2006080
> > > > > > > >>>> in the date field, which is why it doesn't return any results.
> > > > > > > >>>>
> > > > > > > >>>> Hope that helps,
> > > > > > > >>>> Jake.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> -----Original Message-----
> > > > > > > >>>> From: Matthew Holt [mailto:[EMAIL PROTECTED]
> > > > > > > >>>> Sent: Wed 8/2/2006 4:58 PM
> > > > > > > >>>> To: [email protected]
> > > > > > > >>>> Subject: Querying Fields
> > > > > > > >>>> I am unable to query fields in my index in the method that has
> > > > > > > >>>> been suggested. I used Luke to examine my index and the following
> > > > > > > >>>> field types exist:
> > > > > > > >>>> anchor, boost, content, contentLength, date, digest, host,
> > > > > > > >>>> lastModified, primaryType, segment, site, subType, title, type, url
> > > > > > > >>>>
> > > > > > > >>>> However, when I do a search using one of the fields, followed by a
> > > > > > > >>>> colon, an incorrect result is returned. I used Luke to find the top
> > > > > > > >>>> term in the date field which is '20060801'. I then searched using
> > > > > > > >>>> the following query:
> > > > > > > >>>> date: 20060801
> > > > > > > >>>>
> > > > > > > >>>> Unfortunately, nothing was returned.
> > > > > > > >>>> The correct plugins are
> > > > > > > >>>> enabled, here is an excerpt from my nutch-site.xml:
> > > > > > > >>>>
> > > > > > > >>>> <property>
> > > > > > > >>>> <name>plugin.includes</name>
> > > > > > > >>>> <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|oo|pdf|msword|mspowerpoint|rtf|zip)|index-(basic|more)|query-(more|site|stemmer|url)|summary-basic|scoring-opic</value>
> > > > > > > >>>> <description>Regular expression naming plugin directory names to
> > > > > > > >>>> include. Any plugin not matching this expression is excluded.
> > > > > > > >>>> In any case you need at least include the nutch-extensionpoints
> > > > > > > >>>> plugin. By default Nutch includes crawling just HTML and plain text
> > > > > > > >>>> via HTTP, and basic indexing and search plugins.
> > > > > > > >>>> </description>
> > > > > > > >>>> </property>
> > > > > > > >>>>
> > > > > > > >>>> Any ideas? I'm not the only one having the same problem, I saw an
> > > > > > > >>>> earlier mailing list post but couldn't find any resolution...
> > > > > > > >>>> Thanks,
> > > > > > > >>>>
> > > > > > > >>>> Matt
> > > > >
> > > > > --
> > > > > Lourival Junior
> > > > > Universidade Federal do Pará
> > > > > Curso de Bacharelado em Sistemas de Informação
> > > > > http://www.ufpa.br/cbsi
> > > > > Msn: [EMAIL PROTECTED]

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general

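The query advice given in this thread (write each filter as name:value with no space after the colon, and pair date: or type: with a url: clause) can be sketched as a small helper. This is only an illustration in Python and is not part of Nutch; the function name build_query and its parameters are hypothetical, and the default url:http clause simply follows the example queries quoted above.

```python
def build_query(terms=(), url="http", **filters):
    """Build a Nutch-style query string (hypothetical helper, not a Nutch API).

    Each filter is emitted as name:value with no space after the colon,
    and a url: clause is always included, because the thread reports that
    date: or type: used alone returns nothing.
    """
    clauses = ["url:" + url]
    for name, value in filters.items():
        value = str(value).strip()
        # A value containing whitespace would split into separate query terms.
        if len(value.split()) != 1:
            raise ValueError("filter value must be a single token: %r" % value)
        clauses.append(name + ":" + value)
    # Free-text terms go last, after the field filters.
    clauses.extend(terms)
    return " ".join(clauses)


print(build_query(type="pdf"))                        # url:http type:pdf
print(build_query(terms=["test"], date="20060801"))  # url:http date:20060801 test
```

The helper only assembles the string; whether a given filter is accepted still depends on which query-* plugins are listed in plugin.includes.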