Re: Tool for analyzing analyzers
Hi Erik,

I've had this running OK from the command line and in Eclipse on XP. I suspect the problem is that you're running a different OS. The Classfinder tries to split the system property java.class.path on the ; character, but I forgot that different OSes have different separators.

As for Luke etc.: I had a vague notion that this could be extended into a more generalised workbench for Lucene that could also help with indexing. Using a plug-in architecture (once we get classloading sorted!) you could define interfaces for things such as fetchers (db/file/web) and parsers (PDF/Word...) and configure them to create indexes using a GUI like this, or a web-based interface. People could then contribute plug-in implementations as Jars that you could just drop into the workbench.

Let me know your setup details and I'll try to fix the classloader issue.

Cheers
Mark

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Tool for analyzing analyzers
Hi Mark,

> I've had this running OK from the command line and in Eclipse on XP. I suspect it might be because you're running a different OS? The Classfinder tries to split the system property java.class.path on the ; character but I forgot different OSes have different separators.
> Let me know your setup details and I'll try to fix the classloader issue.

I have the same problems and am running on Linux, using ':' to separate the class path...

BTW: I tried to compile your sources, but you left out the part in thinlet:

  2928 Sun Oct 12 19:47:56 CEST 2003 thinlet/AppletLauncher.class
  2643 Sun Oct 12 19:47:56 CEST 2003 thinlet/FrameLauncher.class
 74823 Sun Oct 12 19:47:56 CEST 2003 thinlet/Thinlet.class

Was that intentional?

Morus
Re: Range Query Sombody HELP please
Karthik,

On Friday 28 May 2004 05:54, Karthik N S wrote:
...
> When we do a search in SQL using '*', we all know that the result would be the total number of records in the table, but when we want to limit our records we apply a range between 2 specific row records [which we call a subsearch]. Similarly, on an indexed record I would like to perform the same technique as above.

In case you need to reuse the limitation, a filter is the way to go in Lucene. However, it seems better to get the range query working first.

> In fact I was looking at the URL you sent me in the last mail on using getRange queries and was working on the same
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

The query I gave uses two +'s prefixed to the query parts:

+search_word +(book:[100 TO 200])

Both query parts are required because of the +'s, i.e. it works as the AND operator in SQL. The TO operator queries the range in the book field.

> and
> http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
> but without results for the last 12 hrs.

You have probably seen a lot of different things that will be useful later.

> If you could spare a few minutes, please explain or provide a simple [full] example using and overriding the getRange() method.

The problem you'll probably run into is that Lucene does not support numbers directly; you'll have to index them as strings, e.g. by prefixing zeros. As Erik indicated:
http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

You may have to reindex your data for this. In case you have a lot of data, consider setting up a test first. Then in the getRangeQuery() method of your parser you'll need to prefix the queried numbers in the same way. The example in the article is about date fields, but the adaptation to numbers shouldn't be a problem.

When you override this in your query parser:

getRangeQuery(String field, Analyzer analyzer, String start, String end, boolean inclusive)

it will be called for the example query with start = 100 and end = 200. (See http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html under Customizing query parser.) In the overriding method you can then call the super method with the start and end prefixed with zeros, as indicated in the page on searching numerical fields referred to above.

Have fun, you'll get it working,
Ype

> with regards
> Karthik
>
> -----Original Message-----
> From: Ype Kingma [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 27, 2004 11:03 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Range Query Sombody HELP please
>
> On Thursday 27 May 2004 09:37, Karthik N S wrote:
>> Hi Lucene-Developer
>> My main intention was to search for a word hit in a unique Field between ranges, say book100 - book200 indexed numbers. It's something like creating a SUBSEARCH within the SEARCHINDEX.
> ...
> Could you explain what you mean by subsearch? I suppose you might want to have a look at the various filter classes in the org.apache.lucene.search package.
>
> Regards,
> Ype
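The zero-prefixing Ype describes can be sketched without any Lucene classes. The following is a minimal illustration, assuming a fixed width of 6 digits and non-negative integers; the class and method names (PaddedNumbers, pad, inRange) are hypothetical and not part of Lucene. The same pad() call would presumably be applied both when indexing the book field and to the start and end arguments inside an overridden getRangeQuery() before calling the super method.

```java
class PaddedNumbers {
    /** Left-pads a non-negative integer string with zeros to a fixed width (assumed helper). */
    static String pad(String number, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = number.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(number).toString();
    }

    /** True if value lies in [start, end] when all three are compared as padded strings. */
    static boolean inRange(String value, String start, String end, int width) {
        String v = pad(value, width);
        return pad(start, width).compareTo(v) <= 0
                && v.compareTo(pad(end, width)) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded strings sort lexicographically, so "99" comes after "100".
        System.out.println("99".compareTo("100") > 0);
        // With padding, string order agrees with numeric order.
        System.out.println(inRange("150", "100", "200", 6));
        System.out.println(inRange("99", "100", "200", 6));
    }
}
```

The width has to be chosen once and used consistently at index time and query time; otherwise the padded query terms will not line up with the indexed terms.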
Range Query Sombody HELP please
Hey Ype,

Thanks for the advice, but I still need to get my exact situation working:

1) I have a unique Field [called filename] which is indexed as type Text. It takes the name of the HTML file as its value. There is also another Field called Contents, which stores all the contents of that uniquely named HTML file.

2) The indexer completes indexing for about 5000 HTML files successfully.

3) When I do a search for a word, it returns 400 hits on various HTML files.

Now, in this situation, if I want to limit the hits to between the first 200 and 400 HTML page names only, what exactly should I do using the getRange() method?

Please advise on how to proceed...

with regards
Karthik
Re: Range Query Sombody HELP please
On May 28, 2004, at 4:54 AM, Karthik N S wrote:

> 1) I have a unique Field [ called filename ] which is indexed of type Text.

You probably do not want to use Field.Text for a filename. Use Field.Keyword instead.

> 2) The indexer complete indexes for about 5000 html files successfully.

Now use Luke (Google for _luke lucene_) to browse your index, and check that you are getting what you think you are. You can do ad-hoc queries there also.

> Now in this situation if I want to limit the hits between First 200 to 400 html Page Names only what exactly should I do to using getRange() method.

If you want the first 200 - 400, start your Hits walking at index 200, and proceed through 400. Is there some field you want to key off to do the range? Or do you just want the 200th - 400th hits from the search, which is an entirely different question from one about ranges?

> Please advise on how to proceed ...

Please send (succinct) code examples in the future to really keep this discussion concrete and clear.

Erik
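Walking a window of results, as Erik suggests, can be sketched as follows. This is only an illustration: a plain List stands in for Lucene's Hits object, the class and method names (HitsWindow, window) are hypothetical, and the window is treated as half-open (index 200 up to, but not including, 400), which is an assumption about what "200 - 400" should mean. With a real Hits object the equivalent loop would presumably run from the start index while i is below both the end index and hits.length(), calling hits.doc(i).

```java
import java.util.ArrayList;
import java.util.List;

class HitsWindow {
    /** Returns the elements results[from..to), clamped to the bounds of the list. */
    static <T> List<T> window(List<T> results, int from, int to) {
        int start = Math.max(0, Math.min(from, results.size()));
        int end = Math.max(start, Math.min(to, results.size()));
        return new ArrayList<>(results.subList(start, end));
    }

    public static void main(String[] args) {
        // Stand-in for ranked search results: 450 hypothetical file names.
        List<String> results = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            results.add("page" + i + ".html");
        }
        List<String> page = window(results, 200, 400);
        System.out.println(page.size() + " results, starting at " + page.get(0));
    }
}
```

Clamping matters because a search may return fewer hits than the requested end index; the sketch then simply returns a shorter (possibly empty) window.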
Re: Tool for analyzing analyzers
On May 28, 2004, at 2:46 AM, [EMAIL PROTECTED] wrote:

> Hi Erik, I've had this running OK from the command line and in Eclipse on XP. I suspect it might be because you're running a different OS? The Classfinder tries to split the system property java.class.path on the ; character but I forgot different OSes have different separators.

There is another OS other than Mac OS X? :)

There is a File constant that gives you the OS-specific separator: File.pathSeparatorChar.

> Using a plug-in architecture (once we get classloading sorted!) you could define interfaces for things such as fetchers (db/file/web) and parsers (PDF/Word..) and configure them to create indexes using a GUI like this, or a web-based interface. People could then contribute plug-in implementations as Jars that you could just drop in to the workbench.

Sounds like we'd be re-inventing Nutch :) But I'd love to build a Lucene demo application that is powerful enough to be used as a foundation for folks to use out-of-the-box.

Erik
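A portable version of the class path split might look like the sketch below. It is not the Classfinder's actual code, which is not shown in this thread; the class and method names (ClassPathSplitter, split) are hypothetical. It uses File.pathSeparator, the String counterpart of the File.pathSeparatorChar constant Erik mentions, which is ';' on Windows and ':' on Unix-like systems.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class ClassPathSplitter {
    /** Splits a class path string on the given separator, skipping empty entries. */
    static List<String> split(String classPath, String separator) {
        List<String> entries = new ArrayList<>();
        int start = 0;
        while (start <= classPath.length()) {
            int end = classPath.indexOf(separator, start);
            if (end < 0) {
                end = classPath.length();
            }
            String entry = classPath.substring(start, end);
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
            start = end + separator.length();
        }
        return entries;
    }

    public static void main(String[] args) {
        // Use the platform's separator instead of a hard-coded ';'.
        String classPath = System.getProperty("java.class.path", "");
        for (String entry : split(classPath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

Splitting with indexOf rather than a regular expression avoids having to escape the separator, and it keeps Windows entries such as C:\lib\a.jar intact when the separator is ';'.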
Re: Tool for analyzing analyzers
Hi Erik,

Erik Hatcher wrote:
[snip]
> But I'd love to build a Lucene demo application that is powerful enough to be used as a foundation for folks to use out-of-the-box.

That's just what I thought. Here's one: http://www.zilverline.org

> Erik

Cheers,
Michael Franken
Re: Range Query Sombody HELP please
On Friday 28 May 2004 10:54, Karthik N S wrote:

> Hey Ype,
> Thanks for the advice, but I still need to get my exact situation working:
> 1) I have a unique Field [called filename] which is indexed as type Text. It takes the name of the HTML file as its value. There is also another Field called Contents, which stores all the contents of that uniquely named HTML file.
> 2) The indexer completes indexing for about 5000 HTML files successfully.
> 3) When I do a search for a word, it returns 400 hits on various HTML files.
> Now, in this situation, if I want to limit the hits to between the first 200 and 400 HTML page names only, what exactly should I do using the getRange() method?

A range query will provide a range of indexed values, and I thought you needed to add the record number as an indexed field in each record. However, you seem to use the 200 and 400 here as the order number for each record in the result of the query on the Contents field. Is that correct? If so, in which order do you expect the results of your query?

Kind regards,
Ype
Exact Field Match
Hi,

Does Lucene have support for exact field match? Is there a way to say that this field equals exactly this value? I know I can do it by using an untokenized field, but I have some values that I would then want to store in both tokenized and untokenized copies of the same field. Instead of doing that, I'm just storing the tokenized version.

For example: MyField = "My value." I want to search where "My value." is the exact match for this field, but I also sometimes want to do a containing search, so that just a query for "value" matches.

I'm planning on extracting the stored value and comparing it to see if it's an exact match. If you have a better idea, please send it my way!

Thanks,
Reece
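The post-filtering Reece has in mind could be sketched roughly as below. No Lucene classes are used: the list of strings stands in for the stored MyField values read back from the hits of a containing search, and the names (ExactMatch, exactOnly) are hypothetical. It is only meant to show the comparison step, not how the stored values are fetched.

```java
import java.util.ArrayList;
import java.util.List;

class ExactMatch {
    /** Keeps only the stored values that equal the wanted value exactly. */
    static List<String> exactOnly(List<String> storedValues, String wanted) {
        List<String> matches = new ArrayList<>();
        for (String stored : storedValues) {
            if (stored.equals(wanted)) {
                matches.add(stored);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for stored field values of hits from a containing search for: value
        List<String> candidates = new ArrayList<>();
        candidates.add("My value.");
        candidates.add("Another value");
        candidates.add("My value. And more");
        System.out.println(exactOnly(candidates, "My value."));
    }
}
```

The trade-off is that every candidate hit has to be loaded and compared after the search, which costs more than querying an untokenized copy of the field directly.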
RE: Exact Field Match
Yes, you can. Others probably have a much better example than mine, and there is probably a wiki page or other document describing it.

You can chain queries together with BooleanQuery. I am creating a Vector of Query objects based on restriction criteria from my site and then loading them into the BooleanQuery. You might check out WildcardQuery, which works well with or without wildcard characters inside it.

-Gus

    QueryParser qp = new QueryParser("contents", analyzer);
    qp.setOperator(DEFAULT_OPERATOR);
    Query query = qp.parse(queryline);
    if (vFilters != null && vFilters.size() > 0) {
        // Combine the parsed query with every restriction; all clauses are required.
        BooleanQuery bq = new BooleanQuery();
        bq.add(query, true /*required*/, false /*not prohibited*/);
        Enumeration e = vFilters.elements();
        while (e.hasMoreElements()) {
            bq.add((Query) e.nextElement(), true /*required*/, false /*not prohibited*/);
        }
        hits = searcher.search(bq);
    } else {
        hits = searcher.search(query);
    }
    ...
    public void setFilter(String fieldname, String fieldvalue) {
        if (fieldname != null && fieldvalue != null
                && fieldname.length() > 0 && fieldvalue.length() > 0) {
            // Append a single-character wildcard if the value has none.
            if (fieldvalue.indexOf("?") == -1) {
                fieldvalue += "?";
            }
            Term t = new Term(fieldname, fieldvalue);
            WildcardQuery tq = new WildcardQuery(t);
            Filter myfilter = new QueryFilter(tq);
            setFilter(filter);
            vFilters.addElement(tq);
        }
    }