how to approach phrase queries and term grouping

2011-06-23 Thread Jason Guild
Hi All: I am new to Lucene and my project is to provide specialized search for a set of booklets. I am using Lucene Java 3.1. The basic idea is to run queries to find out which booklets and page numbers match, in order to help people know where to look for information in the (rather large

field sorted searches with unbounded hit count

2011-06-23 Thread Tim Eck
For the searches I want to run on my index I want to return all matching documents (as opposed to N top hits). My first naïve approach was just to use Searcher.search(query, filter, Integer.MAX_VALUE, sort) – that is, pass Integer.MAX_VALUE for the number of possible docs to return. That

Re: Suggestion: make some more TokenFilters KeywordAttribute aware

2011-06-23 Thread Simon Willnauer
On Wed, Jun 22, 2011 at 8:53 PM, Sujit Pal s...@healthline.com wrote: Hello, I am currently in need of a LowerCaseFilter and StopFilter that will recognize KeywordAttribute, similar to the way PorterStemFilter currently does (on trunk). Specifically, in case the term is a

Re: questions about searching lucene 3.2

2011-06-23 Thread Simon Willnauer
As far as I understand, you have 2 different problems. 1. Searching a 2.4 index with 3.2 code using StandardAnalyzer. In this case you should either reindex or pass Version.LUCENE_24 to the StandardAnalyzer ctor; that should help here. 2. Searching a string with parentheses with the query parser you

Re: Lucene Searching

2011-06-23 Thread Pranav goyal
I tried it and it worked, although it has one peculiarity. When I search for Item_1 it gives me 110 hits, but when I use *Item_1* it gives me 0 hits. What mistake am I making here? Also, when I search for *341* it gives me correct results, i.e. 0341-000-000-DR, but it's not working for

Re: ComplexPhraseQueryParser with multiple fields

2011-06-23 Thread lichman
Which patch are you referring to? The last one? And sure... I'll do the voting thing. -- View this message in context: http://lucene.472066.n3.nabble.com/ComplexPhraseQueryParser-with-multiple-fields-tp2879290p3099032.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Lucene Searching

2011-06-23 Thread Ian Lea
What exactly is it? Show us what you are indexing, how, and how you are building the query and we may be able to help. Whenever I see a report of incorrect results on a Mixed Case field I always suspect that the term is being lowercased on indexing and not at searching, or vice versa. -- Ian.
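The case mismatch described above can be reproduced without Lucene at all. The following is a minimal sketch, assuming a plain set of exact terms stands in for the index; the class and method names are hypothetical and are not Lucene API. It only illustrates why a term lowercased on one side and not the other never matches.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an index: a set of exact terms. Not Lucene API.
class CaseMismatchDemo {

    // Build the "index", optionally lowercasing each term the way a
    // lowercasing analyzer would at indexing time.
    static Set<String> index(String[] values, boolean lowercase) {
        Set<String> terms = new HashSet<String>();
        for (String v : values) {
            terms.add(lowercase ? v.toLowerCase(Locale.ROOT) : v);
        }
        return terms;
    }

    // Exact-term lookup, optionally lowercasing the query term first.
    static boolean matches(Set<String> terms, String query, boolean lowercase) {
        return terms.contains(lowercase ? query.toLowerCase(Locale.ROOT) : query);
    }

    public static void main(String[] args) {
        String[] values = { "Item_1", "0341-000-000-DR" };

        // Terms lowercased when indexing, query left as typed: no match.
        Set<String> lowercased = index(values, true);
        System.out.println(matches(lowercased, "Item_1", false)); // prints false

        // Same treatment on both sides: match.
        System.out.println(matches(lowercased, "Item_1", true)); // prints true
    }
}
```

The reverse mismatch behaves the same way: if the terms are indexed as-is and only the query term is lowercased, the lookup also fails.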

Re: Lucene Searching

2011-06-23 Thread Pranav goyal
Here's the code which I am implementing (Indexing and Searching codes are in different files) Indexing Part : d=new Document(); File indexDir = new File(index-dir); KeywordAnalyzer analyzer = new KeywordAnalyzer(); IndexWriterConfig conf = new

Re: Lucene Searching

2011-06-23 Thread Ian Lea
Looks OK to me. You are searching on Item without adding any docs with that field, you could use writer.updateDocument() rather than delete and add, but those are just quibbles and don't explain your searching problem. Having done most of the hard work, why don't you adapt the code you posted

Re: Lucene Searching

2011-06-23 Thread digy digy
Maybe you need queryParser.setLowercaseExpandedTerms(false) DIGY On Thu, Jun 23, 2011 at 9:37 AM, Pranav goyal pranavgoyal40...@gmail.com wrote: I tried it and it worked, although it's having one peculiarity. When I search for Item_1 : it gives me 110 hits but when I use *Item_1* it gives

Re: how to approach phrase queries and term grouping

2011-06-23 Thread Ian Lea
Have you read Lucene In Action 2nd edition? Highly recommended for anyone new to lucene and includes info and code on synonyms and position increments. The code is available somewhere as a free download. You may also want to read up on slop and span queries. See for example

Re: IndexWriter.optimize not using it breaks my test case :(

2011-06-23 Thread Ian Lea
From the 3.2.0 javadocs: Optimize is a fairly costly operation, so you should only do it if your search performance really requires it. Many search applications do fine never calling optimize. See the FAQ and javadocs on searchers and writers for thread safety info. One thing that optimize

Re: field sorted searches with unbounded hit count

2011-06-23 Thread Ian Lea
One possibility would be to execute the search first just to get the number of hits - see TotalHitCountCollector in recent versions of lucene, not sure when it was added - and use the hit count from that as the max docs to return. The counting only search would typically be very quick, certainly

Re: ComplexPhraseQueryParser with multiple fields

2011-06-23 Thread lichman
By the way - I'm using the ComplexPhraseQueryParser that I've downloaded from: https://issues.apache.org/jira/browse/SOLR-1604 And I've tried to use packages: - org.apache.lucene.search - org.apache.lucene.queryParser Both, when compiled and added to the SOLR lib dir, caused the exception.

Re: ComplexPhraseQueryParser with multiple fields

2011-06-23 Thread Ahmet Arslan
By the way - I'm using the ComplexPhraseQueryParser that I've downloaded from: https://issues.apache.org/jira/browse/SOLR-1604 And I've tried to use packages: - org.apache.lucene.search - org.apache.lucene.queryParser Both, when compiled and added to the SOLR lib dir, caused the

Re: ComplexPhraseQueryParser with multiple fields

2011-06-23 Thread lichman
Thanks! Now it works. But now there's another issue. I'm using SOLR and Lucene 3.1.0 and when sending a query Wildcard* phrase* it works as expected - but, when sending the query wildcard* (only one word within the phrase) I'm getting another exception: HTTP ERROR: 500 Unknown query type

Re: ComplexPhraseQueryParser with multiple fields

2011-06-23 Thread Ahmet Arslan
But now there's another issue. I'm using SOLR and Lucene 3.1.0 and when sending a query Wildcard* phrase* it works as expected - but, when sending the query wildcard* (only one word within the phrase) I'm getting another exception: HTTP ERROR: 500 Unknown query type

Re: ComplexPhraseQueryParser with multiple fields

2011-06-23 Thread lichman
The same as touch*. Thanks.

RE: questions about searching lucene 3.2

2011-06-23 Thread Bob Rhodes
Yeah, I agree that this is the issue. I did get my query to work using the ClassicAnalyzer. I guess maybe I need to upgrade my indexes, which will be a big job. Any advice here is appreciated. I didn't have any luck passing Version.LUCENE_24 to the StandardAnalyzer. The query still didn't work.

Search multiple directories simultaneously

2011-06-23 Thread Cheng
Hi, I have multiple indexed folders (or directories), each holding index files for specific purposes. I want to search over these folders (or directories) in a single query. Is it possible? Thanks

SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory'

2011-06-23 Thread abhayd
Hi, we upgraded to Solr 1.4. We are getting the error SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375) at

RE: Search multiple directories simultaneously

2011-06-23 Thread Uwe Schindler
IndexReader index1 = IndexReader.open(dir1); IndexReader index2 = IndexReader.open(dir2); IndexReader index3 = IndexReader.open(dir3); ... IndexReader all = new MultiReader(index1, index2, index3,...); IndexSearcher searcher = new IndexSearcher(all); ...search your indexes... all.close();

RE: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory'

2011-06-23 Thread Uwe Schindler
Solr 1.4 does not have this class, nor does it reference it. Are you sure you have not added some Lucene/Solr 3.1 or 3.2 JAR files somewhere in your classpath?

Re: Search multiple directories simultaneously

2011-06-23 Thread Cheng
Thanks, man. Very concise and easy to follow. Can I ask how searching multiple indexes will impact performance? I have probably 50GB of data in each of the 10-20 folders. On Fri, Jun 24, 2011 at 1:04 AM, Uwe Schindler u...@thetaphi.de wrote: IndexReader index1 = IndexReader.open(dir1); IndexReader

Computing document frequencies for specific queries in Lucene

2011-06-23 Thread aengle1429
Hello, I am currently trying to get the following results... let's say I have 3 XML files that I parse using SAX: <?xml version="1.0" encoding="UTF-8"?> <person> <name>bob bob bob </name> <name>3m </name> <height>3m </height> <height>bob </height> </person> <?xml version="1.0" encoding="UTF-8"?> <person>

RE: field sorted searches with unbounded hit count

2011-06-23 Thread Tim Eck
Thanks for the idea, Ian. I still need to think about it, but the race between running the total count search and then the sorted search worries me. I have pretty specific visibility guarantees I must provide on this data (with respect to concurrent updates). It'd be a bummer to have to

spaces in the field name

2011-06-23 Thread Nilesh Vijaywargiay
I have a situation where the field name contains spaces. So a query like "short text: value" doesn't return any results, as the query structure internally would be "defaultField:short text:value". Any workaround for including spaces in your field name? Nilesh

Re: Suggestion: make some more TokenFilters KeywordAttribute aware

2011-06-23 Thread Sujit Pal
Thanks Simon, I have opened a JIRA and attached a patch. I have verified that I haven't broken anything, and I have used these patched files to test in my local application and have verified that they work. https://issues.apache.org/jira/browse/LUCENE-3236 -sujit On Thu, 2011-06-23 at 08:21

RE: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory'

2011-06-23 Thread abhayd
Thanks. Our schema has <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter

Does {Filter}ing is faster than {Query}ing in Lucene?

2011-06-23 Thread Denis Bazhenov
While reading Lucene in Action 2nd edition I came across the description of Filter classes, which can be used for result filtering in Lucene. Lucene has a lot of filters that mirror Query classes. For example, NumericRangeQuery and NumericRangeFilter. The book says that NRF does exactly the

Lucene sort performance roots?

2011-06-23 Thread Denis Bazhenov
Well, maybe it's a bit of a controversial question, but anyway... Lucene is a great toolkit for search applications. And it's so fast in most cases. I think I understand why it's faster than relational databases for information retrieval. For example, Lucene uses a more efficient index than

Re: questions about searching lucene 3.2

2011-06-23 Thread Simon Willnauer
On Thu, Jun 23, 2011 at 3:46 PM, Bob Rhodes bob.rho...@trssllc.com wrote: Yeah I agree that this is the issue. I did get my query to work using the ClassicAnalyzer. I guess maybe I need to upgrade my indexes which will be a big job. Any advice here is appreciated. I didn't have any luck

Re: field sorted searches with unbounded hit count

2011-06-23 Thread Simon Willnauer
On Thu, Jun 23, 2011 at 10:41 PM, Tim Eck tim...@gmail.com wrote: Thanks for the idea, Ian. I still need to think about it, but the race between running the total count search and then the sorted search worries me. I have pretty specific visibility guarantees I must provide on this data

Re: Lucene sort performance roots?

2011-06-23 Thread Dawid Weiss
Can you describe the kind of sorting you're doing? Maybe the data is already sorted (and in RAM) and you're only getting it out? Dawid On Fri, Jun 24, 2011 at 3:32 AM, Denis Bazhenov dot...@gmail.com wrote: Well, maybe it's a bit of a controversial question, but anyway... Lucene is a great toolkit