Re: QueryParser and BooleanQuery

2012-07-22 Thread Jack Krupansky
Yes, I failed to notice that the removal of the slash was yet another instance of the analyzer transforming its input. But the bottom line is that you must do 100% of the same steps that analysis performs. If in doubt, pass your literals through the standard analyzer itself. -- Jack Krupansky
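Jack's rule, to pass query literals through the same analysis the index used, can be illustrated without Lucene. The sketch below is a toy stand-in, not StandardAnalyzer. It assumes the only transformations that matter for this thread are lower-casing and splitting on non-alphanumeric characters, and it ignores stop words. That is enough to reproduce the "GET" → "get" and "/blank" → "blank" behavior. In real code you would obtain the tokens from the analyzer's own `tokenStream(...)` rather than reimplementing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ToyAnalysis {
    // Toy stand-in for an analyzer (NOT Lucene's StandardAnalyzer): splits on any
    // character that is not a letter or digit, then lower-cases each token.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // a leading separator such as "/" yields an empty piece
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("GET"));           // [get]
        System.out.println(analyze("/blank"));        // [blank]
        System.out.println(analyze("/US/kids/boys")); // [us, kids, boys]
    }
}
```

Each token returned this way is what a hand-built `Term` would need to contain to match the index.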

Re: QueryParser and BooleanQuery

2012-07-22 Thread Deepak Shakya
I tried changing the case to lower case, but the BooleanQuery still doesn't return any documents. I see that the QueryParser converts the text "/blank" to "blank", but in the BooleanQuery it stays unchanged. When I remove the forward slash from the input string, I get the matched document

Re: Matching on "owned" docs -- filter or query? Or sort?

2012-07-22 Thread balasubramanian sudaakeran
On the boosting approach, you can make the title match mandatory and the userId match optional with a very high boost. This would produce duplicates, but you don't need to sort to remove them. Just keep adding the results in the order they come, and if you see that the title is already there in
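The dedup step suggested above (walk the boosted results in rank order and skip any title already seen) can be sketched in plain Java. The `Hit` class and the sample data are hypothetical; in practice the hits would come from the boosted query's results, already sorted by descending score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupeByTitle {
    // Hypothetical hit type: a title plus the owner of that particular copy.
    static final class Hit {
        final String title;
        final int userId;
        Hit(String title, int userId) { this.title = title; this.userId = userId; }
        @Override public String toString() { return title + "(user " + userId + ")"; }
    }

    // Keeps the first (highest-ranked) hit for each title and drops later duplicates.
    // Assumes `ranked` is sorted by descending score, so boosted owned copies come first.
    static List<Hit> dedupe(List<Hit> ranked) {
        Set<String> seen = new HashSet<>();
        List<Hit> out = new ArrayList<>();
        for (Hit h : ranked) {
            if (seen.add(h.title)) { // add() returns false if the title was already seen
                out.add(h);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> ranked = Arrays.asList(
                new Hit("To Have and To Have Not", 14), // owned by the searcher, boosted to the top
                new Hit("The Haves", 3),                // hypothetical second title
                new Hit("To Have and To Have Not", 3),
                new Hit("To Have and To Have Not", 7));
        System.out.println(dedupe(ranked)); // [To Have and To Have Not(user 14), The Haves(user 3)]
    }
}
```

A single pass with a `HashSet` keeps the order the engine produced, which is why no separate sort is needed.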

Usage of NoMergePolicy and its potential implications

2012-07-22 Thread snehal.chennuru
Hello Everyone, We have a legacy system which uses Lucene 2.4.1. Back then we ported a small hack into the Lucene source code so that the underlying segment merger code wouldn't reuse deleted docids. This let us use Lucene docids as persistent DB ids as well. But we want to upgrade lucene
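The reason such a hack is needed at all: when segments are merged, deleted documents are dropped and the surviving documents are renumbered densely, so a docid is only stable until the next merge. The toy model below (plain Java, not Lucene code) shows that renumbering for a single segment. It is a simplification that ignores how a real merge combines several segments and assigns each one a doc base.

```java
import java.util.Arrays;

public class MergeRenumbering {
    // Toy model of merging one segment: for each old docid, returns its new docid,
    // or -1 if the document was deleted and is therefore dropped by the merge.
    static int[] docMap(boolean[] deleted) {
        int[] map = new int[deleted.length];
        int next = 0;
        for (int oldId = 0; oldId < deleted.length; oldId++) {
            map[oldId] = deleted[oldId] ? -1 : next++;
        }
        return map;
    }

    public static void main(String[] args) {
        // Five documents; doc 1 has been deleted.
        boolean[] deleted = {false, true, false, false, false};
        System.out.println(Arrays.toString(docMap(deleted))); // [0, -1, 1, 2, 3]
    }
}
```

Every document after the deleted one shifts down, which is what breaks docids used as persistent ids. A merge policy that never merges avoids this particular renumbering; what else that costs is the question the post is asking.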

Re: Matching on "owned" docs -- filter or query? Or sort?

2012-07-22 Thread Uncle
Thanks for the reply. I thought of using boosting, for example "((userId:14 AND title:have)^10 OR (title:have))" or "((userId:14^10 AND title:have) OR (title:have))" or something like that. However, there would still be duplicates (all 3 docs for "To Have and To Have Not" would be included whe

Re: Matching on "owned" docs -- filter or query? Or sort?

2012-07-22 Thread Erick Erickson
Hmmm, what about simply boosting very high on owner, and probably grouping on title? If you boosted on owner, you wouldn't even have to index the title separately for each user, your "owner" field could be multivalued and contain _all_ the owner IDs. In that case you wouldn't have to group at all.
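Erick's one-document-per-title idea can be modeled as follows: each title is stored once with a set of owner IDs (standing in for the multivalued owner field), and an owned title gets a large additive boost, so owned titles sort first with no grouping or dedup. The boost value and the purely additive scoring are assumptions for illustration; Lucene's real score for a boosted optional clause is not simply a constant added to the title score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OwnerBoost {
    // One document per title; `owners` stands in for a multivalued owner field.
    static final class Book {
        final String title;
        final double baseScore; // hypothetical relevance score of the title match
        final Set<Integer> owners;
        Book(String title, double baseScore, Integer... owners) {
            this.title = title;
            this.baseScore = baseScore;
            this.owners = new HashSet<>(Arrays.asList(owners));
        }
    }

    static final double OWNER_BOOST = 10.0; // assumed "very high" boost

    // Ranks title matches so that books owned by `userId` come first.
    static List<String> rank(List<Book> matches, int userId) {
        List<Book> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingDouble(
                (Book b) -> b.baseScore + (b.owners.contains(userId) ? OWNER_BOOST : 0.0)).reversed());
        List<String> titles = new ArrayList<>();
        for (Book b : sorted) titles.add(b.title);
        return titles;
    }

    public static void main(String[] args) {
        List<Book> matches = Arrays.asList(
                new Book("To Have and To Have Not", 1.2, 3, 7),
                new Book("The Haves", 0.8, 14)); // hypothetical title owned by user 14
        System.out.println(rank(matches, 14)); // [The Haves, To Have and To Have Not]
    }
}
```

Because each title exists once, every result appears exactly once; ownership only changes the order.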

Re: QueryParser and BooleanQuery

2012-07-22 Thread Jack Krupansky
The query parser/analyzer is lower-casing the query terms automatically. You have to do the same with terms for BooleanQuery: Term("cs-method", "GET") should be Term("cs-method", "get"). StandardAnalyzer is doing the lower-casing. -- Jack Krupansky -----Original Message----- From: De
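The mismatch comes from term lookup being exact: a hand-built term is not analyzed, so it must already equal the token stored in the index. The toy inverted index below (plain Java; the field name comes from this thread, the docids are hypothetical) shows "GET" missing and "get" hitting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ExactTermLookup {
    // Toy inverted index: "field:term" -> docids. Terms are stored as the analyzer
    // produced them (lower-cased), mimicking what ends up in the real index.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void index(int docId, String field, String analyzedTerm) {
        postings.computeIfAbsent(field + ":" + analyzedTerm, k -> new TreeSet<>()).add(docId);
    }

    // Exact lookup, like a hand-built term query: no analysis is applied to `term`.
    Set<Integer> lookup(String field, String term) {
        return postings.getOrDefault(field + ":" + term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ExactTermLookup idx = new ExactTermLookup();
        idx.index(0, "cs-method", "get");  // "GET" was lower-cased at index time
        idx.index(3, "cs-method", "post");
        System.out.println(idx.lookup("cs-method", "GET")); // []
        System.out.println(idx.lookup("cs-method", "get")); // [0]
    }
}
```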

QueryParser and BooleanQuery

2012-07-22 Thread Deepak Shakya
Hi, I have the following dataset indexed in Lucene:
2010-04-21 02:24:01 GET /blank 200 120
2010-04-21 02:24:01 GET /US/registrationFrame 200 605
2010-04-21 02:24:02 GET /US/kids/boys 200 785
2010-04-21 02:24:02 POST /blank 304 56
2010-04-21 02:24:04 GET /blank 304 233
2010-04-21 02:24:04 GET /blank 50

Re: using phrase query with wildcard

2012-07-22 Thread Jack Krupansky
SpanNearQuery can be used to allow an arbitrary number of terms between sub-phrases of a larger phrase. But that applies between terms, not at the beginning or end of a phrase. See: http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/spans/SpanNearQuery.html You can use SpanMulti
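The "terms between sub-phrases" behavior can be illustrated with a toy matcher: find sub-phrase A, then sub-phrase B starting at most `slop` tokens after A ends, in order. This is plain Java over an already-tokenized document, not Lucene's span algorithm; it ignores position increments, scoring, and overlapping-span subtleties.

```java
import java.util.Arrays;
import java.util.List;

public class ToyNearMatch {
    // Returns the start index of `phrase` in `tokens` at or after `from`, or -1.
    static int findPhrase(List<String> tokens, List<String> phrase, int from) {
        outer:
        for (int i = from; i + phrase.size() <= tokens.size(); i++) {
            for (int j = 0; j < phrase.size(); j++) {
                if (!tokens.get(i + j).equals(phrase.get(j))) continue outer;
            }
            return i;
        }
        return -1;
    }

    // True if `first` is followed by `second` with at most `slop` tokens in between, in order.
    static boolean nearInOrder(List<String> tokens, List<String> first, List<String> second, int slop) {
        for (int a = findPhrase(tokens, first, 0); a >= 0; a = findPhrase(tokens, first, a + 1)) {
            int end = a + first.size();
            int b = findPhrase(tokens, second, end); // earliest `second` after this `first`
            if (b >= 0 && b - end <= slop) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("this", "is", "a", "very", "long", "phrase");
        List<String> first = Arrays.asList("this", "is");
        List<String> second = Arrays.asList("phrase");
        System.out.println(nearInOrder(doc, first, second, 3)); // true  (gap of 3 tokens)
        System.out.println(nearInOrder(doc, first, second, 2)); // false
    }
}
```

Note that nothing here lets a wildcard sit before the first sub-phrase or after the last one, which is the limitation mentioned above.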

Matching on "owned" docs -- filter or query? Or sort?

2012-07-22 Thread Uncle
I also posted this to StackOverflow; apologies if you see this twice. I have a data set whereby documents are associated with a user id. Say that the documents represent books, and each book can have one or more owners. I am indexing the titles with Lucene. When searching, I want all results owned

[ANNOUNCE] Apache Lucene 3.6.1 released

2012-07-22 Thread Uwe Schindler
22 July 2012, Apache Lucene™ 3.6.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.6.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-

RE: using phrase query with wildcard

2012-07-22 Thread Levin, Ilya
It can be both. -----Original Message----- From: Doron Yaacoby [mailto:dor...@gingersoftware.com] Sent: Sunday, 22 July 2012 11:48 To: java-user@lucene.apache.org Subject: RE: using phrase query with wildcard Is * a placeholder for a term or a part of a term? -----Original Message----- From: Levi

RE: using phrase query with wildcard

2012-07-22 Thread Doron Yaacoby
Is * a placeholder for a term or a part of a term? -----Original Message----- From: Levin, Ilya [mailto:ilya.le...@hp.com] Sent: 22 July 2012 11:29 To: java-user@lucene.apache.org Subject: using phrase query with wildcard Hi, I'm trying to create a phrase query with wildcard, from the forums it

using phrase query with wildcard

2012-07-22 Thread Levin, Ilya
Hi, I'm trying to create a phrase query with a wildcard; from the forums it seems that the solution is not trivial. I'm trying to create the following queries: "this is a phrase*" OR "*This is a phrase", and get hits for every possible match of the *. What is the best way to achieve this?
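Since Ilya says the * can stand for either a whole term or part of one, a toy matcher for these patterns might treat a bare * as "any single token" and a * attached to a token as a prefix or suffix wildcard on that token. This is plain Java over pre-tokenized, lower-cased text; it is not how Lucene evaluates such queries, and it does not let a bare * stand for several terms.

```java
import java.util.Arrays;
import java.util.List;

public class ToyWildcardPhrase {
    // Matches one pattern token against one document token.
    // "*" alone = any token; "foo*" = prefix; "*foo" = suffix; otherwise exact.
    // A pattern with * on both sides is not supported by this toy.
    static boolean tokenMatches(String pattern, String token) {
        if (pattern.equals("*")) return true;
        if (pattern.endsWith("*")) return token.startsWith(pattern.substring(0, pattern.length() - 1));
        if (pattern.startsWith("*")) return token.endsWith(pattern.substring(1));
        return token.equals(pattern);
    }

    // True if the pattern tokens match consecutive document tokens somewhere (a phrase match).
    static boolean phraseMatches(List<String> pattern, List<String> doc) {
        for (int start = 0; start + pattern.size() <= doc.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < pattern.size() && ok; i++) {
                ok = tokenMatches(pattern.get(i), doc.get(start + i));
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("and", "this", "is", "a", "phrases");
        System.out.println(phraseMatches(Arrays.asList("this", "is", "a", "phrase*"), doc));      // true
        System.out.println(phraseMatches(Arrays.asList("*", "this", "is", "a", "phrase"), doc));  // false
        System.out.println(phraseMatches(Arrays.asList("*", "this", "is", "a", "phrase*"), doc)); // true
    }
}
```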