Lucene document get values

2010-02-08 Thread nithin kamath

Hi,

I have a Lucene document with a field that appears repeatedly in the
document. I use doc.getFieldables(fieldName) to get the field values, but
when the number of fields becomes huge, getting the field values takes up a
lot of memory. Is there another, more efficient way to get the field values?
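
One option that may help is lazy field loading via a FieldSelector, so stored
values are only read from disk when actually accessed. A minimal sketch,
assuming the Lucene 2.9/3.0 API; reader, docId, and the field name "tag" are
placeholders:

import java.util.Collections;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.document.SetBasedFieldSelector;

// Load only the "tag" field, and load it lazily; all other stored
// fields of the document are skipped entirely.
FieldSelector selector = new SetBasedFieldSelector(
    Collections.<String>emptySet(),        // fields to load eagerly: none
    Collections.singleton("tag"));         // fields to load lazily
Document doc = reader.document(docId, selector);
for (Fieldable f : doc.getFieldables("tag")) {
    String value = f.stringValue();        // bytes pulled from disk only here
}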

thanks & regards
nithin





Best Practice 3.0.0

2010-02-08 Thread NanoE

Hello,

I am writing a small library search application and want to know: what are
the best practices in Lucene 3.0.0 for near-real-time index updates?

Thanks Nano





Re: Problems with IndexWriter#commit() on Linux

2010-02-08 Thread Michael McCandless
Thanks for sharing...

Software RAID should be perfectly fine for Lucene, in general, unless
the mount is configured to ignore fsync (I think the "data=writeback"
mount option for ext3 does so on Linux).

Can you check the mount options on your RAID filesystem?

Mike

On Mon, Feb 8, 2010 at 2:09 AM, Naama Kraus  wrote:
> Hi All,
>
> I am back to this one after a while.
> It appears the file system I was using resides on software RAID disks. I ran
> the same code on the same Linux machine, but on another file system residing
> on SCSI disks. I didn't observe the problem there.
> Both file systems are ext3.
> So I am guessing the problem relates to the RAID disks.
>
> I looked again at the commit() API, and the following comment may explain it:
>
> "Note that this operation calls Directory.sync on the index files. That call
> should not return until the file contents & metadata are on stable storage.
> For FSDirectory, this calls the OS's fsync. But, beware: some hardware
> devices may in fact cache writes even during fsync, and return before the
> bits are actually on stable storage, to give the appearance of faster
> performance. If you have such a device, and it does not have a battery
> backup (for example) then on power loss it may still lose data. Lucene
> cannot guarantee consistency on such devices."
>
> Well, for me, running on the SCSI disks is just fine; I wanted to share
> my experience anyway.
>
> Naama
>
> On Fri, Jan 8, 2010 at 12:09 AM, Naama Kraus  wrote:
>
>> Thanks all for the hints, I'll get back to my code and do some additional
>> checks.
>> Naama
>>
>>
>> On Thu, Jan 7, 2010 at 6:57 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> kill -9 is harsh, but, perfectly fine from Lucene's standpoint.
>>> Likewise, if the OS or JVM crashes or power is suddenly lost, the index
>>> will just fall back to the last successful commit.  What will cause
>>> corruption is if you have bit errors happening somewhere in the
>>> machine... or if two writers are accidentally allowed to be open on
>>> one index... then you're in trouble.
>>>
>>> What IO system (filesystem & hardware) are you using on Linux?
>>> Boiling down to a smallish test case can help to isolate the
>>> problem...
>>>
>>> Mike
>>>
>>> On Thu, Jan 7, 2010 at 11:51 AM, Erick Erickson  wrote:
>>> > Can you show us the code where you commit?
>>> >
>>> > And how do you kill your process? Kill -9 is...er...harsh
>>> >
>>> > Yeah, I'm wondering whether the index file size *stays*
>>> > changed after you kill your process. If it keeps
>>> > growing on every run (after you kill your process
>>> > multiple times), then I'd suspect that you aren't
>>> > adding documents like you think you are. Perhaps
>>> > different fields, different analyzers, etc.
>>> >
>>> > Luke should show you the largest document by ID,
>>> > as well as document counts. Comparing changes
>>> > in the document count and the max doc ID should
>>> > tell you something...
>>> >
>>> > Is it possible that you are updating existing docs
>>> > rather than adding new ones?
>>> >
>>> > Best
>>> > Erick
>>> >
>>> > On Thu, Jan 7, 2010 at 10:41 AM, Naama Kraus  wrote:
>>> >
>>> >> Thanks for the input.
>>> >>
>>> >> 1. While the process is running, I do see the index files growing on
>>> >> disk and the time stamps changing. Should I see a change in size right
>>> >> after killing the process, is that what you mean?
>>> >> 2. Yes, same directory is being used for indexing and search.
>>> >> 3. Didn't try Luke, good idea. Though I wonder, the same code runs
>>> >> well on Windows.
>>> >>
>>> >> Naama
>>> >>
>>> >> On Thu, Jan 7, 2010 at 3:37 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>> >>
>>> >> > Several questions:
>>> >> > 1> are the index files larger after you kill your process?
>>> >> >    Or have the timestamps changed?
>>> >> > 2> are you absolutely sure that your indexer, when you
>>> >> >     add documents, is pointing at the same directory your
>>> >> >     search is pointing to?
>>> >> > 3> Have you gotten a copy of Luke and examined your index
>>> >> >     to see if, perhaps, your documents aren't being added the
>>> >> >     way you think they are?
>>> >> >
>>> >> > Erick
>>> >> >
>>> >> > On Thu, Jan 7, 2010 at 7:13 AM, Naama Kraus  wrote:
>>> >> >
>>> >> > > Hi,
>>> >> > >
>>> >> > > I am using the IndexWriter#commit() method in my program to commit
>>> >> > > document additions to the index. I do that once in a while, after a
>>> >> > > bunch of documents were added. Since my indexing process is long, I
>>> >> > > want to make sure I don't lose too many additions in case of a
>>> >> > > crash.
>>> >> > > When running on Windows, things work as expected. But when running
>>> >> > > my code on Linux, it seems like commit() has no effect. If I kill
>>> >> > > my program and then restart it, I don't see documents that I added
>>> >> > > and then committed (they are

Re: Best Practice 3.0.0

2010-02-08 Thread Michael McCandless
Use IndexWriter.getReader to get a near real-time reader, after making
changes...
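
A minimal sketch of that pattern, assuming the 2.9/3.0 API (writer, doc and
query are placeholders):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;

writer.addDocument(doc);                   // make some changes

// Near-real-time reader: sees the changes without calling writer.commit()
IndexReader reader = writer.getReader();
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs hits = searcher.search(query, 10);

searcher.close();
reader.close();   // the writer stays open; get a fresh NRT reader as needed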

Mike

On Mon, Feb 8, 2010 at 3:45 AM, NanoE  wrote:
>
> Hello,
>
> I am writing a small library search application and want to know: what are
> the best practices in Lucene 3.0.0 for near-real-time index updates?
>
> Thanks Nano




Scale Out

2010-02-08 Thread Ganesh
Our indexes are growing and the sort cache is taking a huge amount of RAM. We
want to add multiple nodes and scale out the search.

Currently my application exposes an RMI interface and returns
application-specific result set objects as hits. I could host multiple search
instances on different nodes, but then I may need to sort / combine the
results.

Any thoughts on scaling / clustering? Do I need to use Hadoop / Carrot
etc.?

Regards
Ganesh






Re: Problems with IndexWriter#commit() on Linux

2010-02-08 Thread Naama Kraus
Here is what I get with mount -l:
/dev/mapper/lvm--raid-lvm0 on /data3 type ext3 (rw) []

Is there anything else I can run to get more details on the mount options?

On Mon, Feb 8, 2010 at 10:57 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Thanks for sharing...
>
> Software RAID should be perfectly fine for Lucene, in general, unless
> the mount is configured to ignore fsync (I think the "data=writeback"
> mount option for ext3 does so on Linux).
>
> Can you check the mount options on your RAID filesystem?
>
> Mike

Re: Scale Out

2010-02-08 Thread Ian Lea
http://katta.sourceforge.net/ sounds well worth a look.


--
Ian.






Re: Scale Out

2010-02-08 Thread Jeff Zhang
Solr offers more built-in scalability than bare Lucene; maybe you can try that.




-- 
Best Regards

Jeff Zhang


Re: Scale Out

2010-02-08 Thread Stanislaw Osinski
> Any thoughts on scaling / clustering? Do I need to use Hadoop / Carrot
> etc.?
>

Carrot2 does search results clustering (by content), while what you probably
need is server/index clustering. See the other responses in this thread for
suggestions.

S.


Re: Problems with IndexWriter#commit() on Linux

2010-02-08 Thread Michael McCandless
Hmmm... I think that means you're using the default data mode
(ordered), which should properly preserve writes if the OS or machine
crashes.

And actually I was wrong before -- even if the mount had
data=writeback, since you are "only" kill -9ing the process (not
crashing the machine), the data mount option doesn't matter.  That
option only affects what happens on a crash...

Can you work up a small example showing the problem?  And if possible,
turn on IndexWriter's infoStream, capture the output as you index up
until the kill -9, and post that?
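
For example, a sketch under the 3.0 API (writer and the log file name are
placeholders):

import java.io.File;
import java.io.PrintStream;

// Route IndexWriter's low-level diagnostics to a file until the kill -9.
// (The PrintStream(File) constructor throws FileNotFoundException, so
// declare or catch it.)
PrintStream infoStream = new PrintStream(new File("iw-debug.log"));
writer.setInfoStream(infoStream);
// ... index documents, call commit(), kill -9, then post iw-debug.log ...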

Mike


Re: Scale Out

2010-02-08 Thread Chris Lu
Since you already have an RMI interface, maybe you can search several nodes
in parallel, collect the results, pick the top ones, and send them back
via RMI.
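
Roughly, a scatter/gather sketch; RemoteSearcher, Hit, nodes, query and n are
all hypothetical application-specific names, not Lucene APIs:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<Hit> merged = new ArrayList<Hit>();
for (RemoteSearcher node : nodes) {        // one RMI stub per search node
    merged.addAll(node.search(query, n));  // scatter: each node returns its top n
}
Collections.sort(merged, Hit.BY_SCORE_DESC);   // gather: re-sort by score
List<Hit> topN = merged.subList(0, Math.min(n, merged.size()));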


--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: 
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 
Million Euro funding!







Re: Scale Out

2010-02-08 Thread Jake Mannix
On Mon, Feb 8, 2010 at 9:33 AM, Chris Lu  wrote:

> Since you already have RMI interface, maybe you can parallel search on
> several nodes, collect the data, pick top ones, and send back results via
> RMI.
>

One thing to be careful about here, which you might already be aware of:
Query (and its subclasses) implements Serializable but doesn't declare a
serialVersionUID, so when you upgrade from Lucene 2.4 to 2.9, or even from
3.0 to 3.0.1, you can get serialization incompatibilities between your broker
and your leaf nodes if you pass serialized Query objects over RMI (and try
to do a rolling upgrade, one node at a time).  If you pass domain-specific
objects which you control, this doesn't happen, of course.

Not the end of the world, but good to keep in mind.
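
For instance, a hypothetical domain-specific result object with a fixed
serialVersionUID stays wire-compatible across a rolling upgrade:

import java.io.Serializable;

// Hypothetical DTO to pass over RMI instead of raw Lucene Query/Hits objects.
public class SearchResult implements Serializable {
    private static final long serialVersionUID = 1L;  // fixed across releases

    public final String docKey;
    public final float score;

    public SearchResult(String docKey, float score) {
        this.docKey = docKey;
        this.score = score;
    }
}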

  -jake


ElasticSearch - An open source, distributed, search engine built on top of Lucene

2010-02-08 Thread Shay Banon
Hi,

   Just wanted to announce the release of a new open source project called
ElasticSearch (http://www.elasticsearch.com/). It's an open source (Apache
2), distributed search engine built on top of Lucene. ElasticSearch has many
features; you can find them here:
http://www.elasticsearch.com/products/elasticsearch/.

   Thanks to all who keep investing in Lucene, and to all Lucene users. I
hope some will find it useful.

Cheers,
Shay


Lucene fields not analyzed

2010-02-08 Thread Rohit Banga
Hello

I have a field that stores names of people. I have used the NOT_ANALYZED
parameter to index the names.

This is what happens during indexing:

doc.add(new Field("name", "\"" + name + "\"", Field.Store.YES,
    Field.Index.NOT_ANALYZED));

When I search, I create a query parser using StandardAnalyzer and append
~0.5 to the search query.

The problem is that if the indexed name is "Mr. Kumar", my search does not
work for "Mr. Kumar", while it does work for "Mr.Kumar" (without the space).

// searching code
File index_directory = new File(INDEX_DIR_PATH);
IndexReader reader = IndexReader.open(FSDirectory.open(index_directory), true);
Searcher searcher = new IndexSearcher(reader);

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "name",
    analyzer);

Query query = parser.parse(text + "~0.5");

How do I make it work?

Rohit Banga


RE: Lucene fields not analyzed

2010-02-08 Thread Uwe Schindler
QueryParser runs the given Analyzer over the query text when constructing the
query, so it will never produce a term that matches a NOT_ANALYZED field. In
general, it is a bad idea to use QueryParser on fields that are not analyzed.
There are two ways to solve the problem (a sketch of both follows below):

- Instantiate the query against the not-analyzed (but indexed) field directly,
as a TermQuery.
- Use a PerFieldAnalyzerWrapper and choose a specific analyzer for this field
that does not touch your names (e.g. KeywordAnalyzer). Use this wrapped
analyzer for both searching and indexing (and use Field.Index.ANALYZED!).
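
A sketch of both options, assuming the 3.0 API. Note that the indexing code
in the question wraps names in literal quote characters, so the indexed term
includes the quotes:

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.Version;

// Option 1: bypass QueryParser and match the un-analyzed term directly.
TermQuery q = new TermQuery(new Term("name", "\"Mr. Kumar\""));

// Option 2: per-field analysis; KeywordAnalyzer leaves "name" untouched.
PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(
    new StandardAnalyzer(Version.LUCENE_CURRENT));
analyzer.addAnalyzer("name", new KeywordAnalyzer());
// Use this wrapped analyzer for BOTH indexing (with Field.Index.ANALYZED)
// and the QueryParser at search time.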

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de






Re: Lucene fields not analyzed

2010-02-08 Thread Mark Harwood
I suspect it is because QueryParser uses space characters to separate clauses
in a query string, while you want the space to be part of the content of your
"name" field. Try escaping the space character, as in the sketch below.

Cheers
Mark




