Can't get a hit

2011-12-29 Thread Cheng
Hi,

I need to save a list of records into an index on hard drive. I keep a
writer and a reader open till the end of the operation.

My issue is that I need to compare each of the new records with each of the
records that have been saved into the index. There are plenty of duplicate
records in the original list.

To my surprise, I can't get a hit for a duplicate record on the fly, even
though I call writer.commit() for every record that is saved.

However, if I intentionally stop the operation partway through (with only
some of the records saved) and re-run the list of records, I get lots of hits.
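
For reference, one common cause of this symptom is that an IndexReader is a
point-in-time snapshot: a reader opened before the commits will not see newly
added documents until it is reopened. A minimal sketch of the reopen-after-commit
pattern (the variable names and the "id" field are illustrative, not from the
original post):

writer.addDocument(doc);
writer.commit();

// openIfChanged returns a new reader if the index changed, or null otherwise
IndexReader newReader = IndexReader.openIfChanged(reader);
if (newReader != null) {
    reader.close();                      // release the old point-in-time snapshot
    reader = newReader;
    searcher = new IndexSearcher(reader);
}
// duplicate lookups such as new TermQuery(new Term("id", key)) now see all commits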

Please help!

Thanks!


How to save in-memory index into disk

2011-12-31 Thread Cheng
Hi,

I am creating a RAMDirectory from a folder on disk. After doing a lot of
adding, deleting and updating, I want to flush the changes back to disk.
However, a flush() method is not available in 3.5. How can I save the
changes to disk?
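
For reference, one way to persist the in-memory index in 3.5 is to add it to a
writer opened on an FSDirectory, the same pattern suggested later in this
archive. A minimal sketch, assuming ramDir is the RAMDirectory that was built
in memory (the path and analyzer are illustrative):

FSDirectory fsDir = FSDirectory.open(new File("c:/index_folder"));
IndexWriter diskWriter = new IndexWriter(fsDir,
        new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
try {
    diskWriter.addIndexes(ramDir);   // copy the in-memory segments into the on-disk index
    diskWriter.commit();
} finally {
    diskWriter.close();
}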

Thanks!


How to use RAMDirectory more efficiently

2011-12-31 Thread Cheng
Hi,

Suppose we have a huge amount of index data on hard drives but working in a
RAMDirectory is a must. How do we decide which part of the indexes to load
into RAM, how do we modify them, and when and how do we synchronize them
with the indexes on the hard drives?

Any thoughts?

Thanks!


Re: How to use RAMDirectory more efficiently

2012-01-01 Thread Cheng
What about my code as follows:

FSDirectory indexDir = new NIOFSDirectory(new File("c:/index_folder"));
Directory ramDir = new RAMDirectory(indexDir);
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35,
new StandardAnalyzer(Version.LUCENE_35));
IndexWriter iw = new IndexWriter(ramDir, iwc);

I associate the FSDirectory and the RAMDirectory at the very beginning. Will
the two be synchronized when the writer is committed or closed?

Thanks



On Sun, Jan 1, 2012 at 2:56 PM, Charlie Hubbard
wrote:

> You can always index into RAMDirectory for speed then synchronize those
> changes to the disk by adding the RAMDirectory to a FSDirectory at some
> point.  Here is a simple example of how to do that:
>
>public void save( RAMDirectory ram, File dir ) {
>   FSDirectory fs = FSDirectory.open( dir );
>   IndexWriter writer = new IndexWriter( fs, ... );
>   try {
>writer.addIndexes( ram );
>   } finally {
> writer.close();
>   }
>   }
>
>
> http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/index/IndexWriter.html#addIndexes(org.apache.lucene.store.Directory
> ..
> .)
>
> On Sat, Dec 31, 2011 at 3:37 PM, Cheng  wrote:
>
> > Hi,
> >
> > Suppose that we have a huge amount of indices on hard drives but working
> in
> > RAMDirectory is a must, how can we decide which part of the indices to be
> > loaded into RAM, how to modify the indices, and when and how to
> synchronize
> > the indices with those on hard drives?
> >
> > Any thoughts?
> >
> > Thanks!
> >
>


queryParser.ParseException & Encountered ""

2012-01-01 Thread Cheng
Hi,

I was trying to use QueryParser for some Chinese text, but encountered the
following issues:

(1) org.apache.lucene.queryParser.ParseException: Cannot parse '大众UP!':
Encountered "" at line 1, column 5.

The error seems to be caused by the Chinese exclamation mark.

(2) org.apache.lucene.queryParser.ParseException: Cannot parse '真功夫(小榄店)':
Encountered "" at line 1, column 8.

The error seems to be caused by the Chinese right bracket (the left bracket
is an English one).

I have searched for this error, and the solution seems to be to rewrite an
escape() function, which is beyond my capability.

How can I solve these issues in an easier way?
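
For reference, QueryParser already has a static escape() helper that
backslash-escapes its special characters (!, (, ) and so on), which avoids
writing a custom escape function. A minimal sketch, assuming the queries are
plain text rather than query syntax (the field name "name" is illustrative):

QueryParser parser = new QueryParser(Version.LUCENE_35, "name",
        new StandardAnalyzer(Version.LUCENE_35));
Query q = parser.parse(QueryParser.escape("真功夫(小榄店)"));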

Thanks


Shared IndexWriter does not increase speed

2012-01-06 Thread Cheng
Hi,

I am trying to use a shared IndexWriter instance in a multi-threaded
application. Surprisingly, this underperforms creating a writer instance
within each thread.

My code is as follows. Can someone help explain why? Thanks.


Scenario 1: shared IndexWriter instance

RAMDirectory ramDir = new RAMDirectory();
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35, new IKAnalyzer());
IndexWriter iw = new IndexWriter(ramDir, iwc);

ExecutorService executor = Executors.newFixedThreadPool(20);

for (int i = 0; i < 1; i++) {
    Runnable runnable = new MyRunnable(iw, ramDir);
    executor.execute(runnable);   // submit the task to the pool
}

Scenario 2: create an IndexWriter instance within MyRunnable

public class MyRunnable implements Runnable {

    ..

    public void run() {
        try {
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35, new IKAnalyzer());
            IndexWriter iw = new IndexWriter(ramDir, iwc);
        } catch (IOException e) {
            // handle the failure to open the per-thread writer
        }
    }
}


Strategy for large index files

2012-01-07 Thread Cheng
Hi, my servlet application runs against a large index of 20GB. I don't think
it can be loaded into RAM all at once.

What are the general strategies to improve search and write performance?

Thanks


Build RAMDirectory on FSDirectory, and then synchronizing the two

2012-01-08 Thread Cheng
Hi,

I create a new RAMDirectory from an FSDirectory. After a few modifications, I
would like to synchronize the two.

Someone on the mailing list provided a solution that uses the addIndexes()
method.

However, the FSDirectory content is simply combined with the RAMDirectory
content, and the index size doubles.

How can I do a real synchronization?
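
For reference, the reply below suggests opening the disk-based writer with
OpenMode.CREATE so the old on-disk index is overwritten rather than appended
to. A minimal sketch, assuming fsDir and ramDir are the two directories from
the post (the analyzer is illustrative):

IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_35,
        new StandardAnalyzer(Version.LUCENE_35));
cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE);   // discard the old on-disk contents

IndexWriter diskWriter = new IndexWriter(fsDir, cfg);
try {
    diskWriter.addIndexes(ramDir);   // write the RAM index as the new on-disk index
} finally {
    diskWriter.close();
}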

Thanks


Re: Build RAMDirectory on FSDirectory, and then synchronizing the two

2012-01-10 Thread Cheng
I tried IndexWriterConfig.OpenMode.CREATE, and the size still doubled.

The only approach that has been effective is the writer's deleteAll() method.

On Mon, Jan 9, 2012 at 5:23 AM, Ian Lea  wrote:

> If you load an existing disk index into a RAMDirectory, make some
> changes in RAM and call addIndexes to add the contents of the
> RAMDirectory to the original disk index, you are likely to end up with
> duplicate data on disk.  Depending of course on what you've done to
> the RAM index.
>
> Sounds you want to call addIndexes using a writer on a new, empty,
> index or overwrite the original. IndexWriterConfig.OpenMode CREATE.
>
>
> --
> Ian.
>
>
> On Mon, Jan 9, 2012 at 4:29 AM, dyzc <1393975...@qq.com> wrote:
> > I'd better provide a snapshot of my code for people to understand my
> issues:
> >
> >
> > File file=new File("c:/index_files");
> > FSDirectory fsDir=new FSDirectory(file);
> > RAMDirectory ramDir=new RAMDirectory(fsDir, new
> IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer());
> >
> >
> > IndexWriter iw = new IndexWriter(ramDir, iwc);
> >
> >
> > ..DO something here with iw (associated with ramDir).
> >
> >
> > Now I am trying to synchronize ramDir with fsDir:
> >
> >
> > //close iw prior to synchronization
> > iw.close();
> >
> >
> > // synchronize RAM with FS
> > IndexWriter writer = new IndexWriter(fsDir, new
> IndexWriterConfig(Version.LUCENE_35, ik));
> > writer.addIndexes(ramDir);
> > writer.close();
> > ramDir.close();
> >
> >
> >
> > Now I end up with duplicate copies of index files in c:/index_files
> >
> >
> > Is there something that I miss here?
> >
> >
> > -- Original --
> > From:  "zhoucheng2008";
> > Date:  Mon, Jan 9, 2012 12:04 PM
> > To:  "java-user";
> >
> > Subject:  Build RAMDirectory on FSDirectory, and then synchronzing the
> two
> >
> >
> > Hi,
> >
> > I new a RAMDirectory based upon a FSDirectory. After a few
> modifications, I would like to synchronize the two.
> >
> >
> > Some on the mailing list provided a solution that uses addIndex()
> function.
> >
> >
> > However, the FSDirectory simply combines with the RAMDirectory, and the
> size doubled.
> >
> >
> > How can I do a real synchronization?
> >
> >
> > Thanks
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Shared instance of IndexWriter doesn't improve performance

2012-01-10 Thread Cheng
Hi,

I use the same writer instance for multiple threads. It turns out that the
time to finish the jobs is longer than when I create a new writer instance in
each thread. What could be the possible reasons?

Thanks


Seems contradictory -- IndexWriter handling multiple threads

2012-01-11 Thread Cheng
I have read a lot about IndexWriter and multi-threading on the Internet.
It seems to me that the normal practice is:

1) use the same IndexWriter instance across multiple threads;
2) create an individual RAMDirectory per thread;
3) use the addIndexes(Directory...) method to add all the indexes stored in
the threads' RAM directories to a folder on the local drive (see the sketch
after this list).
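
As a minimal sketch of that practice (the directory path, analyzer and the
threadCount variable are illustrative, not from the post): each thread fills
its own RAMDirectory with a private writer and closes it, and a single
disk-based writer merges them at the end.

RAMDirectory[] ramDirs = new RAMDirectory[threadCount];
// ... each thread builds and closes an IndexWriter on its own ramDirs[i] ...

IndexWriter diskWriter = new IndexWriter(FSDirectory.open(new File("c:/index_folder")),
        new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
try {
    diskWriter.addIndexes(ramDirs);   // addIndexes(Directory...) accepts the whole array
} finally {
    diskWriter.close();
}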

My questions are:

a) For 1), the IndexWriter instance must be associated with a Lucene
Directory, right? Let us assume it is built on an FSDirectory. Then what is
the purpose of using this FSDirectory-based IndexWriter in 2), where the
documents are added to a RAMDirectory? That implies there must be a new
IndexWriter created on a RAMDirectory in each thread. If that is the case,
we should create a RAMDirectory-based writer in each thread rather than use
an FSDirectory-based writer across multiple threads.

b) For 3), the writer that is supposed to perform addIndexes(Directory...)
seems to be the same one created in 1). Within each thread, the index writer
needs to be closed to make new documents visible to that thread's searcher.
So, since the writer has been closed in one or more of the threads, how can
this writer be used outside the threads' run() methods to add all the
RAMDirectories?

Am I missing something?

Thanks


Re: Seems contradictory -- IndexWriter handling multiple threads

2012-01-11 Thread Cheng
Can I create a RAMDirectory-based writer and have it work across all
threads? That is, I would like to use the RAMDirectory everywhere and have
it written to an FSDirectory at the end.

I suppose that should work, right?


On Wed, Jan 11, 2012 at 2:31 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Wed, Jan 11, 2012 at 1:32 PM, dyzc2010  wrote:
>
> > Mike, do you mean if I create a FSDirectory based writer in first place,
> then the writer should be used in every thread rather than create a new
> RAMDirectory based writer in that thread?
>
> Right.
>
> > What about I do want to use RAMDirectory to speed up the index and
> search processes?
>
> IndexWriter is very efficient in using RAM across multiple threads
> now... so this isn't worth it at indexing time.
>
> At search time... MMapDirectory is a good way to let the OS use
> currently free RAM for caching.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Seems contradictory -- IndexWriter handling multiple threads

2012-01-11 Thread Cheng
Will do if I see a perf gain.

The other issue is that in each thread my app does not only indexing but
also searching. That means I will have to pass the RAM directory instance,
along with the writer instance, to every thread so that the searcher can be
built on it.

Should I create a single reader and a single searcher and pass them to every
thread too?



On Wed, Jan 11, 2012 at 3:21 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Yes that would work fine but you should see a net perf loss by
> doing so (once you include time to flush/sync the RAMDir to an FSDir).
>
> If you see a perf gain then please report back!
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jan 11, 2012 at 3:09 PM, Cheng  wrote:
> > Can I create a RAMDirectory based writer and have it work cross all
> > threads? In the sense, I would like to use RAMDirectory every where and
> > have the RAMDirectory written to FSDirectory in the end.
> >
> > I suppose that should work, right?
> >
> >
> > On Wed, Jan 11, 2012 at 2:31 PM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> On Wed, Jan 11, 2012 at 1:32 PM, dyzc2010 
> wrote:
> >>
> >> > Mike, do you mean if I create a FSDirectory based writer in first
> place,
> >> then the writer should be used in every thread rather than create a new
> >> RAMDirectory based writer in that thread?
> >>
> >> Right.
> >>
> >> > What about I do want to use RAMDirectory to speed up the index and
> >> search processes?
> >>
> >> IndexWriter is very efficient in using RAM across multiple threads
> >> now... so this isn't worth it at indexing time.
> >>
> >> At search time... MMapDirectory is a good way to let the OS use
> >> currently free RAM for caching.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Seems contradictory -- IndexWriter handling multiple threads

2012-01-11 Thread Cheng
Will do, thanks.

On Wed, Jan 11, 2012 at 3:37 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Yes, it's best to share one IndexSearcher/IndexReader across all
> threads... and if you ever find evidence this hurts concurrency then
> please post back :)
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jan 11, 2012 at 3:29 PM, Cheng  wrote:
> > Will do if I see a perf gain.
> >
> > The other issue is that in each thread my apps will not only do indexing
> > but searching. That means I will have to pass through the ram directory
> > instance, along with the writer instance, to every thread so that the
> > searcher can be built on.
> >
> > Should I create a same reader and a same searcher and pass them through
> > every thread too?
> >
> >
> >
> > On Wed, Jan 11, 2012 at 3:21 PM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> Yes that would work fine but you should see a net perf loss by
> >> doing so (once you include time to flush/sync the RAMDir to an FSDir).
> >>
> >> If you see a perf gain then please report back!
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Wed, Jan 11, 2012 at 3:09 PM, Cheng  wrote:
> >> > Can I create a RAMDirectory based writer and have it work cross all
> >> > threads? In the sense, I would like to use RAMDirectory every where
> and
> >> > have the RAMDirectory written to FSDirectory in the end.
> >> >
> >> > I suppose that should work, right?
> >> >
> >> >
> >> > On Wed, Jan 11, 2012 at 2:31 PM, Michael McCandless <
> >> > luc...@mikemccandless.com> wrote:
> >> >
> >> >> On Wed, Jan 11, 2012 at 1:32 PM, dyzc2010 
> >> wrote:
> >> >>
> >> >> > Mike, do you mean if I create a FSDirectory based writer in first
> >> place,
> >> >> then the writer should be used in every thread rather than create a
> new
> >> >> RAMDirectory based writer in that thread?
> >> >>
> >> >> Right.
> >> >>
> >> >> > What about I do want to use RAMDirectory to speed up the index and
> >> >> search processes?
> >> >>
> >> >> IndexWriter is very efficient in using RAM across multiple threads
> >> >> now... so this isn't worth it at indexing time.
> >> >>
> >> >> At search time... MMapDirectory is a good way to let the OS use
> >> >> currently free RAM for caching.
> >> >>
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >> >>
> >> >> -
> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >>
> >> >>
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Is it necessary to create a new searcher?

2012-01-11 Thread Cheng
I am currently using the following statement at the end of each index write,
although I don't know whether the write actually modified the index or not:

is = new IndexSearcher(IndexReader.openIfChanged(ir));

# is -> IndexSearcher, ir-> IndexReader
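
For reference, IndexReader.openIfChanged() returns null when the index has not
changed, so the statement above can hand null to IndexSearcher (a later reply
in this archive makes the same point). A minimal sketch of the null-checked
version:

IndexReader newIr = IndexReader.openIfChanged(ir);
if (newIr != null) {            // null means nothing changed since ir was opened
    ir.close();                 // release the old reader
    ir = newIr;
    is = new IndexSearcher(ir);
}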


My question is how expensive it is to create a searcher instance. (I
potentially have hundreds of thousands of docs added or modified, so the cost
of creating a new searcher instance may be unbearable.)

Should I use IndexReader.isCurrent() instead to check whether the reader is
current?

Thanks


Re: Build RAMDirectory on FSDirectory, and then synchronizing the two

2012-01-12 Thread Cheng
The reason is that I have indexes on the hard drive but want to load them
into RAM for faster searching, adding, deleting, etc.

Using a RAMDirectory can help achieve this goal.

On Thu, Jan 12, 2012 at 6:36 PM, Sanne Grinovero
wrote:

> Maybe you could explain why you are doing this? Someone could suggest
> alternative approaches.
>
> Regards,
> Sanne
> On Jan 12, 2012 4:02 AM, "dyzc" <1393975...@qq.com> wrote:
>
> > That lies in that my apps add indexes to those in RAM rather than update
> > them. So the size doubled. Seem not related to the OpenMode.CREATE
> option.
> >
> >
> > -- Original --
> > From:  "Ian Lea";
> > Date:  Wed, Jan 11, 2012 05:20 PM
> > To:  "java-user";
> >
> > Subject:  Re: Build RAMDirectory on FSDirectory, and then synchronzing
> the
> > two
> >
> >
> > > I tried  IndexWriterConfig.OpenMode CREATE, and the size is doubled.
> >
> > Prove it.
> >
> >
> > --
> > Ian.
> >
> > > The only way that is effective is the writer's deleteAll() methods.
> > >
> > > On Mon, Jan 9, 2012 at 5:23 AM, Ian Lea  wrote:
> > >
> > >> If you load an existing disk index into a RAMDirectory, make some
> > >> changes in RAM and call addIndexes to add the contents of the
> > >> RAMDirectory to the original disk index, you are likely to end up with
> > >> duplicate data on disk.  Depending of course on what you've done to
> > >> the RAM index.
> > >>
> > >> Sounds you want to call addIndexes using a writer on a new, empty,
> > >> index or overwrite the original. IndexWriterConfig.OpenMode CREATE.
> > >>
> > >>
> > >> --
> > >> Ian.
> > >>
> > >>
> > >> On Mon, Jan 9, 2012 at 4:29 AM, dyzc <1393975...@qq.com> wrote:
> > >> > I'd better provide a snapshot of my code for people to understand my
> > >> issues:
> > >> >
> > >> >
> > >> > File file=new File("c:/index_files");
> > >> > FSDirectory fsDir=new FSDirectory(file);
> > >> > RAMDirectory ramDir=new RAMDirectory(fsDir, new
> > >> IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer());
> > >> >
> > >> >
> > >> > IndexWriter iw = new IndexWriter(ramDir, iwc);
> > >> >
> > >> >
> > >> > ..DO something here with iw (associated with ramDir).
> > >> >
> > >> >
> > >> > Now I am trying to synchronize ramDir with fsDir:
> > >> >
> > >> >
> > >> > //close iw prior to synchronization
> > >> > iw.close();
> > >> >
> > >> >
> > >> > // synchronize RAM with FS
> > >> > IndexWriter writer = new IndexWriter(fsDir, new
> > >> IndexWriterConfig(Version.LUCENE_35, ik));
> > >> > writer.addIndexes(ramDir);
> > >> > writer.close();
> > >> > ramDir.close();
> > >> >
> > >> >
> > >> >
> > >> > Now I end up with duplicate copies of index files in c:/index_files
> > >> >
> > >> >
> > >> > Is there something that I miss here?
> > >> >
> > >> >
> > >> > -- Original --
> > >> > From:  "zhoucheng2008";
> > >> > Date:  Mon, Jan 9, 2012 12:04 PM
> > >> > To:  "java-user";
> > >> >
> > >> > Subject:  Build RAMDirectory on FSDirectory, and then synchronzing
> the
> > >> two
> > >> >
> > >> >
> > >> > Hi,
> > >> >
> > >> > I new a RAMDirectory based upon a FSDirectory. After a few
> > >> modifications, I would like to synchronize the two.
> > >> >
> > >> >
> > >> > Some on the mailing list provided a solution that uses addIndex()
> > >> function.
> > >> >
> > >> >
> > >> > However, the FSDirectory simply combines with the RAMDirectory, and
> > the
> > >> size doubled.
> > >> >
> > >> >
> > >> > How can I do a real synchronization?
> > >> >
> > >> >
> > >> > Thanks
> > >>
> > >> -
> > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >>
> > >>
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>


10 million entities and 100 million related information

2012-01-12 Thread Cheng
I have 10MM entities, for each of which I will index 10-20 fields. Also, I
will have to index 100MM pieces of related information about the entities,
and each piece of information will have to go through some Analyzer.

I have a few questions:

1) Can I use just one index folder for all the data?

2) If I have to segment the data, what is the size of each segment such
that a real-time search is still achievable?

Thanks


Re: Is it necessary to create a new searcher?

2012-01-14 Thread Cheng
That sounds like what I am looking for. But do you have a code example of
how to use this NRTManager?


On Fri, Jan 13, 2012 at 12:05 PM, Ian Lea  wrote:

> The javadocs for oal.search.SearcherManager start "Utility class to
> safely share IndexSearcher instances across multiple threads, while
> periodically reopening."  The maybeReopen() method does what you would
> expect and can be called from multiple threads.
>
> Isn't that exactly what you need?  See also oal.search.NRTManager.
>
>
> --
> Ian.
>
>
> On Fri, Jan 13, 2012 at 4:06 PM, dyzc2010  wrote:
> > Thanks for pointing that out for me. I will change the code.
> >
> >
> > My challenge is that I use a same reader for multiple threads. So if I
> have to close a reader within a thread, the others may be affected. With
> that, what can I do within a thread to reopen a new reader?
> >
> >
> > -- Original --
> > From:  "Ian Lea";
> > Date:  Fri, Jan 13, 2012 05:47 PM
> > To:  "java-user";
> >
> > Subject:  Re: Is it necessary to create a new searcher?
> >
> >
> > The javadocs for openIfChanged say that it returns null if not
> > changed, so I don't think your code will work as is.  You need to
> > check the return value and you'll need to close the old reader if you
> > have been given a new one.
> >
> > If you are going to be reopening if changed, there seems little point
> > in calling isCurrent() rather than openIfChanged().  Searchers are
> > based on readers and readers are tied to a segment and if only one or
> > two segments have changed, only those readers will be reopened.
> > So in general, a reopen after a small number of updates may well be
> > quicker than a reopen after a large number of updates.  How important
> > is it that your searches get up to date data? If vital, you'll have to
> > reopen.  If not so vital you could instead reopen every now and again.
> >
> > You should take a look at NRTManager and NRTManagerReopenThread.
> > There's good info in the javadocs.
> >
> >
> > --
> > Ian.
> >
> >
> > On Wed, Jan 11, 2012 at 10:51 PM, Cheng  wrote:
> >> I am currently using the following statement at the end of each index
> >> writing, although I don't know if the writing modifies the indexes or
> not:
> >>
> >> is = new IndexSearcher(IndexReader.openIfChanged(ir));
> >>
> >> # is -> IndexSearcher, ir-> IndexReader
> >>
> >>
> >> My question is how expensive to create a searcher instance (I have
> >> potentially hundreds of thousands of docs added or modified so the cost
> to
> >> create a new searcher instance may be unbearable.)
> >>
> >> Should I use the IndexReader.isCurrent() instead to check if is current?
> >>
> >> Thanks
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Is it necessary to create a new searcher?

2012-01-14 Thread Cheng
I just found some interesting stuff here:

https://builds.apache.org/job/Lucene-3.x/javadoc/contrib-misc/org/apache/lucene/index/NRTManagerReopenThread.html



How is the NRTManager plugged into my ExecutorService framework?

On Sun, Jan 15, 2012 at 1:04 AM, Cheng  wrote:

> That sounds like what I am looking for. But do you have some code example
> about how to use this NRTManager?
>
>
> On Fri, Jan 13, 2012 at 12:05 PM, Ian Lea  wrote:
>
>> The javadocs for oal.search.SearcherManager start "Utility class to
>> safely share IndexSearcher instances across multiple threads, while
>> periodically reopening."  The maybeReopen() method does what you would
>> expect and can be called from multiple threads.
>>
>> Isn't that exactly what you need?  See also oal.search.NRTManager.
>>
>>
>> --
>> Ian.
>>
>>
>> On Fri, Jan 13, 2012 at 4:06 PM, dyzc2010 
>> wrote:
>> > Thanks for pointing that out for me. I will change the code.
>> >
>> >
>> > My challenge is that I use a same reader for multiple threads. So if I
>> have to close a reader within a thread, the others may be affected. With
>> that, what can I do within a thread to reopen a new reader?
>> >
>> >
>> > -- Original --
>> > From:  "Ian Lea";
>> > Date:  Fri, Jan 13, 2012 05:47 PM
>> > To:  "java-user";
>> >
>> > Subject:  Re: Is it necessary to create a new searcher?
>> >
>> >
>> > The javadocs for openIfChanged say that it returns null if not
>> > changed, so I don't think your code will work as is.  You need to
>> > check the return value and you'll need to close the old reader if you
>> > have been given a new one.
>> >
>> > If you are going to be reopening if changed, there seems little point
>> > in calling isCurrent() rather than openIfChanged().  Searchers are
>> > based on readers and readers are tied to a segment and if only one or
>> > two segments have changed, only those readers will be reopened.
>> > So in general, a reopen after a small number of updates may well be
>> > quicker than a reopen after a large number of updates.  How important
>> > is it that your searches get up to date data? If vital, you'll have to
>> > reopen.  If not so vital you could instead reopen every now and again.
>> >
>> > You should take a look at NRTManager and NRTManagerReopenThread.
>> > There's good info in the javadocs.
>> >
>> >
>> > --
>> > Ian.
>> >
>> >
>> > On Wed, Jan 11, 2012 at 10:51 PM, Cheng 
>> wrote:
>> >> I am currently using the following statement at the end of each index
>> >> writing, although I don't know if the writing modifies the indexes or
>> not:
>> >>
>> >> is = new IndexSearcher(IndexReader.openIfChanged(ir));
>> >>
>> >> # is -> IndexSearcher, ir-> IndexReader
>> >>
>> >>
>> >> My question is how expensive to create a searcher instance (I have
>> >> potentially hundreds of thousands of docs added or modified so the
>> cost to
>> >> create a new searcher instance may be unbearable.)
>> >>
>> >> Should I use the IndexReader.isCurrent() instead to check if is
>> current?
>> >>
>> >> Thanks
>> >
>> > -
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>


How NRTManagerReopenThread works with Java Executor framework?

2012-01-15 Thread Cheng
I saw the link,
https://builds.apache.org/job/Lucene-3.x/javadoc/contrib-misc/org/apache/lucene/index/NRTManagerReopenThread.html,
which talks about how to use the NRTManagerReopenThread.

I am currently using the Java ExecutorService framework for a multi-threaded
scenario. Please see below.

ExecutorService executor = Executors.newFixedThreadPool(ERConstants.maxThreads);

for (;;) {
    Runnable worker = new MyRunner(writer);
    executor.execute(worker);
}

executor.shutdown();


My question is how to combine the ExecutorService framework with the
NRTManager class.

Thanks


Re: Is Lucene a good candidate for a Google-like search engine?

2012-01-16 Thread Cheng
Great, thanks!

On Mon, Jan 16, 2012 at 5:56 AM, findbestopensource <
findbestopensou...@gmail.com> wrote:

> Check out the presentation.
> http://java.dzone.com/videos/archive-it-scaling-beyond
>
> Web archive uses Lucene to index billions of pages.
>
> Regards
> Aditya
> www.findbestopensource.com
>
> On Fri, Jan 13, 2012 at 4:31 PM, Peter K  wrote:
>
> > yes and no!
> > google is not only the search engine ...
> >
> > > Just curious about that. Any thoughts?
> > >
> > > Thanks
> >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>


NRTManager, NRTManagerReopenThread and ExecutorServices example

2012-01-18 Thread Cheng
Hi, can any of you provide a working code example that utilizes the
NRTManager, NRTManagerReopenThread and ExecutorService instances?

The limited availability of information about these classes is really
driving me nuts.
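
For reference, here is a minimal sketch of how these pieces can fit together,
assembled from the snippets quoted elsewhere in this archive; the directory
path, staleness targets, pool size and the "id" field are illustrative
assumptions, not a verified working program:

final IndexWriter iw = new IndexWriter(FSDirectory.open(new File("c:/index_folder")),
        new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
final NRTManager nrtm = new NRTManager(iw, null);

// the reopen thread runs on its own, outside the executor
NRTManagerReopenThread reopenThread = new NRTManagerReopenThread(nrtm, 5.0, 0.1);
reopenThread.setDaemon(true);
reopenThread.start();

final SearcherManager searcherManager = nrtm.getSearcherManager(true);

// worker tasks submitted to the executor share the same NRTManager and SearcherManager
ExecutorService executor = Executors.newFixedThreadPool(20);
executor.execute(new Runnable() {
    public void run() {
        try {
            Document doc = new Document();
            doc.add(new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
            nrtm.addDocument(doc);                     // index through the manager

            IndexSearcher searcher = searcherManager.acquire();
            try {
                // run searches against the shared, periodically reopened searcher
            } finally {
                searcherManager.release(searcher);
            }
        } catch (Exception e) {
            // handle indexing/search failures
        }
    }
});

// on shutdown: executor.shutdown(), reopenThread.close(), nrtm.close(), iw.close()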

Thanks


Cleaning up writer after certain idle time?

2012-01-25 Thread Cheng
Hi,

I am using multiple writer instances in a web service. Some instances are
busy all the time, while others aren't. I wonder whether I can configure a
writer to dissolve itself after a certain amount of idle time, say 30 seconds.

If the answer is yes, can I do more during the dissolving, such as writing
the changes to the FS directory?
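
For reference, there is no built-in idle timeout on IndexWriter, so this would
have to be done in application code. A minimal sketch of one way to
approximate it with a scheduled task; the 30-second threshold, the lastUsed
timestamp and the commit-then-close behaviour are all assumptions, not an
existing Lucene feature:

final long idleMillis = 30000L;                       // hypothetical idle threshold
final AtomicLong lastUsed = new AtomicLong(System.currentTimeMillis());

ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
cleaner.scheduleAtFixedRate(new Runnable() {
    public void run() {
        if (System.currentTimeMillis() - lastUsed.get() > idleMillis) {
            try {
                writer.commit();                      // flush pending changes to the directory
                writer.close();                       // then release the idle writer
            } catch (IOException e) {
                // log and decide whether to retry
            }
        }
    }
}, 10, 10, TimeUnit.SECONDS);

The application code would also need to update lastUsed on every write and
guard against a thread using the writer while it is being closed.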

Thanks


Re: Cleaning up writer after certain idle time?

2012-01-25 Thread Cheng
Yes, I have one IndexWriter instance for each category.

By "dissolving", I mean two things: first, writing all of the writer's
changes to a local directory; second, disabling the writer instance.

On Wed, Jan 25, 2012 at 5:09 PM, Simon Willnauer <
simon.willna...@googlemail.com> wrote:

> Hey,
>
>
> On Wed, Jan 25, 2012 at 11:01 PM, Cheng  wrote:
> > Hi,
> >
> > I am using multiple writer instances in a web service. Some instances are
> > busy all the time, while some aren't. I wonder how to configure the
> writer
> > to dissolve itself after a certain time of idling, say 30 seconds.
> what do you mean by multiple writers, more than one IndexWriter? I
> don't understand what "dissolve" means in this context?
> maybe you can elaborate more on your architecture please?
>
> simon
> >
> > If the answer is yes, can I do more in the dissolving, such as writing
> the
> > changes to fs directory?
> >
> > Thanks
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


How to avoid filtering stop words like "IS" in StandardAnalyzer

2012-01-27 Thread Cheng
Hi,

I don't want certain stop words to be filtered out by the StandardAnalyzer.
Can I do that?

Ideally, I would like to have a customized StandardAnalyzer.

Thanks.


Re: How to avoid filtering stop words like "IS" in StandardAnalyzer

2012-01-28 Thread Cheng
Pedro's suggestion seems to work fine. Not sure where I should use
CharArraySet.EMPTY_SET.
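
For reference, both suggestions go in the same place, the stop-word argument
of the StandardAnalyzer constructor; a minimal sketch:

// an empty stop-word set keeps words such as "is" from being filtered out
Analyzer noStopWords = new StandardAnalyzer(Version.LUCENE_35, CharArraySet.EMPTY_SET);

// equivalent in effect to Pedro's suggestion of passing an empty HashSet
Analyzer noStopWords2 = new StandardAnalyzer(Version.LUCENE_35, new HashSet<String>());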

On Sat, Jan 28, 2012 at 6:56 AM, Uwe Schindler  wrote:

> Or even better: CharArraySet.EMPTY_SET - sorry for noise.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Uwe Schindler [mailto:u...@thetaphi.de]
> > Sent: Saturday, January 28, 2012 12:52 PM
> > To: java-user@lucene.apache.org
> > Subject: RE: How to avoid filtering stop words like "IS" in
> StandardAnalyzer
> >
> > Right, but Collections.emptySet() should be used :-)
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -Original Message-
> > > From: Pedro Lacerda [mailto:pslace...@gmail.com]
> > > Sent: Saturday, January 28, 2012 12:49 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: How to avoid filtering stop words like "IS" in
> > > StandardAnalyzer
> > >
> > > Hi Cheng,
> > >
> > > You can provide your own set of stop words as the second argument of
> > > StandardAnalyzer constructor.
> > >
> > > new StandardAnalyzer(version, new HashSet());
> > >
> > >
> > >
> > > Pedro Lacerda
> > >
> > >
> > >
> > > 2012/1/28 Cheng 
> > >
> > > > Hi,
> > > >
> > > > I don't want to filter certain stop words within the
> StandardAnalyzer?
> > > > Can I do so?
> > > >
> > > > Ideally, I would like to have a customized StandardAnalyzer.
> > > >
> > > > Thanks.
> > > >
> >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Configure writer to write to FSDirectory?

2012-02-05 Thread Cheng
Hi Uwe,

My challenge is that I need to update/modify the indexes frequently while
providing search capability. I was trying to use FSDirectory, but found that
reading from and writing to the FSDirectory is unbearably slow. So now I am
trying the RAMDirectory, which is fast.

I don't know MMapDirectory, and wonder whether it is as fast as RAMDirectory.
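
For reference, as Uwe's reply below explains, FSDirectory.open() already
returns an MMapDirectory on most 64-bit platforms, so trying it is just a
matter of opening the directory the usual way. A minimal sketch (the path is
illustrative):

// on most 64-bit JVMs this returns an MMapDirectory, letting the OS cache the index in free RAM
Directory dir = FSDirectory.open(new File("c:/index_folder"));
IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));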


On Sun, Feb 5, 2012 at 4:14 PM, Uwe Schindler  wrote:

> Hi Cheng,
>
> It seems that you use a RAMDirectory for *caching*, otherwise it makes no
> sense to write changes back. In recent Lucene versions, this is not a good
> idea, especially for large indexes (RAMDirectory eats your heap space,
> allocates millions of small byte[] arrays,...). If you need something like
> a
> caching Directory and you are working on a 64bit platform, you can use
> MMapDirectory (where the operating system kernel manages the read/write
> between disk an memory). MMapDirectory is returned by default for
> FSDirectory.open() on most 64 bit platforms. The good thing: the "caching"
> space is outside your JVM heap, so does not slowdown the garbage collector.
> So be sure to *not* allocate too much heap space (-Xmx) to your search app,
> only the minimum needed to execute it and leave the rest of your RAM
> available for the OS kernel to manage FS cache.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Sunday, February 05, 2012 7:56 AM
> > To: java-user@lucene.apache.org
> > Subject: Configure writer to write to FSDirectory?
> >
> > Hi,
> >
> > I build an RAMDirectory on a FSDirectory, and would like the writer
> associated
> > with the RAMDirectory to periodically write to hard drive.
> >
> > Is this achievable?
> >
> > Thanks.
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Configure writer to write to FSDirectory?

2012-02-05 Thread Cheng
I was trying to, but don't know how, even though I read some of your blog posts.

On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Are you using near-real-time readers?
>
> (IndexReader.open(IndexWriter))
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Feb 5, 2012 at 9:03 AM, Cheng  wrote:
> > Hi Uwe,
> >
> > My challenge is that I need to update/modify the indexes frequently while
> > providing the search capability. I was trying to use FSDirectory, but
> found
> > out that the reading and writing from/to FSDirectory is unbearably slow.
> So
> > I now am trying the RAMDirectory, which is fast.
> >
> > I don't know of  MMapDirectory, and wonder if it is as fast as
> RAMDirectory.
> >
> >
> > On Sun, Feb 5, 2012 at 4:14 PM, Uwe Schindler  wrote:
> >
> >> Hi Cheng,
> >>
> >> It seems that you use a RAMDirectory for *caching*, otherwise it makes
> no
> >> sense to write changes back. In recent Lucene versions, this is not a
> good
> >> idea, especially for large indexes (RAMDirectory eats your heap space,
> >> allocates millions of small byte[] arrays,...). If you need something
> like
> >> a
> >> caching Directory and you are working on a 64bit platform, you can use
> >> MMapDirectory (where the operating system kernel manages the read/write
> >> between disk an memory). MMapDirectory is returned by default for
> >> FSDirectory.open() on most 64 bit platforms. The good thing: the
> "caching"
> >> space is outside your JVM heap, so does not slowdown the garbage
> collector.
> >> So be sure to *not* allocate too much heap space (-Xmx) to your search
> app,
> >> only the minimum needed to execute it and leave the rest of your RAM
> >> available for the OS kernel to manage FS cache.
> >>
> >> Uwe
> >>
> >> -
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> >> > -Original Message-
> >> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> >> > Sent: Sunday, February 05, 2012 7:56 AM
> >> > To: java-user@lucene.apache.org
> >> > Subject: Configure writer to write to FSDirectory?
> >> >
> >> > Hi,
> >> >
> >> > I build an RAMDirectory on a FSDirectory, and would like the writer
> >> associated
> >> > with the RAMDirectory to periodically write to hard drive.
> >> >
> >> > Is this achievable?
> >> >
> >> > Thanks.
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
I don't understand the following portion:

IndexWriter iw = new IndexWriter(whatever - some standard disk index);
NRTManager nrtm = new NRTManager(iw, null);
NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
ropt.setXxx(...);

ropt.start();

I have a Java ExecutorService instance running which takes care of my own
application. I don't know how this NRTManagerReopenThread works with my own
ExecutorService instance.

Can both work together? How can the NRTManagerReopenThread instance ropt be
plugged into my own multi-threading framework?

On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:

> If you can use NRTManager and SearcherManager things should be easy
> and blazingly fast rather than unbearably slow.  The latter phrase is
> not one often associated with lucene.
>
> IndexWriter iw = new IndexWriter(whatever - some standard disk index);
> NRTManager nrtm = new NRTManager(iw, null);
> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> ropt.setXxx(...);
> ...
> ropt.start();
>
> SearcherManager srchm = nrtm.getSearcherManager(b);
>
> Then add docs to your index via nrtm.addDocument(d), update with
> nrtm.updateDocument(...), and to search use
>
> IndexSearcher searcher = srchm.acquire();
> try {
>  search ...
> } finally {
>  srchm.release(searcher);
> }
>
> All thread safe so you don't have to worry about any complications
> there.  And I bet it'll be blindingly fast.
>
> Don't forget to close() things down at the end.
>
>
> --
> Ian.
>
>
>
> On Mon, Feb 6, 2012 at 12:15 AM, Cheng  wrote:
> > I was trying to, but don't know how to even I read some of your blogs.
> >
> > On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> Are you using near-real-time readers?
> >>
> >> (IndexReader.open(IndexWriter))
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Sun, Feb 5, 2012 at 9:03 AM, Cheng  wrote:
> >> > Hi Uwe,
> >> >
> >> > My challenge is that I need to update/modify the indexes frequently
> while
> >> > providing the search capability. I was trying to use FSDirectory, but
> >> found
> >> > out that the reading and writing from/to FSDirectory is unbearably
> slow.
> >> So
> >> > I now am trying the RAMDirectory, which is fast.
> >> >
> >> > I don't know of  MMapDirectory, and wonder if it is as fast as
> >> RAMDirectory.
> >> >
> >> >
> >> > On Sun, Feb 5, 2012 at 4:14 PM, Uwe Schindler 
> wrote:
> >> >
> >> >> Hi Cheng,
> >> >>
> >> >> It seems that you use a RAMDirectory for *caching*, otherwise it
> makes
> >> no
> >> >> sense to write changes back. In recent Lucene versions, this is not a
> >> good
> >> >> idea, especially for large indexes (RAMDirectory eats your heap
> space,
> >> >> allocates millions of small byte[] arrays,...). If you need something
> >> like
> >> >> a
> >> >> caching Directory and you are working on a 64bit platform, you can
> use
> >> >> MMapDirectory (where the operating system kernel manages the
> read/write
> >> >> between disk an memory). MMapDirectory is returned by default for
> >> >> FSDirectory.open() on most 64 bit platforms. The good thing: the
> >> "caching"
> >> >> space is outside your JVM heap, so does not slowdown the garbage
> >> collector.
> >> >> So be sure to *not* allocate too much heap space (-Xmx) to your
> search
> >> app,
> >> >> only the minimum needed to execute it and leave the rest of your RAM
> >> >> available for the OS kernel to manage FS cache.
> >> >>
> >> >> Uwe
> >> >>
> >> >> -
> >> >> Uwe Schindler
> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> >> http://www.thetaphi.de
> >> >> eMail: u...@thetaphi.de
> >> >>
> >> >>
> >> >> > -Original Message-
> >> >> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> >> >> > Sent: Sunday, February 05, 2012 7:56 AM
> >> >> > To: java-user@lucene.apache.org
> >> >> > Subject: Configure writer to write to FSDirectory?
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I build an RAMDirectory on a FSDirectory, and would like the writer
> >> >> associated
> >> >> > with the RAMDirectory to periodically write to hard drive.
> >> >> >
> >> >> > Is this achievable?
> >> >> >
> >> >> > Thanks.
> >> >>
> >> >>
> >> >> -
> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >>
> >> >>
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
That really helps! I will try it out.

Thanks.

On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:

> You would use NRTManagerReopenThread as a standalone thread, not
> plugged into your Executor stuff.  It is a utility class which you
> don't have to use.  See the javadocs.
>
> But in your case I'd use it, to start with anyway.  Fire it up with
> suitable settings and forget about it, except to call close()
> eventually. Once you've got things up and running you can tweak things
> as much as you want but you appear to be having trouble getting up and
> running.
>
> So ... somewhere in the initialisation code of your app, create an
> IndexWriter, NRTManager + ReopenThread and SearcherManager as outlined
> before.  Then pass the NRTManager to any/all write methods or threads
> and the SearcherManager instance to any/all search methods or threads
> and you're done.  If you want to use threads that are part of your
> ExecutorService, fine.  Just wrap it all together in whatever
> combination of Thread or Runnable instances you want.
>
>
> Does that help?
>
>
> --
> Ian.
>
>
> > I don't understand this following portion:
> >
> > IndexWriter iw = new IndexWriter(whatever - some standard disk index);
> > NRTManager nrtm = new NRTManager(iw, null);
> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> > ropt.setXxx(...);
> > 
> > ropt.start();
> >
> > I have a java ExecutorServices instance running which take care of my own
> > applications. I don't know how this NRTManagerReopenThread works with my
> > own ExecutorService instance.
> >
> > Can both work together? How can the NRTManagerReopenThread instance ropt
> be
> > plugged into my own multithreading framework?
> >
> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
> >
> >> If you can use NRTManager and SearcherManager things should be easy
> >> and blazingly fast rather than unbearably slow.  The latter phrase is
> >> not one often associated with lucene.
> >>
> >> IndexWriter iw = new IndexWriter(whatever - some standard disk index);
> >> NRTManager nrtm = new NRTManager(iw, null);
> >> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> >> ropt.setXxx(...);
> >> ...
> >> ropt.start();
> >>
> >> SearcherManager srchm = nrtm.getSearcherManager(b);
> >>
> >> Then add docs to your index via nrtm.addDocument(d), update with
> >> nrtm.updateDocument(...), and to search use
> >>
> >> IndexSearcher searcher = srchm.acquire();
> >> try {
> >>  search ...
> >> } finally {
> >>  srchm.release(searcher);
> >> }
> >>
> >> All thread safe so you don't have to worry about any complications
> >> there.  And I bet it'll be blindingly fast.
> >>
> >> Don't forget to close() things down at the end.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >>
> >> On Mon, Feb 6, 2012 at 12:15 AM, Cheng  wrote:
> >> > I was trying to, but don't know how to even I read some of your blogs.
> >> >
> >> > On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
> >> > luc...@mikemccandless.com> wrote:
> >> >
> >> >> Are you using near-real-time readers?
> >> >>
> >> >> (IndexReader.open(IndexWriter))
> >> >>
> >> >> Mike McCandless
> >> >>
> >> >> http://blog.mikemccandless.com
> >> >>
> >> >> On Sun, Feb 5, 2012 at 9:03 AM, Cheng 
> wrote:
> >> >> > Hi Uwe,
> >> >> >
> >> >> > My challenge is that I need to update/modify the indexes frequently
> >> while
> >> >> > providing the search capability. I was trying to use FSDirectory,
> but
> >> >> found
> >> >> > out that the reading and writing from/to FSDirectory is unbearably
> >> slow.
> >> >> So
> >> >> > I now am trying the RAMDirectory, which is fast.
> >> >> >
> >> >> > I don't know of  MMapDirectory, and wonder if it is as fast as
> >> >> RAMDirectory.
> >> >> >
> >> >> >
> >> >> > On Sun, Feb 5, 2012 at 4:14 PM, Uwe Schindler 
> >> wrote:
> >> >> >
> >> >> >> Hi Cheng,
> >> >> >>
> >> >> >> It seems tha

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Ian,

I have encountered an issue: I need to update the index frequently. The
NRTManager does not seem very helpful on this front, as it is slower than
when a RAMDirectory is used.

Any improvement advice?



On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:

> That really helps! I will try it out.
>
> Thanks.
>
>
> On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
>
>> You would use NRTManagerReopenThread as a standalone thread, not
>> plugged into your Executor stuff.  It is a utility class which you
>> don't have to use.  See the javadocs.
>>
>> But in your case I'd use it, to start with anyway.  Fire it up with
>> suitable settings and forget about it, except to call close()
>> eventually. Once you've got things up and running you can tweak things
>> as much as you want but you appear to be having trouble getting up and
>> running.
>>
>> So ... somewhere in the initialisation code of your app, create an
>> IndexWriter, NRTManager + ReopenThread and SearcherManager as outlined
>> before.  Then pass the NRTManager to any/all write methods or threads
>> and the SearcherManager instance to any/all search methods or threads
>> and you're done.  If you want to use threads that are part of your
>> ExecutorService, fine.  Just wrap it all together in whatever
>> combination of Thread or Runnable instances you want.
>>
>>
>> Does that help?
>>
>>
>> --
>> Ian.
>>
>>
>> > I don't understand this following portion:
>> >
>> > IndexWriter iw = new IndexWriter(whatever - some standard disk index);
>> > NRTManager nrtm = new NRTManager(iw, null);
>> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
>> > ropt.setXxx(...);
>> > 
>> > ropt.start();
>> >
>> > I have a java ExecutorServices instance running which take care of my
>> own
>> > applications. I don't know how this NRTManagerReopenThread works with my
>> > own ExecutorService instance.
>> >
>> > Can both work together? How can the NRTManagerReopenThread instance
>> ropt be
>> > plugged into my own multithreading framework?
>> >
>> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
>> >
>> >> If you can use NRTManager and SearcherManager things should be easy
>> >> and blazingly fast rather than unbearably slow.  The latter phrase is
>> >> not one often associated with lucene.
>> >>
>> >> IndexWriter iw = new IndexWriter(whatever - some standard disk index);
>> >> NRTManager nrtm = new NRTManager(iw, null);
>> >> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
>> >> ropt.setXxx(...);
>> >> ...
>> >> ropt.start();
>> >>
>> >> SearcherManager srchm = nrtm.getSearcherManager(b);
>> >>
>> >> Then add docs to your index via nrtm.addDocument(d), update with
>> >> nrtm.updateDocument(...), and to search use
>> >>
>> >> IndexSearcher searcher = srchm.acquire();
>> >> try {
>> >>  search ...
>> >> } finally {
>> >>  srchm.release(searcher);
>> >> }
>> >>
>> >> All thread safe so you don't have to worry about any complications
>> >> there.  And I bet it'll be blindingly fast.
>> >>
>> >> Don't forget to close() things down at the end.
>> >>
>> >>
>> >> --
>> >> Ian.
>> >>
>> >>
>> >>
>> >> On Mon, Feb 6, 2012 at 12:15 AM, Cheng 
>> wrote:
>> >> > I was trying to, but don't know how to even I read some of your
>> blogs.
>> >> >
>> >> > On Sun, Feb 5, 2012 at 10:22 PM, Michael McCandless <
>> >> > luc...@mikemccandless.com> wrote:
>> >> >
>> >> >> Are you using near-real-time readers?
>> >> >>
>> >> >> (IndexReader.open(IndexWriter))
>> >> >>
>> >> >> Mike McCandless
>> >> >>
>> >> >> http://blog.mikemccandless.com
>> >> >>
>> >> >> On Sun, Feb 5, 2012 at 9:03 AM, Cheng 
>> wrote:
>> >> >> > Hi Uwe,
>> >> >> >
>> >> >> > My challenge is that I need to update/modify the indexes
>> frequently
>> >> while
>> >> >> > providing the search capability. I was trying to u

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Uwe, when I said the speed is slow, I wasn't referring to instant visibility
of changes, but to the changes being synchronized with the FSDirectory when I
call writer.commit().

When I use a RAMDirectory, writer.commit() seems much faster than when using
an NRTManager built on an FSDirectory. So I am guessing the difference is the
index synchronization.



On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler  wrote:

> Please review the following articles about NRT, absolutely instant updates
> that are visible as they are done are almost impossible (even with
> RAMDirectory):
>
> http://goo.gl/mzAHt
> http://goo.gl/5RoPx
> http://goo.gl/vSJ7x
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Monday, February 06, 2012 4:27 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Configure writer to write to FSDirectory?
> >
> > Ian,
> >
> > I encountered an issue that I need to frequently update the index. The
> > NRTManager seems not very helpful on this front as the speed is slower
> than
> > RAMDirectory is used.
> >
> > Any improvement advice?
> >
> >
> >
> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:
> >
> > > That really helps! I will try it out.
> > >
> > > Thanks.
> > >
> > >
> > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
> > >
> > >> You would use NRTManagerReopenThread as a standalone thread, not
> > >> plugged into your Executor stuff.  It is a utility class which you
> > >> don't have to use.  See the javadocs.
> > >>
> > >> But in your case I'd use it, to start with anyway.  Fire it up with
> > >> suitable settings and forget about it, except to call close()
> > >> eventually. Once you've got things up and running you can tweak
> > >> things as much as you want but you appear to be having trouble
> > >> getting up and running.
> > >>
> > >> So ... somewhere in the initialisation code of your app, create an
> > >> IndexWriter, NRTManager + ReopenThread and SearcherManager as
> > >> outlined before.  Then pass the NRTManager to any/all write methods
> > >> or threads and the SearcherManager instance to any/all search methods
> > >> or threads and you're done.  If you want to use threads that are part
> > >> of your ExecutorService, fine.  Just wrap it all together in whatever
> > >> combination of Thread or Runnable instances you want.
> > >>
> > >>
> > >> Does that help?
> > >>
> > >>
> > >> --
> > >> Ian.
> > >>
> > >>
> > >> > I don't understand this following portion:
> > >> >
> > >> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> > >> > index); NRTManager nrtm = new NRTManager(iw, null);
> > >> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm,
> > >> > ...); ropt.setXxx(...); 
> > >> > ropt.start();
> > >> >
> > >> > I have a java ExecutorServices instance running which take care of
> > >> > my
> > >> own
> > >> > applications. I don't know how this NRTManagerReopenThread works
> > >> > with my own ExecutorService instance.
> > >> >
> > >> > Can both work together? How can the NRTManagerReopenThread
> > instance
> > >> ropt be
> > >> > plugged into my own multithreading framework?
> > >> >
> > >> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
> > >> >
> > >> >> If you can use NRTManager and SearcherManager things should be
> > >> >> easy and blazingly fast rather than unbearably slow.  The latter
> > >> >> phrase is not one often associated with lucene.
> > >> >>
> > >> >> IndexWriter iw = new IndexWriter(whatever - some standard disk
> > >> >> index); NRTManager nrtm = new NRTManager(iw, null);
> > >> >> NRTManagerReopenThread ropt = new
> > NRTManagerReopenThread(nrtm,
> > >> >> ...); ropt.setXxx(...); ...
> > >> >> ropt.start();
> > >> >>
> > >> >> SearcherManager srchm = nrtm.getSearcherManager(b);
> > >> >>
> > &g

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
I meant that when I use NRTManager and call commit(), the speed is slower
than when I use a RAMDirectory.

In my case, the NRTManager instance not only performs searches but also
updates/modifies indexes, and those changes should be visible to other
threads. With a RAMDirectory, commit() doesn't synchronize the indexes with
the FSDirectory. The slower speed of the NRTManager built on an FSDirectory
may be caused by the frequent updates and modifications of the index.

That is my guess.

On Mon, Feb 6, 2012 at 11:41 PM, Ian Lea  wrote:

> What exactly do you mean by the "speed is slower"?  Time taken to
> update the index?  Time taken for updates to become visible in search
> results?  Time taken for searches to run on the IndexSearcher returned
> from SearcherManager?  Something else?
>
>
> --
> Ian.
>
>
> On Mon, Feb 6, 2012 at 3:27 PM, Cheng  wrote:
> > Ian,
> >
> > I encountered an issue that I need to frequently update the index. The
> > NRTManager seems not very helpful on this front as the speed is slower
> than
> > RAMDirectory is used.
> >
> > Any improvement advice?
> >
> >
> >
> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng  wrote:
> >
> >> That really helps! I will try it out.
> >>
> >> Thanks.
> >>
> >>
> >> On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
> >>
> >>> You would use NRTManagerReopenThread as a standalone thread, not
> >>> plugged into your Executor stuff.  It is a utility class which you
> >>> don't have to use.  See the javadocs.
> >>>
> >>> But in your case I'd use it, to start with anyway.  Fire it up with
> >>> suitable settings and forget about it, except to call close()
> >>> eventually. Once you've got things up and running you can tweak things
> >>> as much as you want but you appear to be having trouble getting up and
> >>> running.
> >>>
> >>> So ... somewhere in the initialisation code of your app, create an
> >>> IndexWriter, NRTManager + ReopenThread and SearcherManager as outlined
> >>> before.  Then pass the NRTManager to any/all write methods or threads
> >>> and the SearcherManager instance to any/all search methods or threads
> >>> and you're done.  If you want to use threads that are part of your
> >>> ExecutorService, fine.  Just wrap it all together in whatever
> >>> combination of Thread or Runnable instances you want.
> >>>
> >>>
> >>> Does that help?
> >>>
> >>>
> >>> --
> >>> Ian.
> >>>
> >>>
> >>> > I don't understand this following portion:
> >>> >
> >>> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> index);
> >>> > NRTManager nrtm = new NRTManager(iw, null);
> >>> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> >>> > ropt.setXxx(...);
> >>> > 
> >>> > ropt.start();
> >>> >
> >>> > I have a java ExecutorServices instance running which take care of my
> >>> own
> >>> > applications. I don't know how this NRTManagerReopenThread works
> with my
> >>> > own ExecutorService instance.
> >>> >
> >>> > Can both work together? How can the NRTManagerReopenThread instance
> >>> ropt be
> >>> > plugged into my own multithreading framework?
> >>> >
> >>> > On Mon, Feb 6, 2012 at 8:17 PM, Ian Lea  wrote:
> >>> >
> >>> >> If you can use NRTManager and SearcherManager things should be easy
> >>> >> and blazingly fast rather than unbearably slow.  The latter phrase
> is
> >>> >> not one often associated with lucene.
> >>> >>
> >>> >> IndexWriter iw = new IndexWriter(whatever - some standard disk
> index);
> >>> >> NRTManager nrtm = new NRTManager(iw, null);
> >>> >> NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...);
> >>> >> ropt.setXxx(...);
> >>> >> ...
> >>> >> ropt.start();
> >>> >>
> >>> >> SearcherManager srchm = nrtm.getSearcherManager(b);
> >>> >>
> >>> >> Then add docs to your index via nrtm.addDocument(d), update with
> >>> >> nrtm.updateDocument(...), and to search use
> >>> >>
> >>> >> IndexSearcher searcher = srchm

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
My original question is whether there is a way to configure when the writer
writes to the FSDirectory. I think there may be something in IndexWriterConfig
that can help.
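
To make the question concrete, these are the knobs I had in mind (a sketch
only; the values are made up, and whether they control what I think they
control is exactly my question):

IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_35,
        new StandardAnalyzer(Version.LUCENE_35));
// Flush buffered documents to the Directory once roughly 64 MB is used:
iwc.setRAMBufferSizeMB(64.0);
// Disable the doc-count trigger so only the RAM threshold applies
// (pass a positive number here to flush every N documents instead):
iwc.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
IndexWriter iw = new IndexWriter(new NIOFSDirectory(new File("/path/to/index")), iwc);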

On Mon, Feb 6, 2012 at 11:50 PM, Ian Lea  wrote:

> Well, yes.  What would you expect?  From the javadocs for
> IndexWriter.commit()
>
> Commits all pending changes (added & deleted documents, segment
> merges, added indexes, etc.) to the index, and syncs all referenced
> index files ... This may be a costly operation, so you should test the
> cost in your application and do it only when really necessary.
>
> If you are using NRTManager why do you care how long this takes?  How
> often are you calling it?  Why?
>
>
> --
> Ian.
>
>
> On Mon, Feb 6, 2012 at 3:45 PM, Cheng  wrote:
> > Uwe, when I meant speed is slow, I didn't refer to instant visibility of
> > changes, but that the changes may be synchronized with FSDirectory when I
> > use writer.commit().
> >
> > When I use RAMDirectory, the writer.commit() seems much faster than using
> > NRTManager built upon FSDirectory. So, I am guessing the difference is
> the
> > index synchronization.
> >
> >
> >
> > On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler  wrote:
> >
> >> Please review the following articles about NRT, absolutely instant
> updates
> >> that are visible as they are done are almost impossible (even with
> >> RAMDirectory):
> >>
> >> http://goo.gl/mzAHt
> >> http://goo.gl/5RoPx
> >> http://goo.gl/vSJ7x
> >>
> >> Uwe
> >>
> >> -
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >> > -Original Message-
> >> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> >> > Sent: Monday, February 06, 2012 4:27 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: Re: Configure writer to write to FSDirectory?
> >> >
> >> > Ian,
> >> >
> >> > I encountered an issue that I need to frequently update the index. The
> >> > NRTManager seems not very helpful on this front as the speed is slower
> >> than
> >> > RAMDirectory is used.
> >> >
> >> > Any improvement advice?
> >> >
> >> >
> >> >
> >> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng 
> wrote:
> >> >
> >> > > That really helps! I will try it out.
> >> > >
> >> > > Thanks.
> >> > >
> >> > >
> >> > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
> >> > >
> >> > >> You would use NRTManagerReopenThread as a standalone thread, not
> >> > >> plugged into your Executor stuff.  It is a utility class which you
> >> > >> don't have to use.  See the javadocs.
> >> > >>
> >> > >> But in your case I'd use it, to start with anyway.  Fire it up with
> >> > >> suitable settings and forget about it, except to call close()
> >> > >> eventually. Once you've got things up and running you can tweak
> >> > >> things as much as you want but you appear to be having trouble
> >> > >> getting up and running.
> >> > >>
> >> > >> So ... somewhere in the initialisation code of your app, create an
> >> > >> IndexWriter, NRTManager + ReopenThread and SearcherManager as
> >> > >> outlined before.  Then pass the NRTManager to any/all write methods
> >> > >> or threads and the SearcherManager instance to any/all search
> methods
> >> > >> or threads and you're done.  If you want to use threads that are
> part
> >> > >> of your ExecutorService, fine.  Just wrap it all together in
> whatever
> >> > >> combination of Thread or Runnable instances you want.
> >> > >>
> >> > >>
> >> > >> Does that help?
> >> > >>
> >> > >>
> >> > >> --
> >> > >> Ian.
> >> > >>
> >> > >>
> >> > >> > I don't understand this following portion:
> >> > >> >
> >> > >> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> >> > >> > index); NRTManager nrtm = new NRTManager(iw, null);
> >> > >> > NRTManagerReo

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Agree.

On Mon, Feb 6, 2012 at 11:53 PM, Uwe Schindler  wrote:

> Hi Cheng,
>
> all pros and cons are explained in those articles written by Mike! As soon
> as there are harddisks in the game, there is a slowdown, what do you
> expect?
> If you need it faster, buy SSDs! :-)
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Monday, February 06, 2012 4:45 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Configure writer to write to FSDirectory?
> >
> > Uwe, when I meant speed is slow, I didn't refer to instant visibility of
> changes,
> > but that the changes may be synchronized with FSDirectory when I use
> > writer.commit().
> >
> > When I use RAMDirectory, the writer.commit() seems much faster than using
> > NRTManager built upon FSDirectory. So, I am guessing the difference is
> the
> > index synchronization.
> >
> >
> >
> > On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler  wrote:
> >
> > > Please review the following articles about NRT, absolutely instant
> > > updates that are visible as they are done are almost impossible (even
> > > with
> > > RAMDirectory):
> > >
> > > http://goo.gl/mzAHt
> > > http://goo.gl/5RoPx
> > > http://goo.gl/vSJ7x
> > >
> > > Uwe
> > >
> > > -
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: u...@thetaphi.de
> > >
> > > > -Original Message-
> > > > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > > > Sent: Monday, February 06, 2012 4:27 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: Configure writer to write to FSDirectory?
> > > >
> > > > Ian,
> > > >
> > > > I encountered an issue that I need to frequently update the index.
> > > > The NRTManager seems not very helpful on this front as the speed is
> > > > slower
> > > than
> > > > RAMDirectory is used.
> > > >
> > > > Any improvement advice?
> > > >
> > > >
> > > >
> > > > On Mon, Feb 6, 2012 at 10:24 PM, Cheng 
> > wrote:
> > > >
> > > > > That really helps! I will try it out.
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea 
> wrote:
> > > > >
> > > > >> You would use NRTManagerReopenThread as a standalone thread, not
> > > > >> plugged into your Executor stuff.  It is a utility class which
> > > > >> you don't have to use.  See the javadocs.
> > > > >>
> > > > >> But in your case I'd use it, to start with anyway.  Fire it up
> > > > >> with suitable settings and forget about it, except to call
> > > > >> close() eventually. Once you've got things up and running you can
> > > > >> tweak things as much as you want but you appear to be having
> > > > >> trouble getting up and running.
> > > > >>
> > > > >> So ... somewhere in the initialisation code of your app, create
> > > > >> an IndexWriter, NRTManager + ReopenThread and SearcherManager as
> > > > >> outlined before.  Then pass the NRTManager to any/all write
> > > > >> methods or threads and the SearcherManager instance to any/all
> > > > >> search methods or threads and you're done.  If you want to use
> > > > >> threads that are part of your ExecutorService, fine.  Just wrap
> > > > >> it all together in whatever combination of Thread or Runnable
> instances
> > you want.
> > > > >>
> > > > >>
> > > > >> Does that help?
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Ian.
> > > > >>
> > > > >>
> > > > >> > I don't understand this following portion:
> > > > >> >
> > > > >> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> > > > >> > index); NRTManager nrtm = new NRTManager(iw, null);
> > > > &

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Good point. I should remove the commits.

Is there any difference between NRTCachingDirectory and RAMDirectory? And how
is "small" defined?
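
For what it's worth, this is what I plan to try (a sketch based on my reading
of the NRTCachingDirectory javadocs: segments and merges smaller than the two
MB thresholds stay in RAM, bigger ones go straight to disk; the numbers and
the argument order are assumptions to verify):

Directory fsDir = new NIOFSDirectory(new File("/path/to/index"));
// cache merges up to 5 MB, hold at most 60 MB in RAM before flushing to fsDir
NRTCachingDirectory cachedDir = new NRTCachingDirectory(fsDir, 5.0, 60.0);
IndexWriter iw = new IndexWriter(cachedDir, iwConfig); // iwConfig as before
NRTManager nrtm = new NRTManager(iw, null);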

On Tue, Feb 7, 2012 at 12:42 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> You shouldn't call IW.commit when using NRT; that's the point of NRT
> (making changes visible w/o calling commit).
>
> Only call commit when you require that all changes be durable (surive
> OS / JVM crash, power loss, etc.) on disk.
>
> Also, you can use NRTCachingDirectory which acts like RAMDirectory for
> small flushed segments.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Feb 6, 2012 at 10:45 AM, Cheng  wrote:
> > Uwe, when I meant speed is slow, I didn't refer to instant visibility of
> > changes, but that the changes may be synchronized with FSDirectory when I
> > use writer.commit().
> >
> > When I use RAMDirectory, the writer.commit() seems much faster than using
> > NRTManager built upon FSDirectory. So, I am guessing the difference is
> the
> > index synchronization.
> >
> >
> >
> > On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler  wrote:
> >
> >> Please review the following articles about NRT, absolutely instant
> updates
> >> that are visible as they are done are almost impossible (even with
> >> RAMDirectory):
> >>
> >> http://goo.gl/mzAHt
> >> http://goo.gl/5RoPx
> >> http://goo.gl/vSJ7x
> >>
> >> Uwe
> >>
> >> -
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >> > -Original Message-
> >> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> >> > Sent: Monday, February 06, 2012 4:27 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: Re: Configure writer to write to FSDirectory?
> >> >
> >> > Ian,
> >> >
> >> > I encountered an issue that I need to frequently update the index. The
> >> > NRTManager seems not very helpful on this front as the speed is slower
> >> than
> >> > RAMDirectory is used.
> >> >
> >> > Any improvement advice?
> >> >
> >> >
> >> >
> >> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng 
> wrote:
> >> >
> >> > > That really helps! I will try it out.
> >> > >
> >> > > Thanks.
> >> > >
> >> > >
> >> > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea  wrote:
> >> > >
> >> > >> You would use NRTManagerReopenThread as a standalone thread, not
> >> > >> plugged into your Executor stuff.  It is a utility class which you
> >> > >> don't have to use.  See the javadocs.
> >> > >>
> >> > >> But in your case I'd use it, to start with anyway.  Fire it up with
> >> > >> suitable settings and forget about it, except to call close()
> >> > >> eventually. Once you've got things up and running you can tweak
> >> > >> things as much as you want but you appear to be having trouble
> >> > >> getting up and running.
> >> > >>
> >> > >> So ... somewhere in the initialisation code of your app, create an
> >> > >> IndexWriter, NRTManager + ReopenThread and SearcherManager as
> >> > >> outlined before.  Then pass the NRTManager to any/all write methods
> >> > >> or threads and the SearcherManager instance to any/all search
> methods
> >> > >> or threads and you're done.  If you want to use threads that are
> part
> >> > >> of your ExecutorService, fine.  Just wrap it all together in
> whatever
> >> > >> combination of Thread or Runnable instances you want.
> >> > >>
> >> > >>
> >> > >> Does that help?
> >> > >>
> >> > >>
> >> > >> --
> >> > >> Ian.
> >> > >>
> >> > >>
> >> > >> > I don't understand this following portion:
> >> > >> >
> >> > >> > IndexWriter iw = new IndexWriter(whatever - some standard disk
> >> > >> > index); NRTManager nrtm = new NRTManager(iw, null);
> >> > >> > NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm,
> >> > >&g

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Will do.

On Tue, Feb 7, 2012 at 12:52 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> You tell NRTCachingDirectory how much RAM it's allowed to use, and it
> then caches newly flushed segments in a private RAMDirectory.
>
> But you should first test performance w/o it (after removing the
> commit calls).  NRT is very fast...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Feb 6, 2012 at 11:46 AM, Cheng  wrote:
> > Good point. I should remove the commits.
> >
> > Any difference between NRTCashingDirectory and RAMDirectory? how to
> define
> > the "small"?
> >
> > On Tue, Feb 7, 2012 at 12:42 AM, Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> You shouldn't call IW.commit when using NRT; that's the point of NRT
> >> (making changes visible w/o calling commit).
> >>
> >> Only call commit when you require that all changes be durable (surive
> >> OS / JVM crash, power loss, etc.) on disk.
> >>
> >> Also, you can use NRTCachingDirectory which acts like RAMDirectory for
> >> small flushed segments.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Mon, Feb 6, 2012 at 10:45 AM, Cheng  wrote:
> >> > Uwe, when I meant speed is slow, I didn't refer to instant visibility
> of
> >> > changes, but that the changes may be synchronized with FSDirectory
> when I
> >> > use writer.commit().
> >> >
> >> > When I use RAMDirectory, the writer.commit() seems much faster than
> using
> >> > NRTManager built upon FSDirectory. So, I am guessing the difference is
> >> the
> >> > index synchronization.
> >> >
> >> >
> >> >
> >> > On Mon, Feb 6, 2012 at 11:40 PM, Uwe Schindler 
> wrote:
> >> >
> >> >> Please review the following articles about NRT, absolutely instant
> >> updates
> >> >> that are visible as they are done are almost impossible (even with
> >> >> RAMDirectory):
> >> >>
> >> >> http://goo.gl/mzAHt
> >> >> http://goo.gl/5RoPx
> >> >> http://goo.gl/vSJ7x
> >> >>
> >> >> Uwe
> >> >>
> >> >> -
> >> >> Uwe Schindler
> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> >> http://www.thetaphi.de
> >> >> eMail: u...@thetaphi.de
> >> >>
> >> >> > -Original Message-
> >> >> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> >> >> > Sent: Monday, February 06, 2012 4:27 PM
> >> >> > To: java-user@lucene.apache.org
> >> >> > Subject: Re: Configure writer to write to FSDirectory?
> >> >> >
> >> >> > Ian,
> >> >> >
> >> >> > I encountered an issue that I need to frequently update the index.
> The
> >> >> > NRTManager seems not very helpful on this front as the speed is
> slower
> >> >> than
> >> >> > RAMDirectory is used.
> >> >> >
> >> >> > Any improvement advice?
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, Feb 6, 2012 at 10:24 PM, Cheng 
> >> wrote:
> >> >> >
> >> >> > > That really helps! I will try it out.
> >> >> > >
> >> >> > > Thanks.
> >> >> > >
> >> >> > >
> >> >> > > On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea 
> wrote:
> >> >> > >
> >> >> > >> You would use NRTManagerReopenThread as a standalone thread, not
> >> >> > >> plugged into your Executor stuff.  It is a utility class which
> you
> >> >> > >> don't have to use.  See the javadocs.
> >> >> > >>
> >> >> > >> But in your case I'd use it, to start with anyway.  Fire it up
> with
> >> >> > >> suitable settings and forget about it, except to call close()
> >> >> > >> eventually. Once you've got things up and running you can tweak
> >> >> > >> things as much as you want but you appear to be having trouble
> >> >> > >> getting up and running.
> >> >> > >>
> >>

Re: NRTManager and AlreadyClosedException

2012-02-08 Thread Cheng
You are right. There is a method through which I do the searching. At the end
of the method, I release the index searcher (not the SearcherManager).

Since this method is called by multiple threads, the index searcher will be
released multiple times.

First, I wonder if releasing the searcher is the same as releasing the
SearcherManager.

Second, as Mike's blog says, the searcher should be released, and that is what
seems to have caused the problem. What are my alternatives here to avoid it?

Thanks



On Wed, Feb 8, 2012 at 7:51 PM, Ian Lea  wrote:

> Are you closing the SearcherManager?  Calling release() multiple times?
>
> From the exception message the first sounds most likely.
>
>
> --
> Ian.
>
>
> On Wed, Feb 8, 2012 at 5:20 AM, Cheng  wrote:
> > Hi,
> >
> > I am using NRTManager and NRTManagerReopenThread. Though I don't close
> > either writer or the reopen thread, I receive AlreadyClosedException as
> > follow.
> >
> > My initiating NRTManager and NRTManagerReopenThread are:
> >
> > FSDirectory indexDir = new NIOFSDirectory(new File(
> > indexFolder));
> >
> > IndexWriterConfig iwConfig = new IndexWriterConfig(
> > version, new LimitTokenCountAnalyzer(
> > StandardAnalyzer, maxTokenNum));
> >
> > iw = new IndexWriter(indexDir, iwConfig);
> >
> > nrtm = new NRTManager(iw, null);
> >
> > ropt = new NRTManagerReopenThread(nrtm,
> > targetMaxStaleSec,
> > targetMinStaleSec);
> >
> > ropt.setName("Reopen Thread");
> > ropt.setPriority(Math.min(Thread.currentThread().getPriority() + 2,
> > Thread.MAX_PRIORITY));
> > ropt.setDaemon(true);
> > ropt.start();
> >
> >
> > Where may the searchermanager fall out?
> >
> >
> >
> > org.apache.lucene.store.AlreadyClosedException: this SearcherManager is
> > closed77
> > at
> >
> org.apache.lucene.search.SearcherManager.acquire(SearcherManager.java:235)
> > at
> com.yyt.core.er.lucene.YYTLuceneImpl.codeIndexed(YYTLuceneImpl.java:138)
> > at com.yyt.core.er.main.copy.SingleCodeER.run(SingleCodeER.java:50)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: NRTManager and AlreadyClosedException

2012-02-08 Thread Cheng
I use it in exactly the same way, so there must be some other reason causing
the problem.

On Wed, Feb 8, 2012 at 8:21 PM, Ian Lea  wrote:

> Releasing a searcher is not the same as closing the searcher manager,
> if that is what you mean.
>
> The searcher should indeed be released, but once only for each
> acquire().  Your searching threads should have code like that shown in
> the SearcherManager javadocs.
>
> IndexSearcher s = manager.acquire();
>  try {
>   // Do searching, doc retrieval, etc. with s
>  } finally {
>   manager.release(s);
>  }
>  // Do not use s after this!
>  s = null;
>
> --
> Ian.
>
>
> On Wed, Feb 8, 2012 at 12:09 PM, Cheng  wrote:
> > You are right. There is a method by which I do searching. At the end of
> the
> > method, I release the index searcher (not the searchermanager).
> >
> > Since this method is called by multiple threads. So I think the index
> > searcher will be released multiple times.
> >
> > First, I wonder if releasing searcher is same as releasing the searcher
> > manager.
> >
> > Second, as said in Mike's blog, the searcher should be released, which
> has
> > seemingly caused the problem. What are my alternatives here to avoid it?
> >
> > Thanks
> >
> >
> >
> > On Wed, Feb 8, 2012 at 7:51 PM, Ian Lea  wrote:
> >
> >> Are you closing the SearcherManager?  Calling release() multiple times?
> >>
> >> From the exception message the first sounds most likely.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Wed, Feb 8, 2012 at 5:20 AM, Cheng  wrote:
> >> > Hi,
> >> >
> >> > I am using NRTManager and NRTManagerReopenThread. Though I don't close
> >> > either writer or the reopen thread, I receive AlreadyClosedException
> as
> >> > follow.
> >> >
> >> > My initiating NRTManager and NRTManagerReopenThread are:
> >> >
> >> > FSDirectory indexDir = new NIOFSDirectory(new File(
> >> > indexFolder));
> >> >
> >> > IndexWriterConfig iwConfig = new IndexWriterConfig(
> >> > version, new LimitTokenCountAnalyzer(
> >> > StandardAnalyzer, maxTokenNum));
> >> >
> >> > iw = new IndexWriter(indexDir, iwConfig);
> >> >
> >> > nrtm = new NRTManager(iw, null);
> >> >
> >> > ropt = new NRTManagerReopenThread(nrtm,
> >> > targetMaxStaleSec,
> >> > targetMinStaleSec);
> >> >
> >> > ropt.setName("Reopen Thread");
> >> > ropt.setPriority(Math.min(Thread.currentThread().getPriority() + 2,
> >> > Thread.MAX_PRIORITY));
> >> > ropt.setDaemon(true);
> >> > ropt.start();
> >> >
> >> >
> >> > Where may the searchermanager fall out?
> >> >
> >> >
> >> >
> >> > org.apache.lucene.store.AlreadyClosedException: this SearcherManager
> is
> >> > closed77
> >> > at
> >> >
> >>
> org.apache.lucene.search.SearcherManager.acquire(SearcherManager.java:235)
> >> > at
> >> com.yyt.core.er.lucene.YYTLuceneImpl.codeIndexed(YYTLuceneImpl.java:138)
> >> > at com.yyt.core.er.main.copy.SingleCodeER.run(SingleCodeER.java:50)
> >> > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: slow speed of searching

2012-02-08 Thread Cheng
thanks a lot

On Wed, Feb 8, 2012 at 9:48 PM, Ian Lea  wrote:

> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>
> (the 3rd item is Use a local filesystem!)
>
> --
> Ian.
>
>
> On Wed, Feb 8, 2012 at 12:44 PM, Cheng  wrote:
> > Hi,
> >
> > I have about 6.5 million documents which lead to 1.5G index. The speed of
> > search a couple terms, like "dvd" and "price", causes about 0.1 second.
> >
> > I am afraid that our data will grow rapidly. Except for dividing
> documents
> > into multiple indexes, what are the solutions I can try to improve
> > searching spead?
> >
> > Thanks
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: When to refresh writer?

2012-02-14 Thread Cheng
thanks

On Tue, Feb 14, 2012 at 2:22 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> IndexWriter doesn't require refreshing... just keep it open forever.
> It'll run it's own merges when needed (see the MergePolicy/Scheduler).
>
> Just call .commit() when you want changes to be durable (survive
> OS/JVM crash, power loss, etc.).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Feb 13, 2012 at 1:17 PM, Cheng  wrote:
> > Hi,
> >
> > My application will go on for ever. When is good time to refresh the
> writer
> > (and merge the segments)?
> >
> > Thanks
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Can I use multiple writers of different applications on a same FSDirectory?

2012-02-14 Thread Cheng
thanks

On Wed, Feb 15, 2012 at 1:14 AM, Mihai Caraman wrote:

> you can use a fsdirectory for each writer and then, search on all of them
> at once. This is the recomended way if you have different apps.
>
> În data de 14 februarie 2012, 19:06, Ian Lea  a scris:
>
> > You can only have one writer against one index at a time.  Lucene's
> > locking will prevent anything else.
> >
> >
> > --
> > Ian.
> >
> >
> > On Tue, Feb 14, 2012 at 4:49 PM, Cheng  wrote:
> > > Hi,
> > >
> > > I need to manage multiple applications, each having its own writer yet
> > on a
> > > same FSdirectory. How to make it happen while I encounter quite a few
> > > exceptions?
> > >
> > > thanks
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>


Re: How to separate one index into multiple?

2012-02-20 Thread Cheng
great idea!

On Sun, Feb 19, 2012 at 9:43 PM, Li Li  wrote:

> you can delete by query like -category:category1
>
> On Sun, Feb 19, 2012 at 9:41 PM, Li Li  wrote:
>
> > I think you could do as follows.  taking splitting it to 3 indexes for
> > example.
> > you can copy the index 3 times.
> > for copy 1
> >   for(int i=0;i >   reader1.delete(i);
> >   }
> > for copy
> >   for(int i=1;i >   reader2.delete(i);
> >  }
> > 
> >  and then optimize these 3 indexes
> >
>


TaxonomySearch & similar words?

2012-02-22 Thread Cheng
Hi,

I am using Taxonomy Search to build a facet comprising things such as
“/author/American/Mark Twain”.

Since the word "author" has a synonym of "writer", can I use "writer"
instead of "author" to get the path?

Currently I can only use exactly the word "author" to do it.

Thanks


Re: TaxonomySearch & similar words?

2012-02-22 Thread Cheng
Thank you. The alternative sounds reasonable.
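
For the record, the runtime mapping suggested below could look roughly like
this on my side (the synonym map and the helper method are mine; only
CategoryPath is from the facet API, and I still need to verify the details):

private static final Map<String, String> LABEL_SYNONYMS = new HashMap<String, String>();
static {
    // canonical label that is actually stored in the taxonomy
    LABEL_SYNONYMS.put("writer", "author");
}

static CategoryPath normalizedPath(String... components) {
    String[] normalized = new String[components.length];
    for (int i = 0; i < components.length; i++) {
        String canonical = LABEL_SYNONYMS.get(components[i]);
        normalized[i] = canonical != null ? canonical : components[i];
    }
    return new CategoryPath(normalized);
}

// normalizedPath("writer", "American", "Mark Twain") and
// normalizedPath("author", "American", "Mark Twain") now refer to the same category.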

On Thu, Feb 23, 2012 at 12:54 PM, Shai Erera  wrote:

> Hi Cheng,
>
> You will need to use the exact path labels in order to get to the category
> 'Mark Twain', unless you index multiple paths from start, e.g.:
> /author/American/Mark Twain
> /writer/American/Mart Twain
>
> The taxonomy index does not process the CategoryPath labels in anyway to
> e.g. produce synonyms, but rather, keeps them as-is.
>
> Another alternative is to write a simple module that at runtime will
> replace author/writer by one value that you choose to store in the
> taxonomy, an so your users will be able to count /author or /writer
> interchangeably.
>
> I prefer the second approach because it keeps the taxonomy small, which is
> preferred at runtime (when facet counts are required).
>
> Shai
>
> On Thu, Feb 23, 2012 at 6:48 AM, Cheng  wrote:
>
> > Hi,
> >
> > I am using Taxonomy Search to build a facet comprising things such as
> > “/author/American/Mark Twain”.
> >
> > Since the word "author" has a synonym of "writer", can I use "writer"
> > instead of "author" to get the path?
> >
> > Currently I can only use exactly the word "author" to do it.
> >
> > Thanks
> >
>


Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Cheng
My indexes are 500MB+, so it seems that RAMDirectory is not a good fit for an
index that big.

My challenge, on the other hand, is that I need to update the indexes very
frequently. So, do you think MMapDirectory is the solution?
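
That is, something as simple as this (a sketch; the path is a placeholder):

Directory dir = new MMapDirectory(new File("/path/to/index"));
IndexWriter iw = new IndexWriter(dir, iwConfig); // same config as today, just a different Directory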

Thanks.

On Mon, Jun 4, 2012 at 10:30 PM, Jack Krupansky wrote:

> From the javadoc for RAMDirectory:
>
> "Warning: This class is not intended to work with huge indexes. Everything
> beyond several hundred megabytes will waste resources (GC cycles), because
> it uses an internal buffer size of 1024 bytes, producing millions of
> byte[1024] arrays. This class is optimized for small memory-resident
> indexes. It also has bad concurrency on multithreaded environments.
>
> It is recommended to materialize large indexes on disk and use
> MMapDirectory, which is a high-performance directory implementation working
> directly on the file system cache of the operating system, so copying data
> to Java heap space is not useful."
>
> -- Jack Krupansky
>
> -Original Message- From: Cheng
> Sent: Monday, June 04, 2012 10:08 AM
> To: java-user@lucene.apache.org
> Subject: RAMDirectory unexpectedly slows
>
>
> Hi,
>
> My apps need to read from and write to some big indexes frequently. So I
> use RAMDirectory instead of FSDirectory, and give JVM about 2GB memory
> size.
>
> I notice that the speed of reading and writing unexpectedly slows as the
> size of the indexes increases. Since the usage of RAM is less than 20%, I
> think by default the RAMDirectory doesn't take advantage of the memory I
> assigned to JVM.
>
> What are the steps to improve the reading and writing speed of
> RAMDirectory?
>
> Thanks!
> Jeff
>
> --**--**-
> To unsubscribe, e-mail: 
> java-user-unsubscribe@lucene.**apache.org
> For additional commands, e-mail: 
> java-user-help@lucene.apache.**org
>
>


Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Cheng
Please shed more light on the difference between the JVM heap size and the
memory used by Lucene.

What I am getting at is that no matter how much RAM I give my apps, Lucene
can't utilize it. Is that right?

What about the ByteBufferDirectory? Can this specific directory utilize the
2GB memory I grant to the app?

On Mon, Jun 4, 2012 at 10:58 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:

> If you want the index to be stored completely in RAM, there is the
> ByteBuffer directory [1].  Though I do not see the point in putting an
> index in RAM, it will be cached in RAM regardless in the OS system IO
> cache.
>
> 1.
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/apache/lucene/store/bytebuffer/ByteBufferDirectory.java
>
> On Mon, Jun 4, 2012 at 10:55 AM, Cheng  wrote:
> > My indexes are 500MB+. So it seems like that RAMDirectory is not good for
> > that big a size.
> >
> > My challenge, on the other side, is that I need to update the indexes
> very
> > frequently. So, do you think  MMapDirectory is the solution?
> >
> > Thanks.
> >
> > On Mon, Jun 4, 2012 at 10:30 PM, Jack Krupansky  >wrote:
> >
> >> From the javadoc for RAMDirectory:
> >>
> >> "Warning: This class is not intended to work with huge indexes.
> Everything
> >> beyond several hundred megabytes will waste resources (GC cycles),
> because
> >> it uses an internal buffer size of 1024 bytes, producing millions of
> >> byte[1024] arrays. This class is optimized for small memory-resident
> >> indexes. It also has bad concurrency on multithreaded environments.
> >>
> >> It is recommended to materialize large indexes on disk and use
> >> MMapDirectory, which is a high-performance directory implementation
> working
> >> directly on the file system cache of the operating system, so copying
> data
> >> to Java heap space is not useful."
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Cheng
> >> Sent: Monday, June 04, 2012 10:08 AM
> >> To: java-user@lucene.apache.org
> >> Subject: RAMDirectory unexpectedly slows
> >>
> >>
> >> Hi,
> >>
> >> My apps need to read from and write to some big indexes frequently. So I
> >> use RAMDirectory instead of FSDirectory, and give JVM about 2GB memory
> >> size.
> >>
> >> I notice that the speed of reading and writing unexpectedly slows as the
> >> size of the indexes increases. Since the usage of RAM is less than 20%,
> I
> >> think by default the RAMDirectory doesn't take advantage of the memory I
> >> assigned to JVM.
> >>
> >> What are the steps to improve the reading and writing speed of
> >> RAMDirectory?
> >>
> >> Thanks!
> >> Jeff
> >>
> >>
> --**--**-
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.**apache.org<
> java-user-unsubscr...@lucene.apache.org>
> >> For additional commands, e-mail: java-user-help@lucene.apache.**org<
> java-user-h...@lucene.apache.org>
> >>
> >>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Cheng
Can I control the amount of RAM given to either MMapDirectory or
ByteBufferDirectory?

On Mon, Jun 4, 2012 at 11:42 PM, Uwe Schindler  wrote:

> Hi,
>
> If you are using MMapDirectory or this ByteBufferDirectory (which is
> similar to the first) the used RAM is outside JVM heap, it is in the FS
> cache of the OS kernel. Giving too much memory to the JVM penalizes the OS
> cache, so give only as much as the App needs. Lucene and the OS kernel will
> then utilize the remaining memory for caching.
>
> Please read docs of MMapDirectory and inform yourself about mmap in e.g.
> Wikipedia.
>
> Uwe
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de
>
>
>
> Cheng  schrieb:
>
> Please shed more insight into the difference between JVM heap size and the
> memory size used by Lucene.
>
> What I am getting at is that no matter however much ram I give my apps,
> Lucene can't utilize it. Is that right?
>
> What about the ByteBufferDirectory? Can this specific directory utilize the
> 2GB memory I grant to the app?
>
> On Mon, Jun 4, 2012 at 10:58 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
> > If you want the index to be stored completely in RAM, there is the
> > ByteBuffer directory [1]. Though I do not see the point in putting an
> > index in RAM, it will be cached in RAM regardless in the OS system IO
> > cache.
> >
> > 1.
> >
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/apache/lucene/store/bytebuffer/ByteBufferDirectory.java
> >
> > On Mon, Jun 4, 2012 at 10:55 AM, Cheng  wrote:
> > > My indexes are 500MB+. So it seems like that RAMDirectory is not good
> for
> > > that big a size.
> > >
> > > My challenge, on the other side, is that I need to update the indexes
> > very
> > > frequently. So, do you think MMapDirectory is the solution?
> > >
> > > Thanks.
> > >
> > > On Mon, Jun 4, 2012 at 10:30 PM, Jack Krupansky <
> j...@basetechnology.com
> > >wrote:
> > >
> > >> From the javadoc for RAMDirectory:
> > >>
> > >> "Warning: This class is not intended to work with huge indexes.
> > Everything
> > >> beyond several hundred megabytes will waste resources (GC cycles),
> > because
> > >> it uses an internal buffer size of 1024 bytes, producing millions of
> > >> byte[1024] arrays. This class is optimized for small memory-resident
> > >> indexes. It also has bad concurrency on multithreaded environments.
> > >>
> > >> It is recommended to materialize large indexes on disk and use
> > >> MMapDirectory, which is a high-performance directory implementation
> > working
> > >> directly on the file system cache of the operating system, so copying
> > data
> > >> to Java heap space is not useful."
> > >>
> > >> -- Jack Krupansky
> > >>
> > >> -Original Message- From: Cheng
> > >> Sent: Monday, June 04, 2012 10:08 AM
> > >> To: java-user@lucene.apache.org
> > >> Subject: RAMDirectory unexpectedly slows
> > >>
> > >>
> > >> Hi,
> > >>
> > >> My apps need to read from and write to some big indexes frequently.
> So I
> > >> use RAMDirectory instead of FSDirectory, and give JVM about 2GB memory
> > >> size.
> > >>
> > >> I notice that the speed of reading and writing unexpectedly slows as
> the
> > >> size of the indexes increases. Since the usage of RAM is less than
> 20%,
> > I
> > >> think by default the RAMDirectory doesn't take advantage of the
> memory I
> > >> assigned to JVM.
> > >>
> > >> What are the steps to improve the reading and writing speed of
> > >> RAMDirectory?
> > >>
> > >> Thanks!
> > >> Jeff
> > >>
> > >>
> >_
> **_
> **-
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.**apache.org<
> > java-user-unsubscr...@lucene.apache.org>
> > >> For additional commands, e-mail: java-user-help@lucene.apache.**org<
> > java-user-h...@lucene.apache.org>
> > >>
> > >>
> >
> >_
>
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>
>


Re: RAMDirectory unexpectedly slows

2012-06-16 Thread Cheng
After a number of tests, the performance of MMapDirectory is not even close to
that of RAMDirectory in terms of speed.

My application with MMapDirectory can only handle 10 tasks per round, while it
could handle over 90 with RAMDirectory.

I run the application on Linux.

What could be the reasons?

Thanks.


On Tue, Jun 5, 2012 at 7:53 AM, Uwe Schindler  wrote:

> This is managed by your operating system. In general OS kernels like Linux
> or Windows use all free memory to cache disk accesses.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Monday, June 04, 2012 6:10 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: RAMDirectory unexpectedly slows
> >
> > Can I control the size of ram given to either MMapDirectory or
> > ByteBufferDirectory?
> >
> > On Mon, Jun 4, 2012 at 11:42 PM, Uwe Schindler  wrote:
> >
> > > Hi,
> > >
> > > If you are using MMapDirectory or this ByteBufferDirectory (which is
> > > similar to the first) the used RAM is outside JVM heap, it is in the
> > > FS cache of the OS kernel. Giving too much memory to the JVM penalizes
> > > the OS cache, so give only as much as the App needs. Lucene and the OS
> > > kernel will then utilize the remaining memory for caching.
> > >
> > > Please read docs of MMapDirectory and inform yourself about mmap in
> e.g.
> > > Wikipedia.
> > >
> > > Uwe
> > > --
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, 28213 Bremen
> > > http://www.thetaphi.de
> > >
> > >
> > >
> > > Cheng  schrieb:
> > >
> > > Please shed more insight into the difference between JVM heap size and
> > > the memory size used by Lucene.
> > >
> > > What I am getting at is that no matter however much ram I give my
> > > apps, Lucene can't utilize it. Is that right?
> > >
> > > What about the ByteBufferDirectory? Can this specific directory
> > > utilize the 2GB memory I grant to the app?
> > >
> > > On Mon, Jun 4, 2012 at 10:58 PM, Jason Rutherglen <
> > > jason.rutherg...@gmail.com> wrote:
> > >
> > > > If you want the index to be stored completely in RAM, there is the
> > > > ByteBuffer directory [1]. Though I do not see the point in putting
> > > > an index in RAM, it will be cached in RAM regardless in the OS
> > > > system IO cache.
> > > >
> > > > 1.
> > > >
> > > https://github.com/elasticsearch/elasticsearch/blob/master/src/main/ja
> > > va/org/apache/lucene/store/bytebuffer/ByteBufferDirectory.java
> > > >
> > > > On Mon, Jun 4, 2012 at 10:55 AM, Cheng 
> > wrote:
> > > > > My indexes are 500MB+. So it seems like that RAMDirectory is not
> > > > > good
> > > for
> > > > > that big a size.
> > > > >
> > > > > My challenge, on the other side, is that I need to update the
> > > > > indexes
> > > > very
> > > > > frequently. So, do you think MMapDirectory is the solution?
> > > > >
> > > > > Thanks.
> > > > >
> > > > > On Mon, Jun 4, 2012 at 10:30 PM, Jack Krupansky <
> > > j...@basetechnology.com
> > > > >wrote:
> > > > >
> > > > >> From the javadoc for RAMDirectory:
> > > > >>
> > > > >> "Warning: This class is not intended to work with huge indexes.
> > > > Everything
> > > > >> beyond several hundred megabytes will waste resources (GC
> > > > >> cycles),
> > > > because
> > > > >> it uses an internal buffer size of 1024 bytes, producing millions
> > > > >> of byte[1024] arrays. This class is optimized for small
> > > > >> memory-resident indexes. It also has bad concurrency on
> multithreaded
> > environments.
> > > > >>
> > > > >> It is recommended to materialize large indexes on disk and use
> > > > >> MMapDirectory, which is a high-performance directory
> > > > >> implementation
> > > > working
> > > > >> directly on the file system cache of the operating system, so
> 

Re: RAMDirectory unexpectedly slows

2012-06-18 Thread Cheng
Lucene is used in the following steps:

1) store the data of interest in Lucene indexes;

2) search keywords against the indexes;

3) write new data into the indexes and refresh the reader;

4) use the reader to search keywords again, and steps 2-4 repeat.

As you see, there are lots of read and update actions. My guess is that
MMapDirectory is slower because it needs to synchronize with the local drive.


The code is attached:

public class YYTLucene {

    private static Logger logger = Logger.getLogger(YYTLuceneImpl.class);

    private static FSDirectory indexDir;

    private static RAMDirectory ramDir;

    // private static MMapDirectory ramDir;

    private static IndexWriter iw;

    private static IndexSearcher is;

    private static IndexReader ir;

    private static YYTLucene instance;

    public static YYTLucene getInstance(String type) {
        if (instance == null) {
            instance = new YYTLucene(type);
        }
        return instance;
    }

    private YYTLucene(String type) {
        try {
            indexDir = new NIOFSDirectory(new File(ERConstants.indexFolder1
                    + "/" + type));

            ramDir = new RAMDirectory(indexDir);

            // ramDir = new MMapDirectory(new File(ERConstants.indexFolder1
            //         + "/" + type));

            IndexWriterConfig iwConfig = new IndexWriterConfig(
                    ERConstants.version, new LimitTokenCountAnalyzer(
                            ERConstants.analyzer, ERConstants.maxTokenNum));

            // iwConfig.setMaxBufferedDocs(ERConstants.maxBufferedDocs);
            //
            // iwConfig.setRAMBufferSizeMB(ERConstants.RAMBufferSizeMB);

            iw = new IndexWriter(ramDir, iwConfig);
            iw.commit();

            ir = IndexReader.open(iw, true);
            is = new IndexSearcher(ir);

        } catch (IOException e) {
            e.printStackTrace();
            logger.info("Can't initiate YYTLuceneImpl...");
        }
    }

    public IndexWriter getIndexWriter() {
        return iw;
    }

    public void setIndexWriter(IndexWriter iw) {
        YYTLucene.iw = iw;
    }

    public IndexSearcher getIndexSearcher() {
        return is;
    }

    public void setIndexSearcher(IndexSearcher is) {
        YYTLucene.is = is;
    }

    public IndexReader getIndexReader() {
        return ir;
    }

    public static void setIndexReader(IndexReader ir) {
        YYTLucene.ir = ir;
    }

}



On Mon, Jun 18, 2012 at 7:32 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> 9 fold improvement using RAMDir over MMapDir is much more than I've
> seen (~30-40% maybe) in the past.
>
> Can you explain how you are using Lucene?
>
> You may also want to try the CachingRAMDirectory patch on
> https://issues.apache.org/jira/browse/LUCENE-4123
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sat, Jun 16, 2012 at 7:18 AM, Cheng  wrote:
> > After a number of test, the performance of MMapDirectory is not even
> close
> > to that of RAMDirectory, in terms of speed.
> >
> > My application w/ the former can only deal with 10 tasks per round while
> it
> > could handle over 90 w/ RAMDirectory.
> >
> > I use the application in Linux.
> >
> > What can be the reasons?
> >
> > Thanks.
> >
> >
> > On Tue, Jun 5, 2012 at 7:53 AM, Uwe Schindler  wrote:
> >
> >> This is managed by your operating system. In general OS kernels like
> Linux
> >> or Windows use all free memory to cache disk accesses.
> >>
> >> -
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> >> > -Original Message-
> >> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> >> > Sent: Monday, June 04, 2012 6:10 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: Re: RAMDirectory unexpectedly slows
> >> >
> >> > Can I control the size of ram given to either MMapDirectory or
> >> > ByteBufferDirectory?
> >> >
> >> > On Mon, Jun 4, 2012 at 11:42 PM, Uwe Schindler 
> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > If you are using MMapDirectory or this ByteBufferDirectory (which is
> >> > > similar to the first) the used RAM is outside JVM heap, it is in the
> >> > > FS cache of the OS kernel. Giving too much memory to the JVM
> penalizes
> >> > > the OS cache, so give only as much as the App needs. Lucene and the
> OS
> >> > > kernel will then utilize the remaining memory for caching.
> >> > >
> >> > > Please read docs of MMapDirectory and inform yourself about mmap in
> >> e.g.
> >> > > Wikipedia.
> >> > >
> >> > > Uwe
> >> > > --
> >> > > Uwe Schindler
> >> > > H.-H.-Meier-Allee 63, 28213 Bremen
> >> > > http://www.thetaphi.de
> >> > >
> >> > >
> >> > >
> >> > > Cheng 

Re: RAMDirectory unexpectedly slows

2012-06-30 Thread Cheng
Hi,

I can't find the CachingRAMDirectory in Lucene 3.6. Is it deprecated?

Thanks

On Mon, Jun 18, 2012 at 7:32 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> 9 fold improvement using RAMDir over MMapDir is much more than I've
> seen (~30-40% maybe) in the past.
>
> Can you explain how you are using Lucene?
>
> You may also want to try the CachingRAMDirectory patch on
> https://issues.apache.org/jira/browse/LUCENE-4123
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sat, Jun 16, 2012 at 7:18 AM, Cheng  wrote:
> > After a number of test, the performance of MMapDirectory is not even
> close
> > to that of RAMDirectory, in terms of speed.
> >
> > My application w/ the former can only deal with 10 tasks per round while
> it
> > could handle over 90 w/ RAMDirectory.
> >
> > I use the application in Linux.
> >
> > What can be the reasons?
> >
> > Thanks.
> >
> >
> > On Tue, Jun 5, 2012 at 7:53 AM, Uwe Schindler  wrote:
> >
> >> This is managed by your operating system. In general OS kernels like
> Linux
> >> or Windows use all free memory to cache disk accesses.
> >>
> >> -
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> >> > -Original Message-
> >> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> >> > Sent: Monday, June 04, 2012 6:10 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: Re: RAMDirectory unexpectedly slows
> >> >
> >> > Can I control the size of ram given to either MMapDirectory or
> >> > ByteBufferDirectory?
> >> >
> >> > On Mon, Jun 4, 2012 at 11:42 PM, Uwe Schindler 
> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > If you are using MMapDirectory or this ByteBufferDirectory (which is
> >> > > similar to the first) the used RAM is outside JVM heap, it is in the
> >> > > FS cache of the OS kernel. Giving too much memory to the JVM
> penalizes
> >> > > the OS cache, so give only as much as the App needs. Lucene and the
> OS
> >> > > kernel will then utilize the remaining memory for caching.
> >> > >
> >> > > Please read docs of MMapDirectory and inform yourself about mmap in
> >> e.g.
> >> > > Wikipedia.
> >> > >
> >> > > Uwe
> >> > > --
> >> > > Uwe Schindler
> >> > > H.-H.-Meier-Allee 63, 28213 Bremen
> >> > > http://www.thetaphi.de
> >> > >
> >> > >
> >> > >
> >> > > Cheng  schrieb:
> >> > >
> >> > > Please shed more insight into the difference between JVM heap size
> and
> >> > > the memory size used by Lucene.
> >> > >
> >> > > What I am getting at is that no matter however much ram I give my
> >> > > apps, Lucene can't utilize it. Is that right?
> >> > >
> >> > > What about the ByteBufferDirectory? Can this specific directory
> >> > > utilize the 2GB memory I grant to the app?
> >> > >
> >> > > On Mon, Jun 4, 2012 at 10:58 PM, Jason Rutherglen <
> >> > > jason.rutherg...@gmail.com> wrote:
> >> > >
> >> > > > If you want the index to be stored completely in RAM, there is the
> >> > > > ByteBuffer directory [1]. Though I do not see the point in putting
> >> > > > an index in RAM, it will be cached in RAM regardless in the OS
> >> > > > system IO cache.
> >> > > >
> >> > > > 1.
> >> > > >
> >> > >
> https://github.com/elasticsearch/elasticsearch/blob/master/src/main/ja
> >> > > va/org/apache/lucene/store/bytebuffer/ByteBufferDirectory.java
> >> > > >
> >> > > > On Mon, Jun 4, 2012 at 10:55 AM, Cheng 
> >> > wrote:
> >> > > > > My indexes are 500MB+. So it seems like that RAMDirectory is not
> >> > > > > good
> >> > > for
> >> > > > > that big a size.
> >> > > > >
> >> > > > > My challenge, on the other side, is that I need to update the
> >> > > > > indexes
> >> > > > very
> >> > > > > frequently. S

Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
Thanks. I will try that.

Another question: how can I use my own dictionary instead of the default one,
either in the FatJAR or in smartcn.jar?

On Thu, Sep 6, 2012 at 10:07 AM, 齐保元  wrote:

>
>
> import contrib/smartcn.jar is not complicated.or you can try FatJAR.
>
>
> At 2012-09-06 22:04:58,Cheng  wrote:
> >Hi,
> >
> >The default Lucene core jar contains no the smartcn analyzer. How can I
> >include it into the core jar.
> >
> >Thanks!
>


Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
Also, I checked and couldn't find smartcn.jar in the Lucene distribution as
shipped. Should I build it myself, and if so, how?
Thanks.

On Thu, Sep 6, 2012 at 10:10 AM, Cheng  wrote:

> Thanks. I will try that.
>
> Another question. How to use my own dictionary instead of the default one
> either in FatJAR or smartcn.jar?
>
>
> On Thu, Sep 6, 2012 at 10:07 AM, 齐保元  wrote:
>
>>
>>
>> import contrib/smartcn.jar is not complicated.or you can try FatJAR.
>>
>>
>> At 2012-09-06 22:04:58,Cheng  wrote:
>> >Hi,
>> >
>> >The default Lucene core jar contains no the smartcn analyzer. How can I
>> >include it into the core jar.
>> >
>> >Thanks!
>>
>
>


Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
IKAnalyzer is not supported in Lucene, right?

On Thu, Sep 6, 2012 at 10:14 AM, 齐保元  wrote:

>
> 1.fatjar is a tool for archiving jars/classes together NOTan analyzer.
> 2.smartcn seems not able to import your own dictionay,it can only import
> stop word dict;You can try IKAnalyzer instead.
>
>
> At 2012-09-06 22:10:15,Cheng  wrote:
> >Thanks. I will try that.
> >
> >Another question. How to use my own dictionary instead of the default one
> >either in FatJAR or smartcn.jar?
> >
> >On Thu, Sep 6, 2012 at 10:07 AM  wrote:
> >
> >>
> >>
> >> import contrib/smartcn.jar is not complicated.or you can try FatJAR.
> >>
> >>
> >> At 2012-09-06 22:04:58,Cheng  wrote:
> >> >Hi,
> >> >
> >> >The default Lucene core jar contains no the smartcn analyzer. How can I
> >> >include it into the core jar.
> >> >
> >> >Thanks!
> >>
>


Re: Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
I use 3.5 now and plan to try 3.6. How can I use IKAnalyzer, make the analyzer
use my own dictionary, and have it work together with Lucene?

Thanks so much for help.

On Thu, Sep 6, 2012 at 10:19 AM, 齐保元  wrote:

>
>
> you'd better tell me the version of lucene.the latest version
> ikanlyzer2012 support lucene3.6
>
>
>
>
> >IKAnalyzer is not supported in Lucene, right?
> >
> >On Thu, Sep 6, 2012 at 10:14 AM,   wrote:
> >
> >>
> >> 1.fatjar is a tool for archiving jars/classes together NOTan analyzer.
> >> 2.smartcn seems not able to import your own dictionay,it can only import
> >> stop word dict;You can try IKAnalyzer instead.
> >>
> >>
> >> At 2012-09-06 22:10:15,Cheng  wrote:
> >> >Thanks. I will try that.
> >> >
> >> >Another question. How to use my own dictionary instead of the default
> one
> >> >either in FatJAR or smartcn.jar?
> >> >
> >> >On Thu, Sep 6, 2012 at 10:07 AM  wrote:
> >> >
> >> >>
> >> >>
> >> >> import contrib/smartcn.jar is not complicated.or you can try FatJAR.
> >> >>
> >> >>
> >> >> At 2012-09-06 22:04:58,Cheng  wrote:
> >> >> >Hi,
> >> >> >
> >> >> >The default Lucene core jar contains no the smartcn analyzer. How
> can I
> >> >> >include it into the core jar.
> >> >> >
> >> >> >Thanks!
> >> >>
> >>
>


Re: Re: Re: Re: How to incorporate the SmartCnAnalyzer in the core lucene jar

2012-09-06 Thread Cheng
Thanks.

The instructions say that users can use IKAnalyzer.cfg.xml to configure the
extension dictionary and the stopword dictionary. They also mention that the
XML file should be placed at the class root.

In an Eclipse Java project, where is the class root?

Thanks





On Thu, Sep 6, 2012 at 10:27 AM, qibaoyuan  wrote:

> check out http://code.google.com/p/ik-analyzer/  it's quite
> straightforward.
>
>
>
> At 2012-09-06 22:22:45,Cheng  wrote:
> >I use 3.5 now, and plan to try 3.6. How can I use IKAnalyzer and make the
> >analyzer to use my own dictionary and work together with Lucene?
> >
> >Thanks so much for help.
> >
> >On Thu, Sep 6, 2012 at 10:19 AM, 齐保元  wrote:
> >
> >>
> >>
> >> you'd better tell me the version of lucene.the latest version
> >> ikanlyzer2012 support lucene3.6
> >>
> >>
> >>
> >>
> >> >IKAnalyzer is not supported in Lucene, right?
> >> >
> >> >On Thu, Sep 6, 2012 at 10:14 AM,   wrote:
> >> >
> >> >>
> >> >> 1.fatjar is a tool for archiving jars/classes together NOTan
> analyzer.
> >> >> 2.smartcn seems not able to import your own dictionay,it can only
> import
> >> >> stop word dict;You can try IKAnalyzer instead.
> >> >>
> >> >>
> >> >> At 2012-09-06 22:10:15,Cheng  wrote:
> >> >> >Thanks. I will try that.
> >> >> >
> >> >> >Another question. How to use my own dictionary instead of the
> default
> >> one
> >> >> >either in FatJAR or smartcn.jar?
> >> >> >
> >> >> >On Thu, Sep 6, 2012 at 10:07 AM  wrote:
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> import contrib/smartcn.jar is not complicated.or you can try
> FatJAR.
> >> >> >>
> >> >> >>
> >> >> >> At 2012-09-06 22:04:58,Cheng  wrote:
> >> >> >> >Hi,
> >> >> >> >
> >> >> >> >The default Lucene core jar contains no the smartcn analyzer. How
> >> can I
> >> >> >> >include it into the core jar.
> >> >> >> >
> >> >> >> >Thanks!
> >> >> >>
> >> >>
> >>
>


Re: Index size doubles every time when I synchronize the RAM-based index with the FD-based index

2012-09-30 Thread Cheng
Yes. I build RAM indexes from the disk index and update the RAM indexes when
new docs come in (Step 1). When the number of new docs reaches 10,000, I
persist the RAM indexes to disk (Step 2).

The bigger concern, however, is the update. I don't know how much RAM is
consumed, but I suppose it grows every time I do Step 1.

I have about 10 million docs. I use Lucene 3.5.

Thanks.


On Mon, Oct 1, 2012 at 4:02 AM, Ian Lea  wrote:

> Are you loading it from disk, adding loads of docs then writing it
> back to disk?  That would do it.
>
> How many docs in the memory index?  How many on disk?  What version of
> lucene?
>
> --
> Ian.
>
>
> On Fri, Sep 28, 2012 at 1:56 AM, Cheng  wrote:
> > Hi,
> >
> > I have a ram based index which occasionally needs to be persistent with a
> > disk based index. Every time the size doubles which eats up my disk space
> > quickly. Below is the code. Could someone help me?
> >
> > Thanks,
> > Cheng
> >
> > try {
> > IndexWriterConfig iwc = new IndexWriterConfig(ERConstants.version,
> > ERConstants.analyzer);
> >
> > iwc.setOpenMode(OpenMode.CREATE);
> >
> > IndexWriter writer = new IndexWriter(new NIOFSDirectory(new File(
> > indexpath)), iwc);
> > writer.commit();
> >
> > writer.addIndexes(source);
> > writer.close();
> >
> > } catch (IOException e) {
> > e.printStackTrace();
> > }
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: IndexReader.open and CorruptIndexException

2013-01-24 Thread Cheng
Here is the log:

Jan 24, 2013 4:10:33 AM org.apache.tomcat.util.net.AprEndpoint$Acceptor run
SEVERE: Socket accept failed
org.apache.tomcat.jni.Error: 24: Too many open files
at org.apache.tomcat.jni.Socket.accept(Native Method)
at org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:990)
at java.lang.Thread.run(Thread.java:722)



Too many open files... How to solve it?


On Tue, Jan 22, 2013 at 10:52 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Can you post the full stack trace of the CorruptIndexException?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jan 22, 2013 at 8:20 AM, Cheng  wrote:
> > Hi,
> >
> > I run a Lucene application on Tomcat. The app will try to open a Linux
> > directory, and sometime returns CorruptIndexException error.
> >
> > Shortly after I restart Tomcat (nothing else changes), the app can be run
> > on the fly. I am using the following statements to open a directory:
> >
> >
> > try {
> >   searcher = new IndexSearcher(IndexReader.open(new
> NIOFSDirectory(new
> > File("/home/user/"+ type;
> > } catch (IOException e) {
> >   throw new Exception("[" + type + "] Cannot open index folder...");
> > }
> >
> > I would like to know how to tackle this problem.
> >
> > Many thanks!
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Re: IndexReader.open and CorruptIndexException

2013-01-25 Thread Cheng
Any example code for this SearcherManager?
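
In the meantime, this is the usage pattern I pieced together from the javadocs
and Mike's blog (a minimal, untested sketch; the constructor arguments differ
a bit between 3.5 and 3.6, so the nulls below are an assumption):

Directory dir = new NIOFSDirectory(new File("/home/user/index"));
// 3.5-style constructor: (Directory, SearcherWarmer, ExecutorService)
SearcherManager mgr = new SearcherManager(dir, null, null);

// In every search thread:
IndexSearcher s = mgr.acquire();
try {
    // run queries with s
} finally {
    mgr.release(s);
}

// From one background thread (or before searches), pick up index changes:
mgr.maybeReopen();   // renamed maybeRefresh() in later versions

// Only on application shutdown:
mgr.close();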

On Fri, Jan 25, 2013 at 3:59 AM, Ian Lea  wrote:

> There will be one file handle for every currently open file.
>
> Use SearcherManager and this problem should go away.
>
>
> --
> Ian.
>
>
> On Thu, Jan 24, 2013 at 6:40 PM, zhoucheng2008 
> wrote:
> > What file handlers did you guy refer to?
> >
> >
> > I opened the index directory only. Is this the file handler? Also, how
> to safely and effectively close the index directory?
> >
> >
> > I found the link's explanation somewhat self-contradictory. After I read
> it, I am confused if I should close the file handlers in the finally block
> or not. I am using Java.
> >
> >
> >
> >
> > -- Original Message --
> > From: "Ian Lea";
> > Sent: Thursday, January 24, 2013, 5:46 PM
> > To: "java-user";
> >
> > Subject: Re: IndexReader.open and CorruptIndexException
> >
> >
> >
> > Well, raising the limits is one option but there may be better ones.
> >
> > There's an FAQ entry on this:
> >
> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_an_IOException_that_says_.22Too_many_open_files.22.3F
> >
> > Take a look at org.apache.lucene.search.SearcherManager "Utility class
> > to safely share IndexSearcher instances across multiple threads".
> >
> >
> > --
> > Ian.
> >
> >
> > On Thu, Jan 24, 2013 at 9:18 AM, Rafał Kuć  wrote:
> >> Hello!
> >>
> >> You need to allow the user that is running Lucene to open more files.
> >> There are plenty of tutorials available on the web. Modify your
> >> /etc/security/limits.conf and if for example your user is lucene add
> >> the following (or modify if those already exists):
> >>
> >> lucene soft nofile 64000
> >> lucene hard nofile 64000
> >>
> >> Relog and run:
> >>
> >> sudo -u lucene -s "ulimit -Sn"
> >>
> >> To see if the limits are the ones you set. If they are not, check if
> >> you don't have pam_limits.so commented in the /etc/pam.d/
> >>
> >> --
> >> Regards,
> >>  Rafał Kuć
> >>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >>
> >>> Here is the log:
> >>
> >>> Jan 24, 2013 4:10:33 AM
> >>> org.apache.tomcat.util.net.AprEndpoint$Acceptor run
> >>> SEVERE: Socket accept failed
> >>> org.apache.tomcat.jni.Error: 24: Too many open files
> >>> at org.apache.tomcat.jni.Socket.accept(Native Method)
> >>> at
> >>>
> org.apache.tomcat.util.net.AprEndpoint$Acceptor.run(AprEndpoint.java:990)
> >>> at java.lang.Thread.run(Thread.java:722)
> >>
> >>
> >>
> >>> Too many open files... How to solve it?
> >>
> >>
> >>> On Tue, Jan 22, 2013 at 10:52 PM, Michael McCandless <
> >>> luc...@mikemccandless.com> wrote:
> >>
> >>>> Can you post the full stack trace of the CorruptIndexException?
> >>>>
> >>>> Mike McCandless
> >>>>
> >>>> http://blog.mikemccandless.com
> >>>>
> >>>> On Tue, Jan 22, 2013 at 8:20 AM, Cheng 
> wrote:
> >>>> > Hi,
> >>>> >
> >>>> > I run a Lucene application on Tomcat. The app will try to open a
> Linux
> >>>> > directory, and sometime returns CorruptIndexException error.
> >>>> >
> >>>> > Shortly after I restart Tomcat (nothing else changes), the app can
> be run
> >>>> > on the fly. I am using the following statements to open a directory:
> >>>> >
> >>>> >
> >>>> > try {
> >>>> >   searcher = new IndexSearcher(IndexReader.open(new
> >>>> NIOFSDirectory(new
> >>>> > File("/home/user/" + type))));
> >>>> > } catch (IOException e) {
> >>>> >   throw new Exception("[" + type + "] Cannot open index
> folder...");
> >>>> > }
> >>>> >
> >>>> > I would like to know how to tackle this problem.
> >>>> >
> >>>> > Many thanks!
> >>>>
> >>>> -
> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>
> >>>>
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: How to add a field to hold a Java map object?

2013-02-13 Thread Cheng
http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/document/StringField.html

I found the StringField API here; however, it seems that StringField can't be
found, and so the code doesn't compile.

My lucene is 3.5

On Wed, Feb 13, 2013 at 4:54 AM, Ian Lea  wrote:

> Assuming you mean the String representation of a Map, the same way you
> do any other String: use StringField or an analyzer that keeps the
> characters you want it to.  Maybe WhitespaceAnalyzer.
>
>
> --
> Ian.
>
>
> On Wed, Feb 13, 2013 at 1:34 AM, Cheng  wrote:
> > Hi,
> >
> > How can I add field to hold a Java map object in such way that the "[",
> > "]", "," are preserved?
> >
> > Thanks!
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: How to add a field to hold a Java map object?

2013-02-13 Thread Cheng
Here is my code to add Java map object into Lucene:

Map<String, String> map = new HashMap<>();
map.put("栋", "6");
map.put("号", "202");

Fieldable fd = new Field("testMap", map.toString(), Store.YES, Index.NO);

Document d = new Document();

d.add(fd);

try {
writer.addDocument(d);
writer.commit();
} catch (Exception e) {

}


Unfortunately, when I search the index, all I get is:

{号=202, 栋=6}, which doesn't contain double quotes. Therefore I can't
rebuild the map object with the return value.

Please help.
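
One workaround I can think of (a sketch only, assuming I control both the
writing and the reading side, and that keys and values never contain '=' or a
newline): serialize the map into an unambiguous text form myself instead of
relying on Map.toString(). Here hitDoc stands for the Document returned by the
searcher:

StringBuilder sb = new StringBuilder();
for (Map.Entry<String, String> e : map.entrySet()) {
    sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
}
d.add(new Field("testMap", sb.toString(), Store.YES, Index.NO));

// after searching, rebuild the map from the stored value:
Map<String, String> rebuilt = new HashMap<String, String>();
for (String line : hitDoc.get("testMap").split("\n")) {
    if (line.length() == 0) continue;
    int eq = line.indexOf('=');
    rebuilt.put(line.substring(0, eq), line.substring(eq + 1));
}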


On Wed, Feb 13, 2013 at 10:46 PM, Cheng  wrote:

>
> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/document/StringField.html
>
> I found StringField API here, however, it seems that StringField can't be
> found and thus not compiled.
>
> My lucene is 3.5
>
>
> On Wed, Feb 13, 2013 at 4:54 AM, Ian Lea  wrote:
>
>> Assuming you mean the String representation of a Map, the same way you
>> do any other String: use StringField or an analyzer that keeps the
>> characters you want it to.  Maybe WhitespaceAnalyzer.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Feb 13, 2013 at 1:34 AM, Cheng  wrote:
>> > Hi,
>> >
>> > How can I add field to hold a Java map object in such way that the "[",
>> > "]", "," are preserved?
>> >
>> > Thanks!
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>


Lucene 3.3 in Eclipse

2011-05-15 Thread cheng
Hi, I created a java project for Lucene 3.3 in Eclipse, and found that in
the DbHandleExtractor.java file, the package of com.sleepycat.db.internal.Db
is not resolved. How can I overcome this?

 

I have tried to download .jar for this, but don't know which and where to
download.

 

Thanks



RE: Lucene 3.3 in Eclipse

2011-05-15 Thread cheng
Steve, thanks for the correction. You are right. The version is 3.0.3, released
last October.

I did place an Ant jar in Eclipse, and it removes some compile errors. However,
it seems that I still need some jar file to handle DbHandleExtractor.java and
the org.apache.lucene.store.db package, which are under the
contrib/db/bdb/src/java folder.

Do you know where I can find the proper jar file?

Cheng

-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu] 
Sent: Sunday, May 15, 2011 10:08 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene 3.3 in Eclipse

Hi Cheng,

Lucene 3.3 does not exist - do you mean branches/branch_3x ?

FYI, as of Lucene 3.1, there is an Ant target you can use to setup an Eclipse 
project for  Lucene/Solr - run this from the top level directory of a full 
source tree (including dev-tools/ directory) checked out from Subversion: 

   ant eclipse

More info here:

   <http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips>

Steve

> -Original Message-
> From: cheng [mailto:zhoucheng2...@gmail.com]
> Sent: Sunday, May 15, 2011 4:29 AM
> To: java-user@lucene.apache.org
> Subject: Lucene 3.3 in Eclipse
> 
> Hi, I created a java project for Lucene 3.3 in Eclipse, and found that in
> the DbHandleExtractor.java file, the package of
> com.sleepycat.db.internal.Db
> is not resolved. How can I overcome this?
> 
> 
> 
> I have tried to download .jar for this, but don't know which and where to
> download.
> 
> 
> 
> Thanks



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Lucene 3.3 in Eclipse

2011-05-16 Thread cheng
Steve, the two links are really helpful. I still have a few questions:

1) The import statement, import 
org.apache.lucene.queryParser.standard.parser.StandardSyntaxParser, doesn't 
work because the StandardSyntaxParser.java is not available.

Do you know where to download it?

2) The CharStream class is not available. Please see this, "public final class 
FastCharStream implements CharStream"

What is it? Do you know where to download it?

3) The QueryParser class can't be resolved. Please see this: SrndQuery lq = 
QueryParser.parse(queryText);

Thanks,
Cheng




-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu] 
Sent: Sunday, May 15, 2011 11:15 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene 3.3 in Eclipse

(Resending to the list - didn't notice that my reply went to Cheng directly)

There is an Ant target "get-db-jar" that can do the downloading for you - you 
can see the URL it uses here:

<http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/contrib/db/bdb/build.xml?view=markup#l49>

There is another Ant target "get-je-jar" that does the same thing for the 
contrib/db/bdb-je/ module:

<http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/contrib/db/bdb-je/build.xml?view=markup#l49>

Steve

> -Original Message-
> From: cheng [mailto:zhoucheng2...@gmail.com]
> Sent: Sunday, May 15, 2011 10:48 AM
> To: java-user@lucene.apache.org
> Cc: Steven A Rowe
> Subject: RE: Lucene 3.3 in Eclipse
> 
> Steve, thanks for correction. You are right. The version is 3.0.3
> released last Oct.
> 
> I did place an ant jar in Eclipse, and it does the job to remove some
> compiling errors. However, it seems that I do need some jar file to
> handle the DbHandleExtractor.java and the org.apache.lucene.store.db
> package, which are under contrib/db/bdb/src/java folder.
> 
> Do you know when I can find the proper jar file?
> 
> Cheng
> 
> -Original Message-
> From: Steven A Rowe [mailto:sar...@syr.edu]
> Sent: Sunday, May 15, 2011 10:08 PM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene 3.3 in Eclipse
> 
> Hi Cheng,
> 
> Lucene 3.3 does not exist - do you mean branches/branch_3x ?
> 
> FYI, as of Lucene 3.1, there is an Ant target you can use to setup an
> Eclipse project for  Lucene/Solr - run this from the top level directory
> of a full source tree (including dev-tools/ directory) checked out from
> Subversion:
> 
>ant eclipse
> 
> More info here:
> 
> 
> <http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips
> >
> 
> Steve
> 
> > -Original Message-
> > From: cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Sunday, May 15, 2011 4:29 AM
> > To: java-user@lucene.apache.org
> > Subject: Lucene 3.3 in Eclipse
> >
> > Hi, I created a java project for Lucene 3.3 in Eclipse, and found that
> in
> > the DbHandleExtractor.java file, the package of
> > com.sleepycat.db.internal.Db
> > is not resolved. How can I overcome this?
> >
> >
> >
> > I have tried to download .jar for this, but don't know which and where
> to
> > download.
> >
> >
> >
> > Thanks
> 



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Lucene 3.3 in Eclipse

2011-05-16 Thread cheng
Just curious. How could this version have been published if there are missing 
jars and compile errors?

-Original Message-
From: Steven A Rowe [mailto:sar...@syr.edu] 
Sent: Sunday, May 15, 2011 11:15 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene 3.3 in Eclipse

(Resending to the list - didn't notice that my reply went to Cheng directly)

There is an Ant target "get-db-jar" that can do the downloading for you - you 
can see the URL it uses here:

<http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/contrib/db/bdb/build.xml?view=markup#l49>

There is another Ant target "get-je-jar" that does the same thing for the 
contrib/db/bdb-je/ module:

<http://svn.apache.org/viewvc/lucene/java/tags/lucene_3_0_3/contrib/db/bdb-je/build.xml?view=markup#l49>

Steve

> -Original Message-
> From: cheng [mailto:zhoucheng2...@gmail.com]
> Sent: Sunday, May 15, 2011 10:48 AM
> To: java-user@lucene.apache.org
> Cc: Steven A Rowe
> Subject: RE: Lucene 3.3 in Eclipse
> 
> Steve, thanks for correction. You are right. The version is 3.0.3
> released last Oct.
> 
> I did place an ant jar in Eclipse, and it does the job to remove some
> compiling errors. However, it seems that I do need some jar file to
> handle the DbHandleExtractor.java and the org.apache.lucene.store.db
> package, which are under contrib/db/bdb/src/java folder.
> 
> Do you know when I can find the proper jar file?
> 
> Cheng
> 
> -Original Message-
> From: Steven A Rowe [mailto:sar...@syr.edu]
> Sent: Sunday, May 15, 2011 10:08 PM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene 3.3 in Eclipse
> 
> Hi Cheng,
> 
> Lucene 3.3 does not exist - do you mean branches/branch_3x ?
> 
> FYI, as of Lucene 3.1, there is an Ant target you can use to setup an
> Eclipse project for  Lucene/Solr - run this from the top level directory
> of a full source tree (including dev-tools/ directory) checked out from
> Subversion:
> 
>ant eclipse
> 
> More info here:
> 
> 
> <http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips
> >
> 
> Steve
> 
> > -Original Message-
> > From: cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Sunday, May 15, 2011 4:29 AM
> > To: java-user@lucene.apache.org
> > Subject: Lucene 3.3 in Eclipse
> >
> > Hi, I created a java project for Lucene 3.3 in Eclipse, and found that
> in
> > the DbHandleExtractor.java file, the package of
> > com.sleepycat.db.internal.Db
> > is not resolved. How can I overcome this?
> >
> >
> >
> > I have tried to download .jar for this, but don't know which and where
> to
> > download.
> >
> >
> >
> > Thanks
> 



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



JobClient.runJob(job) in Fetcher.java

2011-05-25 Thread Cheng
Hi, I notice that there are a few run() methods in Fetcher.java and that the
following statement in Crawler.java calls the JobClient.runJob(job) in
Fetcher.java.

fetcher.fetch(segs[0], threads,
org.apache.nutch.fetcher.Fetcher.isParsing(conf));

I would like to know which run() in Fetcher.java is called by the
above statement.

Thanks.


Search multiple directories simultaneously

2011-06-23 Thread Cheng
Hi,

I have multiple indexed folders (or directories), each holding index
files for a specific purpose. I want to do a search over these folders (or
directories) in the same query.

Is it possible?

Thanks


Re: Search multiple directories simultaneously

2011-06-23 Thread Cheng
Thanks man, very concise and easy to follow.

Can I ask how searching multiple directories will impact performance? I have
probably 50GB of data in each of the 10-20 folders.

On Fri, Jun 24, 2011 at 1:04 AM, Uwe Schindler  wrote:

> IndexReader index1 = IndexReader.open(dir1);
> IndexReader index2 = IndexReader.open(dir2);
> IndexReader index3 = IndexReader.open(dir3);
> ...
> IndexReader all = new MultiReader(index1, index2, index3,...);
> IndexSearcher searcher = new IndexSearcher(all);
>
> ...search your indexes...
>
> all.close();
> index1.close();
> index2.close();
> index3.close();
> ...
>
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Cheng [mailto:zhoucheng2...@gmail.com]
> > Sent: Thursday, June 23, 2011 4:19 PM
> > To: java-user@lucene.apache.org
> > Subject: Search multiple directories simultaneously
> >
> > Hi,
> >
> > I have multiple indexed folders (or directories), each holding indexing
> files
> > for specific purposes. I want to do a search over these folders (or
> > directories) in a same query.
> >
> > Is it possible?
> >
> > Thanks
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Why does QueryBuilder.createBooleanQuery create something different from input?

2014-05-12 Thread Cheng
Hi,

I build a query using
QueryBuilder.createBooleanQuery("title","【微信活动】6500盒“健康瘦身减肥”梅免费送").

When I check the query, the toString() of this query looks like:

Query: title:而 title:不用 title:下载 title:2. title:目前 title:来说 title:已经
title:完美越狱 title:的人 title:没有 title:任何 title:必要 title:再用 title:红 title:雪
title:3. title:有人 title:问 title:红 title:雪 title:和 title:黑雨 title:比 title:到底
title:哪个 title:好 title:我 title:觉得 title:各有所长 title:各有 title:互补 title:至少
title:红 title:雪 title:可以 title:当 title:一个 title:开机 title:logo title:替换
title:工具 title:4.iphone title:2g title:和 title:3g title:可以 title:通过 title:红
title:雪 title:0.9.3

This is totally different from the input "【微信活动】6500盒“健康瘦身减肥”梅免费送".

Can someone tell me why?

Thanks


Can RAMDirectory work for gigabyte data which needs refreshing of the index all the time?

2014-05-14 Thread Cheng
Hi,

I have an index of multiple gigabytes which serves 5-10 threads and needs
refreshing very often. I wonder if RAMDirectory is a good candidate for
this purpose. If not, what kind of directory is better?
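
For comparison, the alternative I am aware of (a rough sketch, assuming Lucene
4.8; the path and analyzer are placeholders) is to keep the index on disk in an
MMapDirectory, let the OS cache the hot parts, and reopen a near-real-time
reader after updates instead of holding everything in a RAMDirectory:

Directory dir = new MMapDirectory(new File("/path/to/index"));
IndexWriter writer = new IndexWriter(dir,
    new IndexWriterConfig(Version.LUCENE_48, analyzer));

DirectoryReader reader = DirectoryReader.open(writer, true);  // NRT reader

// ... add/update/delete documents ...

DirectoryReader newer = DirectoryReader.openIfChanged(reader, writer, true);
if (newer != null) {  // null means nothing has changed
    reader.close();
    reader = newer;
}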

Thanks,
Cheng


Lucene suggester can't suggest similar phrase

2014-09-29 Thread Cheng
Hi,

I am using the Lucene 4.10 suggester, which I thought could return similar
phrases, but it turned out differently.

My code is as follow:

public static void main(String[] args) throws IOException {

String path = "c:/data/suggest/dic.txt";

Dictionary dic;

dic = new FileDictionary(new FileInputStream(path));

InputIterator it = dic.getEntryIterator();

Analyzer analyzer = ERAnalyzer.getInstance().getAnalyzer();

FuzzySuggester suggester = new FuzzySuggester(analyzer);

suggester.build(it);

CharSequence cs = "雅诗兰黛";

List<LookupResult> results = suggester.lookup(cs, false, 1);

System.out.println(results.get(0).key);

}

The dictionary contains only one line:

雅诗兰黛 50

When cs is exactly "雅诗兰黛", I get the result. But when cs is "雅思兰黛", which
is only one character different from the target, I get nothing back.

I tried FuzzySuggester as well as AnalyzingSuggester. The result is the
same.

Did I miss something here?
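
One thing I am not sure about (an assumption, since ERAnalyzer is custom and
not shown here): by default FuzzySuggester measures edit distance in UTF-8
bytes, so a single Chinese character difference counts as several edits and
exceeds the default maxEdits of 1. The long constructor exposes this as
unicodeAware; roughly:

FuzzySuggester suggester = new FuzzySuggester(
    analyzer, analyzer,
    AnalyzingSuggester.EXACT_FIRST | AnalyzingSuggester.PRESERVE_SEP,
    256,    // maxSurfaceFormsPerAnalyzedForm (default)
    -1,     // maxGraphExpansions (unlimited)
    true,   // preservePositionIncrements
    1,      // maxEdits
    true,   // transpositions
    1,      // nonFuzzyPrefix
    3,      // minFuzzyLength
    true);  // unicodeAware: measure edits in code points, not UTF-8 bytes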

Thanks!


NoClassDefFoundError for EarlyTerminatingSortingCollector

2014-10-02 Thread Cheng
Hi all,

I am using the following simple code, which leads to a NoClassDefFoundError for
EarlyTerminatingSortingCollector. Can anyone help?

Thanks.

RAMDirectory index_dir = new RAMDirectory();
Analyzer analyzer = new StandardAnalyzer();
AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
    Version.LATEST, index_dir, analyzer);


How to create document objects in our case

2011-05-20 Thread Cheng Zhou
Hi,

I have a large number of XML files to be indexed by Lucene. All the files
share a similar structure, as below:

<Group attr1="..." attr2="..." ...>
   <Subgroup attr1="..." attr2="..." ... />
   <Subgroup attr1="..." attr2="..." ... />
   <Subgroup attr1="..." attr2="..." ... />
   ...
</Group>

Things to be noted are:

The root element of Group has 30 or so attributes, and it usually has over
2000 Subgroup elements, which in turn also have more than 20 attributes.

I want to create one Document object which holds the contents of the Group
element, and one Document object which holds all the Subgroup elements.

Here are my challenges however:

1. How many fields are advised for a Document to be indexed by Lucene? Will
over 30 fields (for the Group element) be too many?

2. How to create a Document object and fields for holding all the Subgroup
elements? Is this a good way to think about it?

3. How can I link the Document object of the Group element to the Document
object of all the Subgroup elements?

Please note that I intend to use these two Document objects to model the
grouping, though I don't know whether it is a good solution or not. I am open
to using more than two Documents to do the job, but I don't know how to connect
all the objects in Lucene.
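
One pattern I am considering (a sketch only; the field names and the groupId
value are made up for illustration) is to give every Subgroup document a key
field that points back at its Group document:

String groupId = "group-0001";  // e.g. taken from a unique Group attribute

Document groupDoc = new Document();
groupDoc.add(new Field("docType", "group", Field.Store.YES, Field.Index.NOT_ANALYZED));
groupDoc.add(new Field("groupId", groupId, Field.Store.YES, Field.Index.NOT_ANALYZED));
// ... one field per Group attribute ...

Document subgroupDoc = new Document();
subgroupDoc.add(new Field("docType", "subgroup", Field.Store.YES, Field.Index.NOT_ANALYZED));
subgroupDoc.add(new Field("groupId", groupId, Field.Store.YES, Field.Index.NOT_ANALYZED));
// ... one field per Subgroup attribute ...

// Later, a TermQuery on groupId retrieves the Group document together with
// all of the Subgroup documents that belong to it.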

Many thanks!


Re: how to search multiple fields

2011-05-25 Thread Cheng Zhou
Hi Ian, thanks. I still have two questions.

In the first link you presented, there is one comment that "Note that terms
which occur in short fields have a higher effect on the result ranking."

What does "short fields" mean? What are the differences between the impact
of the short fields and that of the field boost?

Cheng
On Wed, May 25, 2011 at 6:20 PM, Ian Lea  wrote:

> > Quite a few Lucene examples on lines shows how to insert multiple fields
> > into a Document and how to query the indexed file with certain fields and
> > queried text. I would like to know:
> >
> > 1.   How to do a cross-field search?
>
>
> http://wiki.apache.org/lucene-java/LuceneFAQ#How_can_I_search_over_multiple_fields.3F
>
> > 2.   How to specify some key fields as well as some less important
> > fields?
>
> Boosting.  See
> http://wiki.apache.org/lucene-java/LuceneFAQ#What_is_the_difference_between_field_.28or_document.29_boosting_and_query_boosting.3F
>
> > 3.   How many fields would cause performance issue?
>
> Impossible to answer since there are too many variables but in general
> the fewer fields used in a search the faster it will be.  There are
> many other factors, some of which are likely to outweigh this.  See
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed.
>
>
> --
> Ian.
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
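
For the cross-field part, the FAQ answer usually boils down to something like
this sketch (the field names, boosts, and query text are illustrative, not
taken from the thread):

Map<String, Float> boosts = new HashMap<String, Float>();
boosts.put("title", 3.0f);  // key field
boosts.put("body", 1.0f);   // less important field

QueryParser parser = new MultiFieldQueryParser(
    Version.LUCENE_31,
    new String[] { "title", "body" },
    new StandardAnalyzer(Version.LUCENE_31),
    boosts);
Query q = parser.parse("lucene indexing");  // parse() throws ParseException
TopDocs hits = searcher.search(q, 10);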


Is there a limit on the size of the text for a single field?

2011-05-25 Thread Cheng Zhou
Hi, I wonder if I can associate a text string of over 5MB with a single
field.

Thanks.


Re: Is there a limit on the size of the text for a single field?

2011-05-25 Thread Cheng Zhou
Thanks Ian.

On Wed, May 25, 2011 at 11:44 PM, Ian Lea  wrote:

> Sure.  See the javadocs for IndexWriter.setMaxFieldLength or
> LimitTokenCountAnalyzer if you are using 3.1.0.
>
>
> --
> Ian.
>
>
> On Wed, May 25, 2011 at 4:24 PM, Cheng Zhou 
> wrote:
> > Hi, I wonder if I can associate a text string of over 5MB with a single
> > field.
> >
> > Thanks.
> >
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
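
A small sketch of the LimitTokenCountAnalyzer route Ian mentions (the limit of
1,000,000 tokens is just an example value, and directory stands for an already
open Directory):

Analyzer base = new StandardAnalyzer(Version.LUCENE_31);
Analyzer limited = new LimitTokenCountAnalyzer(base, 1000000);

IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_31, limited);
IndexWriter writer = new IndexWriter(directory, iwc);
// fields longer than the limit are silently truncated at indexing time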


RE: Memory eaten up by String, Term and TermInfo?

2008-09-14 Thread Peter Cheng
I'll try later and report back ASAP. You know, it takes days to cause OOM.
Thank you all!

Gong

> -Original Message-
> From: Michael McCandless [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, September 14, 2008 10:28 PM
> To: java-user@lucene.apache.org
> Subject: Re: Memory eaten up by String, Term and TermInfo?
> 
> 
> Small correction: it was checked in this morning (at least, on the  
> East Coast of the US).
> 
> So you need to either build your own JAR using Lucene's trunk, or,  
> wait for tonite's build to run and then download the build artifacts  
> from here:
> 
>  http://hudson.zones.apache.org/hudson/job/Lucene-trunk
> 
> If possible, please report back if this fixed your OutOfMemoryError.
> 
> 2.4 will include this fix.
> 
> Mike
> 
> Chris Lu wrote:
> 
> > Can you try to update to the latest Lucene svn version, like  
> > yesterday?
> > LUCENE-1383 was checked in yesterday. This patch is 
> addressing a leak
> > problem particular to J2EE applications.
> >
> > -- 
> > Chris Lu
> > -
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> > 
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database
> _Search_in_3_minutes
> > DBSight customer, a shopping comparison site, (anonymous per  
> > request) got
> > 2.6 Million Euro funding!
> >
> > On Sun, Sep 14, 2008 at 6:58 AM, Peter Cheng  
> > <[EMAIL PROTECTED]>wrote:
> >
> >> Hi the community,
> >>
> >> In a Tomcat application (a keyword-based search engine), I use  
> >> Lucene 2.3.2
> >> to index 60 million documents, but each document is small-sized.  
> >> The total
> >> index size is about 60GB.
> >> After a successful running for a week, Tomcat was down due to  
> >> OutOfMemory.
> >> Then I restarted Tomcat, and after three days, I used jmap 
> and jhat  
> >> to find
> >> out what had eaten up the memory. I found millions of 
> instances of  
> >> String,
> >> Term, and TermInfo. Why?
> >>
> >> In my application, I use a single IndexSearcher object, which is  
> >> shared by
> >> all the requests. It is opened initially, and will never be closed.
> >>
> >> What could have eaten up the memory? What is referring to 
> millions of
> >> instances of Term and TermInfo?
> >>
> >> I can provide any snippets of codes if necessary.
> >> Thank you so much!
> >>
> >> Gong Cheng
> >>
> >>
> >> 
> -
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >>
> >>
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Memory eaten up by String, Term and TermInfo?

2008-10-05 Thread Peter Cheng
I have confirmed that the OutOfMemoryError is not Lucene's problem. It's
just because the JVM failed to perform GC when necessary, and I don't know why.
To work around this, I started another thread to call GC every six hours, and
the problem got solved.
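
Roughly what that workaround looks like (a sketch only; an explicit System.gc()
is just a request to the JVM, and tuning the collector as discussed below is
usually the better fix):

ScheduledExecutorService gcTimer = Executors.newSingleThreadScheduledExecutor();
gcTimer.scheduleAtFixedRate(new Runnable() {
    public void run() {
        System.gc();  // ask the JVM for a full collection
    }
}, 6, 6, TimeUnit.HOURS);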

Thank you all.

Gong

> -Original Message-
> From: Michael McCandless [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, September 14, 2008 10:28 PM
> To: java-user@lucene.apache.org
> Subject: Re: Memory eaten up by String, Term and TermInfo?
> 
> 
> Small correction: it was checked in this morning (at least, on the  
> East Coast of the US).
> 
> So you need to either build your own JAR using Lucene's trunk, or,  
> wait for tonite's build to run and then download the build artifacts  
> from here:
> 
>  http://hudson.zones.apache.org/hudson/job/Lucene-trunk
> 
> If possible, please report back if this fixed your OutOfMemoryError.
> 
> 2.4 will include this fix.
> 
> Mike
> 
> Chris Lu wrote:
> 
> > Can you try to update to the latest Lucene svn version, like  
> > yesterday?
> > LUCENE-1383 was checked in yesterday. This patch is 
> addressing a leak
> > problem particular to J2EE applications.
> >
> > -- 
> > Chris Lu
> > -
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> > 
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database
> _Search_in_3_minutes
> > DBSight customer, a shopping comparison site, (anonymous per  
> > request) got
> > 2.6 Million Euro funding!
> >
> > On Sun, Sep 14, 2008 at 6:58 AM, Peter Cheng  
> > <[EMAIL PROTECTED]>wrote:
> >
> >> Hi the community,
> >>
> >> In a Tomcat application (a keyword-based search engine), I use  
> >> Lucene 2.3.2
> >> to index 60 million documents, but each document is small-sized.  
> >> The total
> >> index size is about 60GB.
> >> After a successful running for a week, Tomcat was down due to  
> >> OutOfMemory.
> >> Then I restarted Tomcat, and after three days, I used jmap 
> and jhat  
> >> to find
> >> out what had eaten up the memory. I found millions of 
> instances of  
> >> String,
> >> Term, and TermInfo. Why?
> >>
> >> In my application, I use a single IndexSearcher object, which is  
> >> shared by
> >> all the requests. It is opened initially, and will never be closed.
> >>
> >> What could have eaten up the memory? What is referring to 
> millions of
> >> instances of Term and TermInfo?
> >>
> >> I can provide any snippets of codes if necessary.
> >> Thank you so much!
> >>
> >> Gong Cheng
> >>
> >>
> >> 
> -
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >>
> >>
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Memory eaten up by String, Term and TermInfo?

2008-10-06 Thread Peter Cheng
Oh yes, I forgot to mention that MaxPermSize is very useful, and may be
another key to solving my problem. I haven't tried UseConcMarkSweepGC and
the other two parameters, so I will try them instead of my own GC thread to
see whether the problem can also be solved that way.

Thanks Brian!

Regards,
Gong

> -Original Message-
> From: Beard, Brian [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 06, 2008 8:48 PM
> To: java-user@lucene.apache.org
> Subject: RE: Memory eaten up by String, Term and TermInfo?
> 
> I played around with GC quite a bit in our app and found the following
> java settings to help a lot (Used with jboss, but should be 
> good for any
> jvm).
> 
> set JAVA_OPTS=%JAVA_OPTS% -XX:MaxPermSize=512M -XX:+UseConcMarkSweepGC
> -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
> 
> While these settings did help when the GC starts to get taxed for
> keeping throughput better, the biggest thing was getting the heap size
> big enough. If the heap's big enough, then everything seems 
> to work well
> regardless of the GC algorithm selected.
> 
> If the heap size and algorithm are correct, then you shouldn't have to
> force GC. Usually the forced GC calls will trigger the total GC which
> has a long delay and slows down responsiveness to the app.
> 
> I found jstat fairly helpful in monitoring all of this. You can see if
> the following article helps at all.
> 
> http://java.sun.com/javase/technologies/hotspot/gc/index.jsp
> 
> -Original Message-
> From: Peter Cheng [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, October 05, 2008 7:55 AM
> To: java-user@lucene.apache.org
> Subject: RE: Memory eaten up by String, Term and TermInfo?
> 
> I have confirmed that the OutOfMemoryError is not Lucene's 
> problem. It's
> just because JVM failed to perform GC when necessary, and I don't know
> why.
> To fix this, I started another thread to call GC every six hours, and
> problems got solved.
> 
> Thank you all.
> 
> Gong
> 
> > -Original Message-
> > From: Michael McCandless [mailto:[EMAIL PROTECTED] 
> > Sent: Sunday, September 14, 2008 10:28 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Memory eaten up by String, Term and TermInfo?
> > 
> > 
> > Small correction: it was checked in this morning (at least, on the  
> > East Coast of the US).
> > 
> > So you need to either build your own JAR using Lucene's trunk, or,  
> > wait for tonite's build to run and then download the build 
> artifacts  
> > from here:
> > 
> >  http://hudson.zones.apache.org/hudson/job/Lucene-trunk
> > 
> > If possible, please report back if this fixed your OutOfMemoryError.
> > 
> > 2.4 will include this fix.
> > 
> > Mike
> > 
> > Chris Lu wrote:
> > 
> > > Can you try to update to the latest Lucene svn version, like  
> > > yesterday?
> > > LUCENE-1383 was checked in yesterday. This patch is 
> > addressing a leak
> > > problem particular to J2EE applications.
> > >
> > > -- 
> > > Chris Lu
> > > -
> > > Instant Scalable Full-Text Search On Any Database/Application
> > > site: http://www.dbsight.net
> > > demo: http://search.dbsight.com
> > > Lucene Database Search in 3 minutes:
> > > 
> > http://wiki.dbsight.com/index.php?title=Create_Lucene_Database
> > _Search_in_3_minutes
> > > DBSight customer, a shopping comparison site, (anonymous per  
> > > request) got
> > > 2.6 Million Euro funding!
> > >
> > > On Sun, Sep 14, 2008 at 6:58 AM, Peter Cheng  
> > > <[EMAIL PROTECTED]>wrote:
> > >
> > >> Hi the community,
> > >>
> > >> In a Tomcat application (a keyword-based search engine), I use  
> > >> Lucene 2.3.2
> > >> to index 60 million documents, but each document is 
> small-sized.  
> > >> The total
> > >> index size is about 60GB.
> > >> After a successful running for a week, Tomcat was down due to  
> > >> OutOfMemory.
> > >> Then I restarted Tomcat, and after three days, I used jmap 
> > and jhat  
> > >> to find
> > >> out what had eaten up the memory. I found millions of 
> > instances of  
> > >> String,
> > >> Term, and TermInfo. Why?
> > >>
> > >> In my application, I use a single IndexSearcher object, 
> which is  
> > >> shared by
> > >> all the requests. It is opened initially, and will never 
> be 

Is there a Term ID for each distinctive term indexed in Lucene?

2007-08-31 Thread Tao Cheng
Hi all,

I found that instead of storing a term ID for a term in the index, Lucene
stores the actual term string value. I am wondering if there is such a
"term ID" for each distinct term indexed in Lucene, similar to the "doc ID"
for each distinct document indexed in Lucene.

In other words, I am looking for a method like "int termId(String term)" on
the IndexReader, such that it can return me an ID given a term's string
value.
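
The closest workaround I can think of (a sketch only; reader is an open
IndexReader, and the ordinals are only stable until the index changes) is to
walk the term dictionary once and assign ordinals myself:

Map termIds = new HashMap();  // Term -> Integer ordinal
TermEnum terms = reader.terms();
int ord = 0;
while (terms.next()) {
    termIds.put(terms.term(), new Integer(ord++));
}
terms.close();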

Thanks a lot in advance.

-Tao


instruct IndexDeletionPolicy to delete old commits after N minutes

2008-06-25 Thread Alex Cheng
hi,
what is the correct way to instruct the indexwriter to delete old
commit points after N minutes ?
I tried to write a customized IndexDeletionPolicy that uses the
parameters to schedule future
jobs to do file deletion. However, I am only getting the filenames,
and not absolute file names.
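
A sketch of the direction I am considering instead (assuming the 2.4-style
IndexCommit API; in 2.3 the equivalent interface is IndexCommitPoint): expire
old commits whenever a new commit arrives, rather than scheduling file
deletions externally:

public class ExpirationDeletionPolicy implements IndexDeletionPolicy {
    private final Directory dir;
    private final long maxAgeMillis;

    public ExpirationDeletionPolicy(Directory dir, int maxAgeMinutes) {
        this.dir = dir;
        this.maxAgeMillis = maxAgeMinutes * 60L * 1000L;
    }

    public void onInit(List commits) throws IOException {
        onCommit(commits);
    }

    public void onCommit(List commits) throws IOException {
        // the last commit in the list is the newest one; never delete it
        IndexCommit newest = (IndexCommit) commits.get(commits.size() - 1);
        long now = dir.fileModified(newest.getSegmentsFileName());
        for (int i = 0; i < commits.size() - 1; i++) {
            IndexCommit commit = (IndexCommit) commits.get(i);
            if (now - dir.fileModified(commit.getSegmentsFileName()) > maxAgeMillis) {
                commit.delete();  // Lucene removes the underlying files when safe
            }
        }
    }
}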

thanks.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



IndexDeletionPolicy to delete commits after N minutes

2008-06-25 Thread Alex Cheng
hi,
what is the correct way to instruct the indexwriter (or other
classes?) to delete old
commit points after N minutes ?
I tried to write a customized IndexDeletionPolicy that uses the
parameters to schedule future
jobs to perform file deletion. However, I am only getting the
filenames through the parameters
and not absolute file names.

thanks.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: How Lucene Search

2008-06-26 Thread Alex Cheng
The debugger that comes with Eclipse is pretty good for this purpose.
You can create a small project and then attach the Lucene source to it for
debugging.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Lucene javadoc not up-to-date?

2007-05-28 Thread Tao Cheng

I've encountered a few discrepancies between the Lucene javadoc and the
source code.
I use:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/ as
the most up-to-date javadoc reference.
For instance, the SegmentTermDocs class implements the TermDocs interface.
However, there is no information about this SegmentTermDocs class in the
javadoc.
I notice such source files (e.g. index/SegmentTermDocs.java) are not well
commented. Perhaps that's why they are not included in the javadoc
generation?
Thanks.

-Tao


RE: How to search

2008-08-26 Thread Jiao, Jason (NSN - CN/Cheng Du)
The lucene FAQ says:

What wildcard search support is available from Lucene?
Lucene supports wild card queries which allow you to perform searches
such as book*, which will find documents containing terms such as book,
bookstore, booklet, etc. Lucene refers to this type of a query as a
'prefix query'. 

Lucene also supports wild card queries which allow you to place a wild
card in the middle of the query term. For instance, you could make
searches like: mi*pelling. That will match both misspelling, which is
the correct way to spell this word, as well as mispelling, which is a
common spelling mistake. 

Another wild card character that you can use is '?', a question mark.
The ? will match a single character. This allows you to perform queries
such as Bra?il. Such a query will match both Brasil and Brazil. Lucene
refers to this type of a query as a 'wildcard query'. 

Leading wildcards (e.g. *ook) are not supported by the QueryParser by
default. As of Lucene 2.1, they can be enabled by calling
QueryParser.setAllowLeadingWildcard( true ). Note that this can be an
expensive operation: it requires scanning the list of tokens in the
index in its entirety to look for those that match the pattern. 
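
In code, the query types described above look roughly like this (the field
name "contents" is just an example; the WildcardQuery/PrefixQuery objects
bypass the QueryParser entirely):

Query prefix = new PrefixQuery(new Term("contents", "book"));         // book*
Query middle = new WildcardQuery(new Term("contents", "mi*pelling"));
Query single = new WildcardQuery(new Term("contents", "bra?il"));

QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
parser.setAllowLeadingWildcard(true);   // needed for queries like *ook
Query leading = parser.parse("*ook");   // parse() throws ParseException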



Br.
Jason Jiao


>-Original Message-
>From: ext Daniel Noll [mailto:[EMAIL PROTECTED] 
>Sent: Tuesday, August 26, 2008 10:50 AM
>To: java-user@lucene.apache.org
>Subject: Re: How to search
>
>Venkata Subbarayudu wrote:
>> Hi Anshum Gupta,
>> Thanks for your replay, but when I gone through 
>> querySyntax-Document for Lucene, I read that Lucene does not allow 
>> queries like "*findthis" i.e. I think it doesnot allow 
>wildcards in the beginning of the query.
>
>It has supported this for some time now, just not by default.
>
>Daniel
>
>--
>Daniel Noll
>
>-
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>
>

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Luke issues "Unknown format version: -6"

2008-08-26 Thread Jiao, Jason (NSN - CN/Cheng Du)
Hi there,
I use Luke v0.8.1, which is built on Lucene 2.3.0. First, I ran
lucene/demo/IndexFiles and built an index successfully. Then I used Luke to
open the index, but Luke issues "Unknown format version: -6". I checked the
Lucene documentation, which says "lucene 2.3.2 does not contain any
new features, API or file format changes, which makes it fully
compatible to 2.3.0 and 2.3.1".

Any hints?

Thanks in advance.


Jason Jiao

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]