[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

2009-03-02 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678045#action_12678045 ] Ning Li commented on LUCENE-1541: - An index size comparison will be great. > Tri

[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

2009-02-20 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675390#action_12675390 ] Ning Li commented on LUCENE-1541: - When one precision step is given, it is converte

[jira] Commented: (LUCENE-1541) Trie range - make trie range indexing more flexible

2009-02-19 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675212#action_12675212 ] Ning Li commented on LUCENE-1541: - If you are *really* concerned with the additional

[jira] Created: (LUCENE-1541) Trie range - make trie range indexing more flexible

2009-02-17 Thread Ning Li (JIRA)
Components: contrib/* Reporter: Ning Li Priority: Minor In the current trie range implementation, a single precision step is specified. With a large precision step (say 8), a value is indexed in fewer terms (8) but the number of terms for a range can be large. With a small
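
The trade-off described above can be made concrete with a little arithmetic. The sketch below assumes 64-bit values (consistent with the 8 terms quoted for a precision step of 8) and a single fixed precision step. The class and method names are hypothetical and not part of Lucene. The range figure is only a rough worst case, under the assumption that a range needs up to 2^step - 1 terms on each side at every level except the coarsest, which needs up to 2^step - 1 in total.

```java
// Sketch only: hypothetical helper, not Lucene API.
final class TrieTermCounts {
    /** Terms indexed per value: one term per precision level. */
    static int termsPerValue(int bitsPerValue, int precisionStep) {
        return (bitsPerValue + precisionStep - 1) / precisionStep; // ceiling division
    }

    /** Rough worst-case number of terms a range may touch (assumed model, see lead-in). */
    static long maxRangeTerms(int bitsPerValue, int precisionStep) {
        long levels = termsPerValue(bitsPerValue, precisionStep);
        long perLevel = (1L << precisionStep) - 1;
        // Two borders on every level but the coarsest, one run on the coarsest level.
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        for (int step : new int[] {8, 4, 2}) {
            System.out.println("step=" + step
                + " termsPerValue=" + termsPerValue(64, step)
                + " maxRangeTerms=" + maxRangeTerms(64, step));
        }
    }
}
```

Under these assumptions a larger step shrinks the per-value term count while the worst-case range term count grows, which is the tension the issue describes.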

[jira] Commented: (LUCENE-1470) Add TrieRangeFilter to contrib

2009-02-17 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674248#action_12674248 ] Ning Li commented on LUCENE-1470: - Agree. Do you want to open a new issue? If you wan

[jira] Commented: (LUCENE-1470) Add TrieRangeFilter to contrib

2009-02-16 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674051#action_12674051 ] Ning Li commented on LUCENE-1470: - Hi Uwe, I had something similar in mind when I

[jira] Commented: (LUCENE-1470) Add TrieRangeFilter to contrib

2009-02-16 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673912#action_12673912 ] Ning Li commented on LUCENE-1470: - Good stuff! Is it worth to also have an optio

Re: 2.4 release candidate 1

2008-09-19 Thread Ning Li
LUCENE-1335 is not listed in CHANGES.txt? It also includes a minor behavior change: "no longer allow the same Directory to be passed into addIndexes* more than once". Cheers, Ning On Thu, Sep 18, 2008 at 2:29 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Hi, > > I just created the first

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Ning Li
>>> Even so, >>> this may not be sufficient for some FS such as HDFS... Is it >>> reasonable in this case to keep in memory everything including >>> stored fields and term vectors? >> >> We could maybe do something like a proxy IndexInput/IndexOutput that >> would allow updating the read buffer fro

Re: Realtime Search for Social Networks Collaboration

2008-09-09 Thread Ning Li
On Mon, Sep 8, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> I thought an index reader which supports real-time search no longer >> maintains a static view of an index? > > It seems advantageous to just make it really cheap to get a new view > of the index (if you do it for every sear

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Ning Li
On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > But, how would you maintain a static view of an index...? > > IndexReader r1 = indexWriter.getCurrentIndex() > indexWriter.addDocument(...) > IndexReader r2 = indexWriter.getCurrentIndex() > > I assume r1 will have a view of

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Ning Li
Hi, We experimented using HBase's scalable infrastructure to scale out Lucene: http://www.mail-archive.com/[EMAIL PROTECTED]/msg01143.html There is the concern on the impact of HDFS's random read performance on Lucene search performance. And we can discuss if HBase's architecture is best for scal

[jira] Commented: (LUCENE-532) [PATCH] Indexing on Hadoop distributed file system

2008-09-03 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628025#action_12628025 ] Ning Li commented on LUCENE-532: Is the use of seek and write in ChecksumIndexOu

Re: [jira] Commented: (LUCENE-1335) Correctly handle concurrent calls to addIndexes, optimize, commit

2008-08-29 Thread Ning Li
+1 On Thu, Aug 28, 2008 at 8:19 PM, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: > >[ > https://issues.apache.org/jira/browse/LUCENE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626805#action_12626805 > ] > > Michael McCandless commen

[jira] Commented: (LUCENE-1335) Correctly handle concurrent calls to addIndexes, optimize, commit

2008-08-27 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626158#action_12626158 ] Ning Li commented on LUCENE-1335: - Maybe this should be a separate JIRA issue. In do

[jira] Commented: (LUCENE-1335) Correctly handle concurrent calls to addIndexes, optimize, commit

2008-08-25 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625455#action_12625455 ] Ning Li commented on LUCENE-1335: - > I don't think so: with autoCommit=true

[jira] Commented: (LUCENE-1335) Correctly handle concurrent calls to addIndexes, optimize, commit

2008-08-23 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625078#action_12625078 ] Ning Li commented on LUCENE-1335: - > It's because commit() calls prepareCom

[jira] Commented: (LUCENE-1335) Correctly handle concurrent calls to addIndexes, optimize, commit

2008-08-22 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624998#action_12624998 ] Ning Li commented on LUCENE-1335: - I agree that we should not make any API promises a

[jira] Commented: (LUCENE-1335) Correctly handle concurrent calls to addIndexes, optimize, commit

2008-08-22 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624851#action_12624851 ] Ning Li commented on LUCENE-1335: - Hi Mike, could you update the patch? I cannot appl

[jira] Resolved: (LUCENE-1338) With non-deprecated constructors, IndexWriter's autoCommit is always true

2008-07-17 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li resolved LUCENE-1338. - Resolution: Invalid When deprecated constructors are removed in 3.0, autoCommit will always be false

[jira] Commented: (LUCENE-1338) With non-deprecated constructors, IndexWriter's autoCommit is always true

2008-07-17 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614404#action_12614404 ] Ning Li commented on LUCENE-1338: - Or is the intention to make autoCommit always f

[jira] Created: (LUCENE-1338) With non-deprecated constructors, IndexWriter's autoCommit is always true

2008-07-17 Thread Ning Li (JIRA)
Java Issue Type: Bug Components: Index Reporter: Ning Li Priority: Minor With non-deprecated constructors, IndexWriter's autoCommit is always true.

Re: Commit while addIndexes is in progress

2008-07-11 Thread Ning Li
think there're similar problems with calling optimize() while addIndexes > is in progress... I think we should disallow that? Optimize waits for addIndexes to finish? I think it's useful to allow addIndexes during maybeMerge and optimize, no? Cheers, Ning Li

Commit while addIndexes is in progress

2008-07-11 Thread Ning Li
Hi, Should we guard against the case when commit() is called during addIndexes? Otherwise, errors such as a file does not exist could happen during commit. Cheers, Ning Li

[jira] Commented: (LUCENE-1228) IndexWriter.commit() does not update the index version

2008-03-13 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578518#action_12578518 ] Ning Li commented on LUCENE-1228: - Does SegmentInfos really need both "ver

[jira] Updated: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2008-03-11 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-1035: Attachment: LUCENE-1035.contrib.patch Re-do as a contrib package. Creating BufferPooledDirectory with

[jira] Commented: (LUCENE-1204) IndexWriter.deleteDocuments bug

2008-03-06 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575782#action_12575782 ] Ning Li commented on LUCENE-1204: - > I think this is a false alarm. I just found

[jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2008-03-03 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574782#action_12574782 ] Ning Li commented on LUCENE-1035: - > It looks like this was never fully done. I wo

[jira] Commented: (LUCENE-1194) Add deleteByQuery to IndexWriter

2008-02-27 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572957#action_12572957 ] Ning Li commented on LUCENE-1194: - > As of LUCENE-1044, when autoCommit=true, Inde

[jira] Commented: (LUCENE-1194) Add deleteByQuery to IndexWriter

2008-02-26 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572576#action_12572576 ] Ning Li commented on LUCENE-1194: - Great to see deleteByQuery being added to IndexWr

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
One main focus is to provide fault-tolerance in this distributed index system. Correct me if I'm wrong, I think SOLR-303 is focusing on merging results from multiple shards right now. We'd like to start an open source project for a fault-tolerant distributed index system (or join if one already exi

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
No. I'm curious too. :) On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote: > I assume that Google also has distributed index over their > GFS/MapReduce implementation. Any idea how they achieve this? > > J.D. >

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
I work for IBM Research. I read the Rackspace article. Rackspace's Mailtrust has a similar design. Happy to see an existing application on such a system. Do they plan to open-source it? Is the AOL project an open source project? On Feb 6, 2008 11:33 AM, Clay Webster <[EMAIL PROTECTED]> wrote: > >

Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
HDFS block. This feature may be useful for other HDFS applications (e.g., HBase). We would like to collaborate with other people who are interested in adding this feature to HDFS. Regards, Ning Li

Re: Per-document Payloads

2007-10-30 Thread Ning Li
> That may be a little too seamless. We want the user to have specific > control over which fields are efficiently stored separately since they > will know how that field will be used. Maybe let users decide field families, like the column families in BigTable?

[jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2007-10-29 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538638 ] Ning Li commented on LUCENE-1035: - > The question is whether such situations are common enough to warrant add

[jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2007-10-26 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538129 ] Ning Li commented on LUCENE-1035: - > That seems like quite a few docs to retrieve--any particular reason why?

[jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2007-10-26 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538112 ] Ning Li commented on LUCENE-1035: - > I'll change to "OR" queries and see what happens. Quer

[jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2007-10-26 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537995 ] Ning Li commented on LUCENE-1035: - > most lucene usecases store much more than just the document id... that wo

[jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2007-10-26 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537978 ] Ning Li commented on LUCENE-1035: - > Were the tests run using the same set of queries they were warmed for?

[jira] Commented: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2007-10-26 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537972 ] Ning Li commented on LUCENE-1035: - > I don't think this is any better than the NIOFileCache directo

[jira] Updated: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance

2007-10-25 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-1035: Summary: Optional Buffer Pool to Improve Search Performance (was: ptional Buffer Pool to Improve Search

[jira] Updated: (LUCENE-1035) ptional Buffer Pool to Improve Search Performance

2007-10-25 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-1035: Lucene Fields: [Patch Available] (was: [New]) > ptional Buffer Pool to Improve Search Performa

[jira] Updated: (LUCENE-1035) ptional Buffer Pool to Improve Search Performance

2007-10-25 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-1035: Attachment: LUCENE-1035.patch Coding Changes -- New classes are localized to the store

[jira] Created: (LUCENE-1035) ptional Buffer Pool to Improve Search Performance

2007-10-25 Thread Ning Li (JIRA)
: Store Reporter: Ning Li Index in RAMDirectory provides better performance over that in FSDirectory. But many indexes cannot fit in memory or applications cannot afford to spend that much memory on index. On the other hand, because of locality, a reasonably sized buffer pool may

Re: lucene indexing and merge process

2007-10-18 Thread Ning Li
lt set is large. But loading it in > memory when opening index can also be slow if the index is large and updates > often. > > Thanks > > -John > > On 10/18/07, Ning Li <[EMAIL PROTECTED]> wrote: > > > > Make all documents have a term, say "ID:UID",

Re: lucene indexing and merge process

2007-10-18 Thread Ning Li
Make all documents have a term, say "ID:UID", and for each document, store its UID in the term's payload. You can read off this posting list to create your array. Will this work for you, John? Cheers, Ning On 10/18/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > Forwarding this to java-dev per req
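
The suggestion above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not Lucene code: `Posting` is a hypothetical stand-in for one entry of the "ID:UID" posting list, the UID is assumed to be a long encoded as 8 big-endian bytes in the payload, and -1 marks documents without a posting.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of reading one posting list into a docId -> UID array.
final class UidArrayFromPostings {
    /** One entry of the assumed "ID:UID" posting list. */
    static final class Posting {
        final int docId;
        final byte[] payload;
        Posting(int docId, byte[] payload) { this.docId = docId; this.payload = payload; }
    }

    /** Encodes a UID as an 8-byte payload (assumed encoding). */
    static byte[] encode(long uid) {
        return ByteBuffer.allocate(Long.BYTES).putLong(uid).array();
    }

    /** Walks the posting list once and fills the array indexed by document id. */
    static long[] buildUidArray(List<Posting> postings, int maxDoc) {
        long[] uids = new long[maxDoc];
        Arrays.fill(uids, -1L); // -1 = no UID seen for this document
        for (Posting p : postings) {
            uids[p.docId] = ByteBuffer.wrap(p.payload).getLong();
        }
        return uids;
    }

    public static void main(String[] args) {
        List<Posting> postings = Arrays.asList(
            new Posting(0, encode(42L)), new Posting(2, encode(7L)));
        System.out.println(Arrays.toString(buildUidArray(postings, 3)));
    }
}
```

The point of the design is that a single sequential pass over one posting list replaces a per-document stored-field lookup.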

Re: Exceptions in TestConcurrentMergeScheduler

2007-10-03 Thread Ning Li
The cause is that in MergeThread.run(), merge in the try block is a local variable, while merge in the catch block is the class variable. Merge in the try block could be one different from the original merge, but the catch block always checks the abort flag of the original merge.
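
The shadowing problem described above can be reproduced in miniature. The classes below are hypothetical stand-ins, not the real ConcurrentMergeScheduler code: a worker starts with one merge, picks up further merges from a queue, and on failure decides whether the exception was an expected abort. The buggy variant consults the field (the original merge), while the fixed variant consults the merge that was actually running.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration of the local-vs-field mix-up; not Lucene code.
final class MergeShadowingSketch {
    static final class Merge {
        final String name;
        final boolean aborted;
        Merge(String name, boolean aborted) { this.name = name; this.aborted = aborted; }
        void run() {
            if (aborted) throw new IllegalStateException("merge " + name + " was aborted");
        }
    }

    private final Merge startingMerge;   // field: the merge this worker was started with
    private final Queue<Merge> pending;  // further merges the worker may pick up

    MergeShadowingSketch(Merge startingMerge, Queue<Merge> pending) {
        this.startingMerge = startingMerge;
        this.pending = pending;
    }

    String runBuggy() {
        try {
            Merge merge = startingMerge;          // local variable, reassigned in the loop
            while (merge != null) {
                merge.run();
                merge = pending.poll();
            }
            return "ok";
        } catch (IllegalStateException e) {
            // Bug: checks the original merge, not the one that threw.
            return startingMerge.aborted ? "suppressed" : "reported";
        }
    }

    String runFixed() {
        Merge current = startingMerge;            // visible in both try and catch
        try {
            while (current != null) {
                current.run();
                current = pending.poll();
            }
            return "ok";
        } catch (IllegalStateException e) {
            return current.aborted ? "suppressed" : "reported";
        }
    }

    public static void main(String[] args) {
        Queue<Merge> q1 = new ArrayDeque<>();
        q1.add(new Merge("B", true));
        Queue<Merge> q2 = new ArrayDeque<>(q1);
        System.out.println(new MergeShadowingSketch(new Merge("A", false), q1).runBuggy());
        System.out.println(new MergeShadowingSketch(new Merge("A", false), q2).runFixed());
    }
}
```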

[jira] Commented: (LUCENE-1007) Flexibility to turn on/off any flush triggers

2007-10-01 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531513 ] Ning Li commented on LUCENE-1007: - One more thing about the approximation of actual bytes used for buffered delete

[jira] Updated: (LUCENE-1007) Flexibility to turn on/off any flush triggers

2007-09-27 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-1007: Attachment: LUCENE-1007.take2.patch Take2 counts buffered delete terms towards ram buffer used. A test

[jira] Updated: (LUCENE-1007) Flexibility to turn on/off any flush triggers

2007-09-27 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-1007: Attachment: LUCENE-1007.patch Just got around to do the patch: - The patch includes changes to

[jira] Created: (LUCENE-1007) Flexibility to turn on/off any flush triggers

2007-09-27 Thread Ning Li (JIRA)
Reporter: Ning Li Priority: Minor See discussion at http://www.gossamer-threads.com/lists/lucene/java-dev/53186 Provide the flexibility to turn on/off any flush triggers - ramBufferSize, maxBufferedDocs and maxBufferedDeleteTerms. One of ramBufferSize and maxBufferedDocs
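
One way to picture independently switchable flush triggers is a small decision function. This is a sketch under assumptions, not the patch itself: the `FlushTriggers` class and its `DISABLED` sentinel are hypothetical, and the constraint hinted at by the truncated last sentence above is deliberately not modeled.

```java
// Hypothetical model of three flush triggers that can each be turned off.
final class FlushTriggers {
    static final int DISABLED = -1; // assumed sentinel meaning "trigger is off"

    final double ramBufferSizeMB;
    final int maxBufferedDocs;
    final int maxBufferedDeleteTerms;

    FlushTriggers(double ramBufferSizeMB, int maxBufferedDocs, int maxBufferedDeleteTerms) {
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    /** A flush is due as soon as any enabled trigger reaches its limit. */
    boolean shouldFlush(double ramUsedMB, int bufferedDocs, int bufferedDeleteTerms) {
        boolean byRam = ramBufferSizeMB != DISABLED && ramUsedMB >= ramBufferSizeMB;
        boolean byDocs = maxBufferedDocs != DISABLED && bufferedDocs >= maxBufferedDocs;
        boolean byDeletes = maxBufferedDeleteTerms != DISABLED
            && bufferedDeleteTerms >= maxBufferedDeleteTerms;
        return byRam || byDocs || byDeletes;
    }

    public static void main(String[] args) {
        FlushTriggers ramOnly = new FlushTriggers(16.0, DISABLED, DISABLED);
        System.out.println(ramOnly.shouldFlush(8.0, 1_000_000, 0));
        System.out.println(ramOnly.shouldFlush(16.0, 0, 0));
    }
}
```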

Re: setRAMBufferSizeMB vs. setMaxBufferedDocs

2007-09-24 Thread Ning Li
On 9/24/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > On flushing pending deletes by RAM usage: should we just bundle this > up under "flush by RAM usage"? Ie "when total RAM usage, either from > buffered deletes, buffered docs, anything else, exceeds X then it's > time to flush"? (Instead

Re: setRAMBufferSizeMB vs. setMaxBufferedDocs

2007-09-24 Thread Ning Li
to max int MB. Ning On 9/24/07, Michael McCandless <[EMAIL PROTECTED]> wrote: > > "Doron Cohen" <[EMAIL PROTECTED]> wrote: > > Hi Ning, > > > > "Ning Li" <[EMAIL PROTECTED]> wrote on 24/09/2007 00:26:36: > > > > > Do y

Re: setRAMBufferSizeMB vs. setMaxBufferedDocs

2007-09-23 Thread Ning Li
Hi Doron, > On the other, the logic of "use memory-limit unless added-docs-limit was > specified" seems somewhat confusing The design intention is to use either maxBufferedDocs/maxBufferedDeleteTerms or ramBufferSize, but not both at the same time. > (why only by pending adds, why not also by pe

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527286 ] Ning Li commented on LUCENE-847: > This was actually intentional: I thought it fine if the application is > s

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527239 ] Ning Li commented on LUCENE-847: Hmm, it's actually possible to have concurrent merges with SerialMergeSche

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-13 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527224 ] Ning Li commented on LUCENE-847: Access of mergeThreads in ConcurrentMergeScheduler.merge() should be synchronized

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-11 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526628 ] Ning Li commented on LUCENE-847: > OK, another rev of the patch (take6). I think it's close! Yes, it

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-09 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526029 ] Ning Li commented on LUCENE-847: Comments on optimize(): - In the while loop of optimize

[jira] Commented: (LUCENE-992) IndexWriter.updateDocument is no longer atomic

2007-09-05 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525271 ] Ning Li commented on LUCENE-992: The patch looks good! A few comments and/or observations: - addDocument(Document

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-31 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524084 ] Ning Li commented on LUCENE-847: > Not quite following you here... not being eligible because the merge >

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-30 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523957 ] Ning Li commented on LUCENE-847: > True, but I was thinking CMPW could be an exception to this rule. I > g

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-29 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523621 ] Ning Li commented on LUCENE-847: I include comments for both LUCENE-847 and LUCENE-870 here since they are closely

Re: [jira] Updated: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-27 Thread Ning Li
Hi Mike, I cannot apply the patch cleanly. MergePolicy.java, e.g., seems to be missing from the patch. On 8/24/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: > > [ > https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

[jira] Updated: (LUCENE-987) Deprecate IndexModifier

2007-08-22 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-987: --- Attachment: deprecateIndexModifier.patch > Deprecate IndexModif

[jira] Created: (LUCENE-987) Deprecate IndexModifier

2007-08-22 Thread Ning Li (JIRA)
Deprecate IndexModifier --- Key: LUCENE-987 URL: https://issues.apache.org/jira/browse/LUCENE-987 Project: Lucene - Java Issue Type: Test Components: Index Reporter: Ning Li Priority

[jira] Updated: (LUCENE-978) GC resources in TermInfosReader when exception occurs in its constructor

2007-08-16 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-978: --- Attachment: Readers.patch Similar fixes are added for FieldsReader and TermVectorsReader as well. >

[jira] Commented: (LUCENE-978) GC resources in TermInfosReader when exception occurs in its constructor

2007-08-16 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520286 ] Ning Li commented on LUCENE-978: > Agreed. Actually, it also looks like we need to do something similar

[jira] Updated: (LUCENE-978) GC resources in TermInfosReader when exception occurs in its constructor

2007-08-16 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-978: --- Lucene Fields: [Patch Available] (was: [New]) > GC resources in TermInfosReader when exception occurs

[jira] Updated: (LUCENE-978) GC resources in TermInfosReader when exception occurs in its constructor

2007-08-15 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-978: --- Attachment: TermInfosReader.patch > GC resources in TermInfosReader when exception occurs in its construc

[jira] Created: (LUCENE-978) GC resources in TermInfosReader when exception occurs in its constructor

2007-08-15 Thread Ning Li (JIRA)
Issue Type: Bug Components: Index Reporter: Ning Li Priority: Minor Attachments: TermInfosReader.patch I replaced IndexModifier with IndexWriter in test case TestStressIndexing and noticed the test failed from time to time because some .tis file is still
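
The general pattern behind this kind of fix, releasing whatever a constructor already opened when a later step throws, can be sketched as follows. `TwoStreamReader` and `Opener` are hypothetical names used for illustration only and do not reflect the actual TermInfosReader patch.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical reader that opens two resources and cleans up on constructor failure.
final class TwoStreamReader implements Closeable {
    /** Assumed factory for the underlying resources. */
    interface Opener {
        Closeable open(String name) throws IOException;
    }

    private final Closeable first;
    private final Closeable second;

    TwoStreamReader(Opener opener) throws IOException {
        Closeable a = null;
        Closeable b = null;
        boolean success = false;
        try {
            a = opener.open("first");
            b = opener.open("second"); // may throw after "first" is already open
            success = true;
        } finally {
            if (!success) {
                // Release anything opened so far; the original exception still propagates.
                closeQuietly(a);
                closeQuietly(b);
            }
        }
        first = a;
        second = b;
    }

    private static void closeQuietly(Closeable c) {
        if (c == null) return;
        try {
            c.close();
        } catch (IOException ignored) {
            // Keep the original failure as the reported one.
        }
    }

    @Override
    public void close() throws IOException {
        try {
            first.close();
        } finally {
            second.close();
        }
    }
}
```

Without the `finally` branch, a failure while opening the second resource would leave the first one open until it is garbage collected, which on some platforms keeps the underlying file from being deleted.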

Re: Deprecating IndexModifier

2007-08-12 Thread Ning Li
IndexWriter does everything IndexModifier does and more, except "deleteDocument(int doc)". Can we reach consensus on: 1 Should we deprecate IndexModifier before 3.0 and remove it in 3.0? 2 If so, do we have to add "deleteDocument(int doc)" to IndexWriter? We know how to support "deleteDocument(int

Re: Deprecating IndexModifier

2007-08-08 Thread Ning Li
On 8/8/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On 8/8/07, Ning Li <[EMAIL PROTECTED]> wrote: > > This reminds me: It'd be nice if we could support delete-by-query someday. > > :) > > > > I was thinking people use deleteDocument(int docid) whe

Re: Deprecating IndexModifier

2007-08-08 Thread Ning Li
On 8/8/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > Let's take a simple case of deleting documents in a range, like > date:[2006 TO 2008] > One would currently need to close the writer and open a new reader to > ensure that they can "see" all the documents. Then execute a > RangeQuery, collect th

Re: Deprecating IndexModifier

2007-08-08 Thread Ning Li
On 8/8/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On 8/8/07, Ning Li <[EMAIL PROTECTED]> wrote: > > But you still think it's worth to be included in IndexWriter, right? > > I'm not sure... (unless I'm missing some obvious use-cases). > If one could g

Re: Deprecating IndexModifier

2007-08-08 Thread Ning Li
On 8/8/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > To make delete by docid useful, one needs a way to *get* those docids. > A callback after flush that provided a current list of readers for the > segments would serve. Interesting. That makes sense. > I think IndexWriter.deleteDocument(int doc)

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-08 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518520 ] Ning Li commented on LUCENE-847: > Furthermore, I think this is all contained within IndexWriter, right? > I

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-08 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518486 ] Ning Li commented on LUCENE-847: The following comments are about the impact on merge if we add "deleteDocumen

Re: Deprecating IndexModifier

2007-08-08 Thread Ning Li
ffered delete doc ids. I don't think it should be the reason not to support "deleteDocument(int doc)" in IndexWriter. But its impact on concurrent merge is a concern. Ning On 8/7/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > +1 > > > On Aug 7, 2007, at 3:37 PM, Nin

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-08 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518453 ] Ning Li commented on LUCENE-847: On 8/8/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: > Actua

Re: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-08-08 Thread Ning Li
On 8/7/07, Steven Parkes (JIRA) <[EMAIL PROTECTED]> wrote: > > [ > https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518210 > ] > > Steven Parkes commented on LUCENE-847: > -- > >

Deprecating IndexModifier

2007-08-07 Thread Ning Li
With the plan towards 3.0 release laid out, I think it's a good time to deprecate IndexModifier and eventually remove IndexModifier. The only method in IndexModifier which is not implemented in IndexWriter is "deleteDocument(int doc)". This is because of the concern that document ids are changing

[jira] Commented: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-07-12 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512271 ] Ning Li commented on LUCENE-938: I didn't make myself clear. Let me try again. The patch includes two par

[jira] Commented: (LUCENE-938) I/O exceptions can cause loss of buffered deletes

2007-07-05 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510422 ] Ning Li commented on LUCENE-938: Good catch, Steven! One thing though: I thought we had assumed that there wouldn

Re: [jira] Created: (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize

2007-05-03 Thread Ning Li
Steve, Mike, Thanks for the explanation! I meant cascading but wrote optimizing. So it still cascades merges. It would merge based on size (not # docs), would be free to merge adjacent segments (not just rightmost segments), and would merge N (configurable) at a time. The part that's still unc

Re: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-05-03 Thread Ning Li
Having the merge policy own segmentInfos sounds kind of hard to me. Among other things, there's a lot of code in IndexWriter for managing segmentInfos with regards to transactions. I'm pretty wary of touching that code. Is there a way around that? But conceptually, do you agree it's a good idea

Re: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-05-02 Thread Ning Li
On 3/23/07, Steven Parkes (JIRA) <[EMAIL PROTECTED]> wrote: In fact, there a few things here that are fairly subtle/important. The relationship/protocol between the writer and policy is pretty strong. This can be seen in the strawman concurrent merge code where the merge policy holds state and

Re: [jira] Created: (LUCENE-854) Create merge policy that doesn't periodically inadvertently optimize

2007-05-02 Thread Ning Li
On 3/31/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: Create merge policy that doesn't periodically inadvertently optimize So we could make a small change to the policy by only merging the first mergeFactor segments o

Re: [jira] Created: (LUCENE-856) Optimize segment merging

2007-04-04 Thread Ning Li
On 4/4/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: Note that for "autoCommit=false", this optimization is somewhat less important, depending on how often you actually close/open a new IndexWriter. In the extreme case, if you open a writer, add 100 MM docs, close the writer, then no

Re: [jira] Commented: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-04-03 Thread Ning Li
On 4/3/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: * With term vectors and/or stored fields, the new patch has substantially better RAM efficiency. Impressive numbers! The new patch improves RAM efficiency quite a bit even with neither term vectors nor stored fields, because of the

Re: Concurrent merge

2007-03-29 Thread Ning Li
FYI: Patch submitted in http://issues.apache.org/jira/browse/LUCENE-847. Cheers, Ning "Here is a patch for concurrent merge as discussed in: http://www.gossamer-threads.com/lists/lucene/java-dev/45651?search_string=concurrent%20merge;#45651 "I put it under this issue because it helps design and

Re: [jira] Created: (LUCENE-851) Pruning

2007-03-29 Thread Ning Li
It would be great to support early termination for top-K queries within the DAAT query processing model in Lucene. Quite a bit of work has been published in related areas; http://portal.acm.org/citation.cfm?id=956944 is one example. Am I getting it right? If a query requires top-K results, isn't it su

[jira] Updated: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-28 Thread Ning Li (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-847: --- Attachment: concurrentMerge.patch Here is a patch for concurrent merge as discussed in: http://www.gossamer

Re: [jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge

2007-03-26 Thread Ning Li
On 3/26/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: Ahhh, this is a very good point. OK I won't deprecate "flushing by doc count" and instead will allow either "flush by RAM usage" (default to this?) or "flush by doc count". Just want to clarify: It's either "flush and merge by by

Re: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-03-23 Thread Ning Li
Hi Steven, I haven't read the details, but should maxBufferedDocs be exposed in some subinterfaces instead of the MergePolicy interface? Since some policies may use it and others may use byte size or something else. It's great that you've started on concurrent merge as well! I haven't got a chan

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Ning Li
On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote: Yes the code re-computes the level of a given segment from the current values of maxBufferedDocs & mergeFactor. But when these values have changed (or, segments were flushed by RAM not by maxBufferedDocs) then the way it computes level no

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Ning Li
On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote: Right, I'm calling a newly created segment (i.e. flushed from RAM) level 0, and then a level 1 segment is created when you merge 10 level 0 segments, level 2 is created when you merge 10 level 1 segments, etc. That is not how the current merge p

Re: Concurrent merge

2007-03-02 Thread Ning Li
Many good points! Thanks, guys! When background merge is employed, document additions can out-pace merging, no matter how many background merge threads are used. Blocking has to happen at some point. So, if we do anything, we make it simple. I agree with what Robert and Yonik have proposed: docu

Re: [jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely

2007-02-21 Thread Ning Li
On 2/21/07, Doron Cohen (JIRA) <[EMAIL PROTECTED]> wrote: Imagine the application and Lucene could talk, with the current implementation we could hear something like this: ... However, there could be multiple threads updating the same index. For example, thread 1 deletes the term "id:5" twice,
