Re: official GIT repository / switch to GIT?

2010-04-17 Thread John Wang
Hi Thomas: There is already a git mirror: http://github.com/apache/lucene All Apache projects have one: http://git.apache.org/ You are free to use git. Apache runs a git-svn server somewhere; the repository itself is not git, but you can use it as one. Hope this help

[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856913#action_12856913 ] John Wang commented on LUCENE-2159: --- Yeah, that sounds great! I will need to learn

[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856908#action_12856908 ] John Wang commented on LUCENE-2159: --- Shai: I am just stating our experiences. I am

[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856869#action_12856869 ] John Wang commented on LUCENE-2159: --- Shai: You are right, we found this

Re: Google-developed posting list encoding

2010-04-14 Thread John Wang
This would be something that's excellent for contribution after the Flex-Indexing support is added. -John On Wed, Apr 14, 2010 at 12:22 AM, Mike Klaas wrote: > Can be quite a bit faster than vInt in some cases: > http://www.ir.uwaterloo.ca/book/addenda-06-index-compression.html > > -Mike > > --
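For readers unfamiliar with the vInt baseline that the quoted message compares against, here is a minimal sketch of a variable-length integer codec: 7 payload bits per byte, low-order bits first, with the high bit marking "more bytes follow". The class and method names (`VIntSketch`, `encode`, `decode`) are made up for illustration and are not Lucene's API; the alternative encoding discussed in the thread is not shown.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of a vInt-style codec; not taken from Lucene's source.
final class VIntSketch {

    /** Encodes an int as 1 to 5 bytes, low-order 7 bits first. */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // While more than 7 significant bits remain, emit 7 bits plus a continuation flag.
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // final byte, continuation flag clear
        return out.toByteArray();
    }

    /** Decodes a single value produced by encode(). */
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation flag clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16384};
        for (int v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decoded " + decode(enc));
        }
    }
}
```

Small values take one byte, which is why such a format suits posting-list deltas; the per-byte branch in `decode` is the kind of cost that alternative encodings try to avoid.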

Re: FindBugs Community Review of Lucene

2010-04-13 Thread John Wang
Hi Nat: Great analysis! Some of them DO seem to be bugs! Maybe findbugs can be enabled as part of the build? -John On Tue, Apr 13, 2010 at 11:01 AM, Nat Ayewah wrote: > Hello, > > I am a PhD student working with the FindBugs project, at the University of > Maryland. FindBugs

Re: chinese stopwords

2010-04-10 Thread John Wang
Awesome, thanks! Great job on the work! -John 2010/4/10 Gao Pinker > That's a good idea, I'll think about adding another stopword-list to let > users have a chance to choose. > > > On Sat, Apr 10, 2010 at 9:25 PM, John Wang wrote: > >> Yeah, I found so

Re: chinese stopwords

2010-04-10 Thread John Wang
blog/item/146b5c346a738c4d251f1496.html > http://download.csdn.net/source/740407 > > > On Sat, Apr 10, 2010 at 9:59 AM, John Wang wrote: > >> Hi: >> >>I am using the SmartChineseAnalyzer class and it is great! >> >>Was wondering if we should have a set o

chinese stopwords

2010-04-09 Thread John Wang
Hi: I am using the SmartChineseAnalyzer class and it is great! Was wondering if we should have a set of Chinese stopwords. The default set contains only punctuation. Thanks -John

[jira] Commented: (LUCENE-2252) stored field retrieve slow

2010-03-23 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848972#action_12848972 ] John Wang commented on LUCENE-2252: --- Hi Mike: Sorry for the late reply. We

[jira] Commented: (LUCENE-2252) stored field retrieve slow

2010-02-06 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830641#action_12830641 ] John Wang commented on LUCENE-2252: --- bq. I still think 4 bytes/doc is too much (its

[jira] Commented: (LUCENE-2252) stored field retrieve slow

2010-02-06 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830628#action_12830628 ] John Wang commented on LUCENE-2252: --- Sorry, I meant LUCENE-1914 > store

[jira] Commented: (LUCENE-2252) stored field retrieve slow

2010-02-06 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830627#action_12830627 ] John Wang commented on LUCENE-2252: --- bq. I do not understand, I think the fdx inde

[jira] Commented: (LUCENE-2252) stored field retrieve slow

2010-02-06 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830599#action_12830599 ] John Wang commented on LUCENE-2252: --- Thanks Uwe for the pointer. Will check that

[jira] Created: (LUCENE-2252) stored field retrieve slow

2010-02-06 Thread John Wang (JIRA)
Reporter: John Wang IndexReader.document() on a stored field is rather slow. Did a simple multi-threaded test and profiled it: 40+% time is spent in getting the offset from the index file 30+% time is spent in reading the count (e.g. number of fields to load) Although I ran it on my lap top
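As background for the "getting the offset from the index file" step, here is a minimal sketch of a fixed-width offset index, assuming one 8-byte pointer per document that locates the document's record in a separate data file. The names (`FieldIndexSketch`, `buildIndex`, `dataOffset`) are hypothetical, and an in-memory `ByteBuffer` stands in for the on-disk index file; this is not Lucene's actual reader code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a fixed-width "doc id -> data offset" index.
final class FieldIndexSketch {
    // Assumption for this sketch: one 8-byte pointer per document.
    static final int POINTER_BYTES = 8;

    /** Builds the index: entry i holds the data-file offset of document i. */
    static ByteBuffer buildIndex(long[] dataOffsets) {
        ByteBuffer index = ByteBuffer.allocate(dataOffsets.length * POINTER_BYTES);
        for (long offset : dataOffsets) {
            index.putLong(offset);
        }
        return index;
    }

    /** Looks up a document's data offset with one absolute read at docId * 8. */
    static long dataOffset(ByteBuffer index, int docId) {
        return index.getLong(docId * POINTER_BYTES);
    }

    public static void main(String[] args) {
        ByteBuffer index = buildIndex(new long[] {0L, 17L, 42L});
        System.out.println("doc 1 starts at offset " + dataOffset(index, 1));
    }
}
```

With a real file, each lookup becomes a seek plus a small read on the index file before the data file is touched at all, which is one plausible reason such a step can show up prominently in a profile.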

Re: NRT and IndexSearcher performance

2010-01-19 Thread John Wang
I think the question here really is the cost of creating new IndexReader instances per query. Calling IndexWriter.getReader() for each query has been shown to be expensive in our benchmarks and previous discussions. -John On Tue, Jan 19, 2010 at 8:12 PM, Jason Rutherglen < jason.rutherg...@gmail.com

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-13 Thread John Wang
"NRT reader 'simply' lets you search the full index, including un-committed changes." I am not sure I understand: I think the context of the discussion is what happens when the indexer crashes before IW.commit. At that point, it does not really matter whether you are using NRT, e.g. IW.getReader, or IndexReader.

Re: Compound File Default

2010-01-13 Thread John Wang
+1. -John On Tue, Jan 12, 2010 at 8:16 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Heh, yeah, I forgot about that. Pick the lesser evil? I like speedier > defaults. > > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > - Original Message >

[jira] Commented: (LUCENE-2120) Possible file handle leak in near real-time reader

2009-12-28 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794998#action_12794998 ] John Wang commented on LUCENE-2120: --- Hi Michael: You are abs. right! By addin

Re: TermDocs.close

2009-12-27 Thread John Wang
entTermDocs/Positions use are clones, their close methods are a > no-op. > > Mike > > On Sun, Dec 27, 2009 at 6:37 AM, John Wang wrote: > > Hi: > > I see TermDocs.close not being called when created with TermQuery: > > TermQuery creates it and passes to TermScorer

TermDocs.close

2009-12-27 Thread John Wang
Hi: I see TermDocs.close not being called when created with TermQuery: TermQuery creates it and passes it to TermScorer, and it is never closed. I see TermDocs.close actually closes the input stream. Is it safe not to close TermDocs? Thanks -John

[jira] Commented: (LUCENE-2120) Possible file handle leak in near real-time reader

2009-12-26 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794618#action_12794618 ] John Wang commented on LUCENE-2120: --- I realized in my ArrayDocIdSet.skip, i am d

[jira] Commented: (LUCENE-2120) Possible file handle leak in near real-time reader

2009-12-26 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794617#action_12794617 ] John Wang commented on LUCENE-2120: --- Michael: I wrote a little test to measure

[jira] Commented: (LUCENE-2120) Possible file handle leak in near real-time reader

2009-12-22 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793803#action_12793803 ] John Wang commented on LUCENE-2120: --- Yes we have done perf tests. We see no inde

Re: 3.0 api change

2009-12-22 Thread John Wang
chindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > -- > > *From:* John Wang [mailto:john.w...@gmail.com] > *Sent:* Tuesday, December 22, 2009 3:16 AM > *To:* java-u...@lucene.apache.org; java-dev@lucen

Fwd: 3.0 api change

2009-12-21 Thread John Wang
Any comments? Did we just unintentionally remove getFieldComparatorSource in 3.0.0? -John -- Forwarded message -- From: John Wang Date: Mon, Dec 21, 2009 at 11:21 AM Subject: 3.0 api change To: Lucene Users List , lucene-...@jakarta.apache.org Hi guys: I noticed

[jira] Commented: (LUCENE-2160) Tool to rename a field

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791051#action_12791051 ] John Wang commented on LUCENE-2160: --- Looked at the file format wiki more closely, I

[jira] Commented: (LUCENE-2160) Tool to rename a field

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791046#action_12791046 ] John Wang commented on LUCENE-2160: --- Did some more digging around the issue on f

[jira] Commented: (LUCENE-2120) Possible file handle leak in near real-time reader

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790985#action_12790985 ] John Wang commented on LUCENE-2120: --- bq. is this what the private static

[jira] Updated: (LUCENE-2160) Tool to rename a field

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-2160: -- Attachment: RenameField.java Fixed a problem with cfs files. > Tool to rename a fi

[jira] Commented: (LUCENE-2160) Tool to rename a field

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790809#action_12790809 ] John Wang commented on LUCENE-2160: --- Just did a test: You are r

[jira] Commented: (LUCENE-2160) Tool to rename a field

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790797#action_12790797 ] John Wang commented on LUCENE-2160: --- Good point. But do you ever sort across fi

[jira] Closed: (LUCENE-2007) Add DocsetQuery to turn a DocIdSet into a query

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang closed LUCENE-2007. - Resolution: Won't Fix > Add DocsetQuery to turn a DocIdSet into

[jira] Updated: (LUCENE-2160) Tool to rename a field

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-2160: -- Attachment: RenameField.java part of the code was originally posted on nabble, but is not removed

[jira] Created: (LUCENE-2160) Tool to rename a field

2009-12-15 Thread John Wang (JIRA)
Reporter: John Wang We found it useful to be able to rename a field. It can save a lot of reindexing time/cost when used in conjunction with ParallelReader to partially update a field. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the

[jira] Updated: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2009-12-15 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-2159: -- Attachment: ExpandIndex.java I have put it under contrib/misc, in package org.apache.lucene.index

[jira] Created: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2009-12-15 Thread John Wang (JIRA)
: contrib/* Affects Versions: 3.0 Reporter: John Wang Sometimes it is useful to take a small-ish index and expand it into a large index with K segments for perf/stress testing. This tool does that. See attached class. -- This message is automatically generated by JIRA. - You

[jira] Commented: (LUCENE-2120) Possible file handle leak in near real-time reader

2009-12-14 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790506#action_12790506 ] John Wang commented on LUCENE-2120: --- Hi Michael: bq: Why does Zoie even reta

[jira] Commented: (LUCENE-1613) TermEnum.docFreq() is not updated with there are deletes

2009-12-06 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786750#action_12786750 ] John Wang commented on LUCENE-1613: --- Maybe to just add a javadoc comment on the cal

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-12-05 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786408#action_12786408 ] John Wang commented on LUCENE-1526: --- Yes, we still see the issue. The perform

Re: A new Lucene Directory available

2009-11-14 Thread John Wang
Hi Sanne: Very interesting! What kind of performance should we expect with this, compared to a regular FSDirectory on a local HD? Thanks -John On Sat, Nov 14, 2009 at 11:44 AM, Sanne Grinovero < s.grinov...@sourcesense.com> wrote: > Hello all, > I'm a Lucene user and fan, I wanted to tell

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-11 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776739#action_12776739 ] John Wang commented on LUCENE-1526: --- Correction: We are NOT using BalancedMergePo

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-10 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776282#action_12776282 ] John Wang commented on LUCENE-1526: --- wrote a little pgm on my mac pro (8 core 16GM

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-10 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776085#action_12776085 ] John Wang commented on LUCENE-1526: --- bq. Zoie will take 64 msec longer than Lu

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-10 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775930#action_12775930 ] John Wang commented on LUCENE-1526: --- bq. we need to see it in the real-world con

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-09 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775319#action_12775319 ] John Wang commented on LUCENE-1526: --- bq. I'd love to see how the worst-cas

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-08 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774743#action_12774743 ] John Wang commented on LUCENE-1526: --- Michael: I think I confused you by not gi

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-07 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774671#action_12774671 ] John Wang commented on LUCENE-1526: --- We do not hold the deleted set for a long pe

[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

2009-11-07 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774662#action_12774662 ] John Wang commented on LUCENE-1526: --- The issue of not using a BitSet/BitVector is

[jira] Created: (LUCENE-2033) exposed MultiTermDocs and MultiTermPositions from package protected to public

2009-11-04 Thread John Wang (JIRA)
Issue Type: Improvement Components: Search Affects Versions: 2.9 Reporter: John Wang Making these classes public can help classes that extend MultiReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to

[jira] Commented: (LUCENE-2026) Refactoring of IndexWriter

2009-11-03 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773329#action_12773329 ] John Wang commented on LUCENE-2026: --- +1 > Refactoring of Inde

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-11-02 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772877#action_12772877 ] John Wang commented on LUCENE-1997: --- Another observation, with multiQ approach, s

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-11-02 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772794#action_12772794 ] John Wang commented on LUCENE-1997: --- Mark: The point of discussion is me

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-11-02 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772790#action_12772790 ] John Wang commented on LUCENE-1997: --- Mark: 100th page at the same time inde

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-11-02 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772764#action_12772764 ] John Wang commented on LUCENE-1997: --- I just looked at the most recent patch. E

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-11-02 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772754#action_12772754 ] John Wang commented on LUCENE-1997: --- Hi Michael: Thanks for the heads up. I

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-11-02 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772710#action_12772710 ] John Wang commented on LUCENE-1997: --- Hi Michael: Any plans/decisions on mo

[jira] Commented: (LUCENE-2007) Add DocsetQuery to turn a DocIdSet into a query

2009-10-24 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769674#action_12769674 ] John Wang commented on LUCENE-2007: --- Both Paul and Uwe are absolutely correct! I

[jira] Updated: (LUCENE-2007) Add DocsetQuery to turn a DocIdSet into a query

2009-10-24 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-2007: -- Attachment: LUCENE-2007-2.patch fixed to use reader.termDocs(null) for delete check. >

[jira] Updated: (LUCENE-2007) Add DocsetQuery to turn a DocIdSet into a query

2009-10-24 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-2007: -- Attachment: LUCENE-2007.patch contributed from bobo. Still work needed: 1) reader.isDeleted is now

[jira] Created: (LUCENE-2007) Add DocsetQuery to turn a DocIdSet into a query

2009-10-24 Thread John Wang (JIRA)
: Search Reporter: John Wang Added a class DocsetQuery that can be constructed from a DocIdSet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread John Wang
Hi Mike: Thank you! It would be really nice to get the optimizations you have done. -John 2009/10/23 Michael McCandless > Agreed: so far I'm seeing serious performance loss with MultiPQ, > especially as topN gets larger, and for int sorting. > > For small queue, String sort, it sometimes wi

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-23 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769119#action_12769119 ] John Wang commented on LUCENE-1997: --- wrote a small test and verified that 64bit

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-23 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769116#action_12769116 ] John Wang commented on LUCENE-1997: --- I think I found the reason for the discrepancy

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-22 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769090#action_12769090 ] John Wang commented on LUCENE-1997: --- bq: topn:100 I had made changes to sortBench.p

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
Hi Yonik: I have been head-deep in this, trying to find a good solution, for the better part of the past two days. It has been hard because there are so many variables: 1) how optimized the code is in each of the implementations 2) VM differences 3) HW etc. Also, there are quite a few dim

[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

2009-10-22 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769045#action_12769045 ] John Wang commented on LUCENE-1997: --- My machine HW spec: Model Name: MacBook

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
eeley wrote: > On Thu, Oct 22, 2009 at 10:35 PM, John Wang wrote: > >Please be patient with me. I am seeing a difference and was > wondering > > if Mike would see the same thing. > > Some differences are bound to be seen... with your changes (JVM > changes, branc

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
e more over the weekend for sure. > > > > -jake > > > > On Thu, Oct 22, 2009 at 7:29 PM, Mark Miller > <mailto:markrmil...@gmail.com>> wrote: > > > > Why? What might he find? Whats with the cryptic request? > > > > Why would Java

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
0|rand int|25|113.77|112.92|{color:red}-0.7%{color}| |log||100|rand int|50|113.36|109.56|{color:red}-3.4%{color}| |log||100|rand int|500|103.90|66.29|{color:red}-36.2%{color}| |log||100|rand int|1000|89.52|70.67|{color:red}-21.1%{color}| On Thu, Oct 22, 2009 at 7:43 PM, John Wang

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
just makes 0 sense to me and I > said as much. > > John Wang wrote: > > Mark: > > > >Please be patient with me. I am seeing a difference and was > > wondering if Mike would see the same thing. I thought Michael would be > > willing to because he e

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
> I know point 2 certainly doesn't. Cards on the table? > > John Wang wrote: > > Hey Michael: > > > >Would you mind rerunning the test you have with jdk1.5? > > > >Also, if you would, change the comparator method to avoid > > branching

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
, Michael McCandless < luc...@mikemccandless.com> wrote: > On Thu, Oct 22, 2009 at 2:17 AM, John Wang wrote: > > > I have been playing with the patch, and I think I have some > information > > that you might like. > > Let me spend sometime and gather some

Re: lucene 2.9 sorting algorithm

2009-10-21 Thread John Wang
I lot though, no? -John On Wed, Oct 21, 2009 at 3:11 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Tue, Oct 20, 2009 at 11:55 AM, John Wang wrote: > > > the simpler api places less restriction on the type of custom > > sorting that can be done. > >

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
rk, plus a simple python wrapper to run old/new tests > across different queries, sort, topN, etc. > > But I got different results... MultiPQ looks generally slower than > SinglePQ. So I think we now need to reconcile what's different > between our tests. > > Mike > >

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Sorry, mistyped again, we have a multivalued field of STRINGS, no integers. -John On Tue, Oct 20, 2009 at 8:55 AM, John Wang wrote: > Hi guys: > I am not suggesting just simply changing the deprecated signatures. > There are some work to be done of course. In the beginning of the t

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Hi guys: I am not suggesting simply changing the deprecated signatures. There is some work to be done, of course. In the beginning of the thread, we discussed two algorithms (both handling per-segment field loading), and at the conclusion, (to be still verified by Mike) that both algorithm

Re: lucene 2.9 sorting algorithm

2009-10-19 Thread John Wang
, Michael McCandless < luc...@mikemccandless.com> wrote: > Oh, no problem... > > Mike > > On Fri, Oct 16, 2009 at 12:33 PM, John Wang wrote: > > Mike, just a clarification on my first perf report email. > > The first section, numHits is incorrectly labeled, it should

Re: 2.9.1

2009-10-18 Thread John Wang
ah! Thanks Yonik! -John On Sun, Oct 18, 2009 at 6:32 AM, Yonik Seeley wrote: > On Sun, Oct 18, 2009 at 1:43 AM, John Wang wrote: > > Maybe it is not a big deal. But I would still like to know why in > > MultiTermDocs, if term is not null, termDocs(term) is not called, rath

Re: 2.9.1

2009-10-17 Thread John Wang
Hi guys: Maybe it is not a big deal. But I would still like to know why in MultiTermDocs, if term is not null, termDocs(term) is not called, rather termDocs() is. Thanks -John On Sat, Oct 17, 2009 at 10:16 AM, John Wang wrote: > Oh ok. I was thinking that if term is not null, termD

Re: 2.9.1

2009-10-17 Thread John Wang
ke > > On Sat, Oct 17, 2009 at 1:09 PM, John Wang wrote: > > In DirectoryReader$MultiTermDocs implementation: > > in method: protected TermDocs termDocs(IndexReader reader) > > return term==null ? reader.termDocs(null) : reader.termDocs(); > > Is this cor

Re: 2.9.1

2009-10-17 Thread John Wang
In DirectoryReader$MultiTermDocs implementation: in method: protected TermDocs termDocs(IndexReader reader) return term==null ? reader.termDocs(null) : reader.termDocs(); Is this correct? Shouldn't it be: return term==null ? reader.termDocs() : reader.termDocs(term); Thanks -John On Sat, Oc

Re: lucene 2.9 sorting algorithm

2009-10-16 Thread John Wang
anks John; I'll have a look. > > Mike > > On Fri, Oct 16, 2009 at 12:57 AM, John Wang wrote: > > Hi Michael: > > I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector > as > > a more general case. I think keeping the old api for ScoreDocCompa

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Hi Michael: I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as a more general case. I think keeping the old api for ScoreDocComparator and SortComparatorSource would work. Please take a look. Thanks -John On Thu, Oct 15, 2009 at 6:52 PM, John Wang wrote: >

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
t; Mike > > On Thu, Oct 15, 2009 at 6:33 PM, John Wang wrote: > > BTW, we are have a little sandbox for these experiments. And all my > testcode > > are at. They are not very polished. > > > > https://lucene-book.googlecode.com/svn/trunk > > > > -John

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
BTW, we have a little sandbox for these experiments, and all my test code is at the URL below. It is not very polished. https://lucene-book.googlecode.com/svn/trunk -John On Thu, Oct 15, 2009 at 3:29 PM, John Wang wrote: > Numbers Mike requested for Int types: > > only the time/cputime a

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Numbers Mike requested for Int types: only the time/cputime are posted, others are all the same since the algorithm is the same. Lucene 2.9: numhits: 10 time: 14619495 cpu: 146126 numhits: 20 time: 14550568 cpu: 163242 numhits: 100 time: 16467647 cpu: 178379 my test: numHits: 10 time: 1410109

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
hu, Oct 15, 2009 at 2:12 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Nice results! Comments below... > > On Thu, Oct 15, 2009 at 3:58 PM, John Wang wrote: > > Hi guys: > > > > I did some Big O math a few times and reached the same conclusion &g

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Hi guys: I did some Big O math a few times and reached the same conclusion Jake had. I was not sure about the code tuning opportunities we could have done with the MergeAtTheEnd method as Yonik mentioned and the internal behavior with PQ Mike suggested, so I went ahead and implemented the

lucene 2.9 sorting algorithm

2009-10-14 Thread John Wang
Hi guys: Looking at the 2.9 sorting algorithm, and while trying to understand FieldComparator class, I was wondering about the following optimization: (I am using StringOrdValComparator as an example) Currently we have 1 instance of per segment data structure, e.g. (ords,vals etc.), and we kee
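The multi-queue idea debated in this thread can be sketched roughly as follows: keep one bounded priority queue per segment, then merge the per-segment winners once at the end. This is only an illustration under simplifying assumptions (plain int sort values, ascending order); the class and method names are made up and this is not the code from either implementation discussed on the list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of "one queue per segment, merge at the end".
final class MultiPQSketch {

    /** Returns the topN smallest values across all segments, in ascending order. */
    static List<Integer> topN(int[][] segments, int topN) {
        List<Integer> merged = new ArrayList<Integer>();
        if (topN <= 0) {
            return merged;
        }
        for (int[] segment : segments) {
            // Max-heap bounded to topN entries: the head is the worst value still kept.
            PriorityQueue<Integer> pq =
                new PriorityQueue<Integer>(Collections.<Integer>reverseOrder());
            for (int v : segment) {
                if (pq.size() < topN) {
                    pq.add(v);
                } else if (v < pq.peek()) {
                    pq.poll(); // evict the current worst
                    pq.add(v);
                }
            }
            merged.addAll(pq); // collect this segment's winners
        }
        // Merge step: at most topN * numSegments candidates remain to be ordered.
        Collections.sort(merged);
        return new ArrayList<Integer>(merged.subList(0, Math.min(topN, merged.size())));
    }

    public static void main(String[] args) {
        int[][] segments = {{5, 1, 9}, {3, 7}, {2, 8, 6}};
        System.out.println(topN(segments, 3));
    }
}
```

The trade-off is that each segment can compare values in its own local terms while collecting, at the price of more queue memory and a final merge whose size grows with topN times the number of segments.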

[jira] Updated: (LUCENE-1969) adding kamikaze to lucene contrib

2009-10-13 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-1969: -- Attachment: kamikaze.contrib.patch2 again it was the package name. redid local run and all tests pass

[jira] Updated: (LUCENE-1969) adding kamikaze to lucene contrib

2009-10-13 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-1969: -- Attachment: build.xml updated build.xml with package name changes. > adding kamikaze to luc

[jira] Commented: (LUCENE-1969) adding kamikaze to lucene contrib

2009-10-13 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765306#action_12765306 ] John Wang commented on LUCENE-1969: --- My bad! The build.xml is not updated with

new sorting api and some perf numbers

2009-10-11 Thread John Wang
Hi guys: The new FieldComparator api looks really scary :) But after some perf testing with numbers I'd like to share, I guess it is worth it: HW: Mac Pro with 16G memory jvm: 1.6.0_13 jvm arg: -Xms1g -Xmx1g -server setup index: 1M docs evenly split into 8 segments (to make sure the test

[jira] Updated: (LUCENE-1969) adding kamikaze to lucene contrib

2009-10-10 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-1969: -- Attachment: kamikaze-contrib.patch kamikaze contrib > adding kamikaze to lucene cont

[jira] Created: (LUCENE-1969) adding kamikaze to lucene contrib

2009-10-10 Thread John Wang (JIRA)
: 2.9 Reporter: John Wang Adding kamikaze to lucene contrib -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread John Wang
open an issue, > etc. > > > > Probably because it's a large amount of code (I think?) you'll need to > > submit a software grant > > (http://www.apache.org/licenses/software-grant.txt). > > > > Mike > > > > On Thu, Oct 8, 2009 at 2:58 PM, John Wa

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread John Wang
Awesome! Mike, can you let us know what the process is and the time line? Thanks -John On Thu, Oct 8, 2009 at 11:48 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > +1! > > Mike > > On Thu, Oct 8, 2009 at 2:41 PM, John Wang wrote: > > Hi guys: > >

Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-08 Thread John Wang
Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: John Wang (JIRA) [mailto:j...@apache.org] > > Sent: Thursday, September 24, 2009 3:14 PM > > To: java-dev@lucene

[jira] Commented: (LUCENE-1458) Further steps towards flexible indexing

2009-10-05 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762224#action_12762224 ] John Wang commented on LUCENE-1458: --- Hi Yonik: These are indeed useful feat
