[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-09 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662632#action_12662632 ] John Wang commented on LUCENE-1345: --- Given the perf number improvements we see, can we c

[jira] Updated: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-09 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-1345: -- Attachment: booleansetperf.txt Added And/Or/Not DocidSet/Iterators code ported over from Kamikaze: ht
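The And/Or/Not DocIdSet iterators mentioned here combine sorted doc-id streams. Below is a minimal sketch of the AND case only. It assumes a hypothetical `DocIter` contract with `nextDoc()`/`advance()`; Lucene 2.4's `DocIdSetIterator` used `next()`/`doc()`/`skipTo()` instead. This is not the Kamikaze code.

```java
import java.util.Arrays;

// Hypothetical iterator contract: doc ids come back in increasing order.
interface DocIter {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();            // next doc id, or NO_MORE_DOCS
    int advance(int target);  // first doc id >= target, or NO_MORE_DOCS
}

// Iterates a sorted int[] of doc ids (linear advance, for illustration only).
class ArrayDocIter implements DocIter {
    private final int[] docs;
    private int pos = -1;
    ArrayDocIter(int... docs) { this.docs = docs; }
    public int nextDoc() {
        pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
    public int advance(int target) {
        int d;
        do { d = nextDoc(); } while (d < target);
        return d;
    }
}

// Conjunction: a doc matches only if every sub-iterator contains it.
// Lagging iterators are advanced straight to the current candidate
// ("leapfrogging") until all of them agree on one doc id.
class AndDocIter implements DocIter {
    private final DocIter[] iters;
    private final int[] cur;   // current doc of each sub-iterator (-1 = not started)
    private int doc = -1;

    AndDocIter(DocIter... iters) {
        this.iters = iters;
        this.cur = new int[iters.length];
        Arrays.fill(cur, -1);
    }

    public int nextDoc() {
        return doc == NO_MORE_DOCS ? NO_MORE_DOCS : advance(doc + 1);
    }

    public int advance(int target) {
        int candidate = target;
        int matched = 0;   // consecutive sub-iterators positioned on candidate
        int i = 0;
        while (matched < iters.length) {
            if (cur[i] < candidate) {
                cur[i] = iters[i].advance(candidate);
            }
            if (cur[i] == NO_MORE_DOCS) {
                return doc = NO_MORE_DOCS;
            }
            if (cur[i] == candidate) {
                matched++;
            } else {               // overshot: this iterator proposes a new candidate
                candidate = cur[i];
                matched = 1;
            }
            i = (i + 1) % iters.length;
        }
        return doc = candidate;
    }
}
```

With sub-iterators that can skip efficiently, the same leapfrog loop avoids stepping through long runs of non-matching ids one at a time.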

[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-09 Thread John Wang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662627#action_12662627 ] John Wang commented on LUCENE-1345: --- Added perf comparisons with boolean set iterators w

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-01-09 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662603#action_12662603 ] Jason Rutherglen commented on LUCENE-1516: -- It looks like DirectoryIndexReader ne

Re: Filesystem based bitset

2009-01-09 Thread robert engels
It was not ad hominem. It was a indirect critique of the value of the answer provided. Ad hominem would be if I called him ugly. On Jan 9, 2009, at 6:34 PM, Doug Cutting wrote: robert engels wrote: Can something be offensive if its a statement of fact ? If you believe it is (under definit

Re: Filesystem based bitset

2009-01-09 Thread robert engels
Your exactly right. Playing well with others has trumped actual production and quality. You can see the mess that's gotten us in all sorts of areas. Luckily there are entrepreneurs and other managers/owners that value quality first, and let feelings get repaired over beers or not at all.

[jira] Created: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-01-09 Thread Jason Rutherglen (JIRA)
Integrate IndexReader with IndexWriter --- Key: LUCENE-1516 URL: https://issues.apache.org/jira/browse/LUCENE-1516 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.4 Repo

[jira] Commented: (LUCENE-627) highlighter problems with overlapping tokens

2009-01-09 Thread Chris Harris (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662579#action_12662579 ] Chris Harris commented on LUCENE-627: - I'm here two years after the last comment, tryin

Re: Filesystem based bitset

2009-01-09 Thread Doug Cutting
robert engels wrote: Can something be offensive if its a statement of fact ? If you believe it is (under definition #3), then his remarks to me were just as offensive - as they caused me much displeasure and resentment. So please dress him down as well. His comments were on-topic. The topic

Re: Filesystem based bitset

2009-01-09 Thread Ian Holsman
Robert. * no one is forcing you to be on this mailing list. * next time you look for a job, and your prospective employer 'googles' you, they are going to find this anti-social behavior. "playing well with others" is usually a key employment criteria people look for. (as well as being super-s

Re: Filesystem based bitset

2009-01-09 Thread robert engels
Can something be offensive if its a statement of fact ? If you believe it is (under definition #3), then his remarks to me were just as offensive - as they caused me much displeasure and resentment. So please dress him down as well. Main Entry: 1of·fen·sive Pronunciation: \ə-ˈfen(t)-siv, especial

Re: Filesystem based bitset

2009-01-09 Thread Doug Cutting
robert engels wrote: You are a moron. And I don't mean that in a offensive way - I am using the secondary definition. 2: a very stupid person That's still offensive and totally unacceptable here. Please refrain from making ad-hominem remarks and stick to discussing the issues. Thanks

Re: Realtime Search

2009-01-09 Thread Jason Rutherglen
I think the IW integrated IR needs a rule regarding the behavior of IW.flush and IR.flush. There will need to be a flush lock that is shared between the IW and IR. The lock is acquired at the beginning of a flush and released immediately after a successful or unsuccessful call. We will need to shar
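The flush rule proposed here (one lock shared by IW and IR, taken at the start of a flush and released after either a successful or a failed call) maps naturally onto try/finally. A sketch follows; `SharedFlushLock`, `WriterSketch` and `ReaderSketch` are hypothetical stand-ins, not Lucene classes.

```java
import java.util.concurrent.locks.ReentrantLock;

// One lock instance shared by the writer and the reader it handed out.
class SharedFlushLock {
    private final ReentrantLock lock = new ReentrantLock();
    private int flushes = 0;

    // Acquire at the beginning of a flush; release in finally so the lock is
    // freed whether the flush succeeds or throws.
    void flush(Runnable work) {
        lock.lock();
        try {
            work.run();
            flushes++;   // only counted when the work completed normally
        } finally {
            lock.unlock();
        }
    }

    int flushCount() { return flushes; }
    boolean isHeld() { return lock.isLocked(); }
}

// Hypothetical writer: flushes buffered documents under the shared lock.
class WriterSketch {
    private final SharedFlushLock flushLock;
    WriterSketch(SharedFlushLock l) { flushLock = l; }
    void flush() { flushLock.flush(() -> { /* write buffered docs */ }); }
}

// Hypothetical reader: flushes its deletions/norms under the same lock.
class ReaderSketch {
    private final SharedFlushLock flushLock;
    ReaderSketch(SharedFlushLock l) { flushLock = l; }
    void flush() { flushLock.flush(() -> { /* write deletes and norms */ }); }
}
```

Because both classes go through the same `ReentrantLock`, a writer flush and a reader flush can never interleave.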

Re: Filesystem based bitset

2009-01-09 Thread robert engels
I have better things to do than read a 10,000 word incident that discusses about 100 different topics under the generic heading "Further steps towards flexible indexing" in order to answer a simple question. You are a moron. And I don't mean that in a offensive way - I am using the secondary defin

Re: Filesystem based bitset

2009-01-09 Thread Marvin Humphrey
On Fri, Jan 09, 2009 at 03:42:35PM -0600, robert engels wrote: > If your index can fit in the IO cache, you should using a completely different implementation... > You should be writing a sequential transaction log for add/update/delete operations, and storing the entire index in memory

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-09 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662545#action_12662545 ] Jason Rutherglen commented on LUCENE-1314: -- *Software* System Version: Ma

Re: Filesystem based bitset

2009-01-09 Thread robert engels
If your index can fit in the IO cache, you should using a completely different implementation... You should be writing a sequential transaction log for add/update/delete operations, and storing the entire index in memory (RAMDirectory) - with periodic background flushes of the log. If you
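A minimal sketch of the design described in this message follows. It assumes a hypothetical `TxLogIndex` class and uses a `HashMap` as a stand-in for the RAMDirectory-backed index. Every add/update/delete is appended to a sequential log before it is applied in memory, and the state is rebuilt by replaying the log. Concurrency and log truncation after a checkpoint are left out.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

class TxLogIndex {
    static final byte ADD_OR_UPDATE = 1, DELETE = 2;

    private final Map<String, String> docs = new HashMap<>();   // stand-in for the in-memory index
    private final DataOutputStream log;                         // sequential, append-only log

    TxLogIndex(OutputStream logOut) { this.log = new DataOutputStream(logOut); }

    // Write-ahead: record the operation in the log, then apply it in memory.
    void update(String id, String body) throws IOException {
        log.writeByte(ADD_OR_UPDATE);
        log.writeUTF(id);
        log.writeUTF(body);
        docs.put(id, body);
    }

    void delete(String id) throws IOException {
        log.writeByte(DELETE);
        log.writeUTF(id);
        docs.remove(id);
    }

    // Would be called periodically (e.g. by a background thread) to push the log out.
    void flushLog() throws IOException { log.flush(); }

    Map<String, String> docs() { return docs; }

    // Rebuild the in-memory state after a restart by replaying the log in order.
    static Map<String, String> replay(InputStream logIn) throws IOException {
        DataInputStream in = new DataInputStream(logIn);
        Map<String, String> state = new HashMap<>();
        while (true) {
            int op = in.read();
            if (op == -1) break;                       // clean end of log
            String id = in.readUTF();
            if (op == ADD_OR_UPDATE) state.put(id, in.readUTF());
            else if (op == DELETE) state.remove(id);
            else throw new IOException("corrupt log op " + op);
        }
        return state;
    }
}
```

In a real deployment the `OutputStream` would be an append-mode file; a byte array is enough to show that replay reproduces the in-memory state.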

Re: Filesystem based bitset

2009-01-09 Thread Marvin Humphrey
On Fri, Jan 09, 2009 at 08:11:31PM +0100, Karl Wettin wrote: > SSD is pretty close to RAM when it comes to seeking. Wouldn't that mean that a bitset stored on an SSD would be more or less as fast as a bitset in RAM? Provided that your index can fit in the system i/o cache and stay there,

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread robert engels
You only write one version - the one with logging statements. They will be removed at RUNTIME - given the proper class loader. The Frog library I referenced allows a degree of logging without writing any logging code - it is injected at runtime. On Jan 9, 2009, at 2:31 PM, Shalin Shekhar Man

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662519#action_12662519 ] Michael McCandless commented on LUCENE-1314: Odd, I still can't see it. Are y

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread Yonik Seeley
On Fri, Jan 9, 2009 at 3:31 PM, Shalin Shekhar Mangar wrote: > If we forget the bytecode modification for a moment, how much cost does this add to Lucene when used by a real application with slf4j logging? (e.g. Solr uses the jdk adapter and no-op adapter cannot be used) AFAIK, the infostream

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread Shalin Shekhar Mangar
On Sat, Jan 10, 2009 at 12:41 AM, robert engels wrote: > This is not really true these days. Dynamic class instrumentation/byte modification can remove the calls entirely (for loggers not enabled). They can be enabled during startup (or a reload from a different class loader). > See the pa

Re: Realtime Search

2009-01-09 Thread Jason Rutherglen
> "But I think for realtime we don't want to be using IW's deletion at all. We should do all deletes via the IndexReader. In fact if IW has handed out a reader (via getReader()) and that reader (or a reopened derivative) remains open we may have to block deletions via IW. Not sure..." Can't IW

Re: Realtime Search

2009-01-09 Thread Grant Ingersoll
I realize we aren't adding read functionality to the Writer, but it would be coupling the Writer to the Reader nonetheless. I understand it is brainstorming (like I said, not trying to distract from the discussion), just saying that if the Reader and the Writer both need access to the unde

Re: Realtime Search

2009-01-09 Thread Michael McCandless
Grant Ingersoll wrote: We've spent a lot of time up until now getting write functionality out of the Reader, and now we are going to add read functionality into the Writer? Well... we're not really adding read functionality into IW; instead, we are asking IW to open the reader for us, exce

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread robert engels
Also, see this http://venkatesans.com for an implementation (Frog) which injects logging at runtime. This is not really what I propose though. I think it is better to code the logging statements, and have them removed at runtime. It allows for more context sensitive logging statements. No

Re: Filesystem based bitset

2009-01-09 Thread Michael McCandless
While SSDs are delightfully fast compared to mechanical drives, I think they are still quite a bit slower than RAM for truly random access. EG Intel's X25-E (apparently the leader at the moment) lists a 75us read latency, whereas RAM latency is maybe 50-100 ns. Though since Lucene accesses the
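Setting the two figures quoted here side by side (75 µs listed for the X25-E, roughly 50 to 100 ns for RAM) shows the size of the gap for a single random read:

```latex
\frac{t_{\mathrm{SSD}}}{t_{\mathrm{RAM}}}
  = \frac{75\ \mu\mathrm{s}}{50 \text{ to } 100\ \mathrm{ns}}
  = \frac{75\,000\ \mathrm{ns}}{100\ \mathrm{ns}} \text{ to } \frac{75\,000\ \mathrm{ns}}{50\ \mathrm{ns}}
  = 750\times \text{ to } 1500\times
```

So per truly random access, even the fastest SSD mentioned is roughly three orders of magnitude slower than RAM.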

Re: Realtime Search

2009-01-09 Thread Grant Ingersoll
On Jan 9, 2009, at 8:39 AM, Michael McCandless wrote: Jason Rutherglen wrote: Patch #1: Expose an IndexWriter.getReader method that returns the current reader and shares the write lock I tentatively like this approach so far... That reader is opened using IndexWriter's SegmentInfos insta

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-09 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662485#action_12662485 ] Jason Rutherglen commented on LUCENE-1314: -- That worked Erik. I executed TestInd

Re: Realtime Search

2009-01-09 Thread Michael McCandless
Jason Rutherglen wrote: > Are you referring to the IW.pendingCommit SegmentInfos variable? No, I'm referring to segmentInfos. (pendingCommit is the "snapshot" of segmentInfos taken when committing...). > When you say "flushed" you are referring to the IW.prepareCommit method? No, I'm referrin

Filesystem based bitset

2009-01-09 Thread Karl Wettin
Thinking out loud, SSD is pretty close to RAM when it comes to seeking. Wouldn't that mean that a bitset stored on an SSD would be more or less as fast as a bitset in RAM? So how about storing all permutations of filters one use on SSD? Perhaps loading them to RAM in case they are frequentl
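To make the question concrete, a filter bitset kept in a file costs one random read per lookup unless the page is already cached. Below is a sketch of such a file-backed bitset. The class name `FileBitSet` is hypothetical, and it assumes the byte layout produced by `java.util.BitSet.toByteArray()`, where bit i lives in bit i % 8 of byte i / 8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.BitSet;

// Read-only bitset whose bits live in a file (e.g. on an SSD).
// Each get() is one positional 1-byte read; not thread-safe (shared buffer).
class FileBitSet implements AutoCloseable {
    private final FileChannel ch;
    private final ByteBuffer one = ByteBuffer.allocate(1);

    FileBitSet(Path file) throws IOException {
        ch = FileChannel.open(file, StandardOpenOption.READ);
    }

    // Persist a filter in the layout get() expects (BitSet.toByteArray is little-endian by bit).
    static void write(Path file, BitSet bits) throws IOException {
        Files.write(file, bits.toByteArray());
    }

    boolean get(int docId) throws IOException {
        one.clear();
        if (ch.read(one, docId >> 3) != 1) return false;   // past end of file: bit not set
        return (one.get(0) & (1 << (docId & 7))) != 0;
    }

    @Override
    public void close() throws IOException { ch.close(); }
}
```

Whether this comes close to an in-RAM bitset depends mostly on the OS page cache: a cold lookup pays the device latency, while a warm one costs a memory access plus a system call.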

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread robert engels
This is not really true these days. Dynamic class instrumentation/byte modification can remove the calls entirely (for loggers not enabled). They can be enabled during startup (or a reload from a different class loader). See the paper at http://www.springerlink.com/content/ur00014m0327542
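For comparison with removing calls at runtime, the conventional way to keep disabled logging cheap without bytecode changes is to guard each call site with a level check. The sketch below uses `java.util.logging` purely for illustration; it is not SLF4J and not Lucene's infoStream, and the logger name and helper methods are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedLogging {
    // Static field keeps a strong reference to the (hypothetically named) logger.
    private static final Logger LOG = Logger.getLogger("example.indexwriter");
    static int messagesBuilt = 0;

    // Stands in for an expensive diagnostic message.
    static String describeSegments(int n) {
        messagesBuilt++;   // counts how often formatting cost was actually paid
        StringBuilder sb = new StringBuilder("segments:");
        for (int i = 0; i < n; i++) sb.append(" _").append(i);
        return sb.toString();
    }

    static void onFlush(int numSegments) {
        if (LOG.isLoggable(Level.FINE)) {   // message is never built when FINE is disabled
            LOG.fine(describeSegments(numSegments));
        }
    }

    static void setLevel(Level level) {
        LOG.setLevel(level);
        LOG.setUseParentHandlers(false);    // keep the sketch quiet on the console
    }
}
```

The guard still leaves a level check at every call site. Runtime instrumentation aims to remove even that when a logger is disabled.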

[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662476#action_12662476 ] Yonik Seeley commented on LUCENE-1482: -- I'm not arguing for or against SLF4J at this

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-09 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662475#action_12662475 ] Erik Hatcher commented on LUCENE-1314: -- {quote} Is there a way with ant to only test

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-09 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662472#action_12662472 ] Jason Rutherglen commented on LUCENE-1314: -- I haven't seen this error via the com

[jira] Updated: (LUCENE-1515) Improved(?) Swedish snowball stemmer

2009-01-09 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wettin updated LUCENE-1515: Attachment: LUCENE-1515.txt snowball code, generated java class and unit test. > Improved(?) Swed

[jira] Commented: (LUCENE-1039) Bayesian classifiers using Lucene as data store

2009-01-09 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662467#action_12662467 ] Karl Wettin commented on LUCENE-1039: - What do you people think, should I commit this

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662465#action_12662465 ] Michael McCandless commented on LUCENE-1314: {quote} > Occasionally TestIndexR

[jira] Commented: (LUCENE-1479) TrecDocMaker skips over documents when "Date" is missing from documents

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662464#action_12662464 ] Michael McCandless commented on LUCENE-1479: {quote} > I'll feed it with some

[jira] Created: (LUCENE-1515) Improved(?) Swedish snowball stemmer

2009-01-09 Thread Karl Wettin (JIRA)
Improved(?) Swedish snowball stemmer Key: LUCENE-1515 URL: https://issues.apache.org/jira/browse/LUCENE-1515 Project: Lucene - Java Issue Type: New Feature Components: contrib/* Affects Vers

[jira] Commented: (LUCENE-1479) TrecDocMaker skips over documents when "Date" is missing from documents

2009-01-09 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662452#action_12662452 ] Shai Erera commented on LUCENE-1479: The reason why this patch does not include a test

[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662448#action_12662448 ] Shai Erera commented on LUCENE-1482: Like I wrote before, I believe that if someone wi

Re: Realtime Search

2009-01-09 Thread Jason Rutherglen
M.M.: "That reader is opened using IndexWriter's SegmentInfos instance, so it can read segments & deletions that have been flushed but not committed. It's allowed to do its own deletions & norms updating. When reopen() is called, it grabs the writers SegmentInfos again." Are you referring to the

[jira] Closed: (LUCENE-1514) ShingleMatrixFilter eaily throws StackOverFlow as the complexity of a matrix grows

2009-01-09 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wettin closed LUCENE-1514. --- Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Commit

[jira] Updated: (LUCENE-1514) ShingleMatrixFilter eaily throws StackOverFlow as the complexity of a matrix grows

2009-01-09 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wettin updated LUCENE-1514: Attachment: LUCENE-1514.txt > ShingleMatrixFilter eaily throws StackOverFlow as the complexity of

[jira] Created: (LUCENE-1514) ShingleMatrixFilter eaily throws StackOverFlow as the complexity of a matrix grows

2009-01-09 Thread Karl Wettin (JIRA)
ShingleMatrixFilter eaily throws StackOverFlow as the complexity of a matrix grows -- Key: LUCENE-1514 URL: https://issues.apache.org/jira/browse/LUCENE-1514 Project: Luc

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-09 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662380#action_12662380 ] Mark Miller commented on LUCENE-1483: - Still coming. Heavily side tracked for a bit wi

Re: Realtime Search

2009-01-09 Thread Michael McCandless
Marvin Humphrey wrote: > The goal is to improve worst-case write performance. > ... > In between the time when the background merge writer starts up and the time it finishes consolidating segment data, we assume that the primary writer will have modified the index. > * New docs have bee

Re: Realtime Search

2009-01-09 Thread Michael McCandless
Jason Rutherglen wrote: Patch #1: Expose an IndexWriter.getReader method that returns the current reader and shares the write lock I tentatively like this approach so far... That reader is opened using IndexWriter's SegmentInfos instance, so it can read segments & deletions that have been f
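The proposed `IndexWriter.getReader` behaviour can be modelled in a few lines. In this toy sketch, `ToyWriter` and `ToyReader` are hypothetical, segment names stand in for SegmentInfos entries, and deletions, norms and write-lock sharing are omitted. It shows the reader seeing flushed-but-uncommitted segments and `reopen()` grabbing the writer's current list again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ToyWriter {
    private final List<String> segmentInfos = new ArrayList<>();   // flushed, possibly uncommitted
    private List<String> committed = Collections.emptyList();
    private int nextSeg = 0;

    synchronized void flush() { segmentInfos.add("_" + nextSeg++); }

    synchronized void commit() { committed = new ArrayList<>(segmentInfos); }

    // The reader is built from the writer's live segment list, not the last
    // commit, so it can see segments that were flushed but not committed.
    synchronized ToyReader getReader() {
        return new ToyReader(this, new ArrayList<>(segmentInfos));
    }

    synchronized List<String> committedSegments() { return committed; }
}

class ToyReader {
    private final ToyWriter writer;
    final List<String> segments;   // fixed snapshot taken when the reader was opened

    ToyReader(ToyWriter writer, List<String> segments) {
        this.writer = writer;
        this.segments = Collections.unmodifiableList(segments);
    }

    // reopen() re-reads the writer's current segment list.
    ToyReader reopen() { return writer.getReader(); }
}
```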

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662362#action_12662362 ] Michael McCandless commented on LUCENE-1476: {quote} > One way to think of th
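One way to picture a bit vector acting as a DocIdSet is to iterate its set bits in increasing order, skipping empty 64-bit words in a single step. The sketch below uses a hypothetical `LongBitVector` class; it is not Lucene's `BitVector` or `OpenBitSet` code.

```java
class LongBitVector {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final long[] words;
    private final int size;

    LongBitVector(int size) {
        this.size = size;
        this.words = new long[(size + 63) >>> 6];
    }

    // Java masks long shift distances to 0..63, so 1L << bit selects the bit within its word.
    void set(int bit) { words[bit >>> 6] |= 1L << bit; }
    boolean get(int bit) { return (words[bit >>> 6] & (1L << bit)) != 0; }

    // First set bit >= from (from must be >= 0), or NO_MORE_DOCS.
    int nextSetBit(int from) {
        if (from >= size) return NO_MORE_DOCS;
        int w = from >>> 6;
        long word = words[w] & (-1L << from);   // clear bits below 'from' in its word
        while (true) {
            if (word != 0) {
                int bit = (w << 6) + Long.numberOfTrailingZeros(word);
                return bit < size ? bit : NO_MORE_DOCS;
            }
            if (++w == words.length) return NO_MORE_DOCS;
            word = words[w];                    // whole empty words are skipped here
        }
    }

    // DocIdSet-style iterator over the set bits.
    final class Iter {
        private int doc = -1;
        int nextDoc() {
            return doc = (doc == NO_MORE_DOCS) ? NO_MORE_DOCS : nextSetBit(doc + 1);
        }
        int advance(int target) {
            if (doc == NO_MORE_DOCS) return doc;
            return doc = nextSetBit(Math.max(target, doc + 1));
        }
    }

    Iter iterator() { return new Iter(); }
}
```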

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662347#action_12662347 ] Michael McCandless commented on LUCENE-1476: {quote} > Under the current syst

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662345#action_12662345 ] Michael McCandless commented on LUCENE-1476: {quote} > We could potentially ma

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662342#action_12662342 ] Michael McCandless commented on LUCENE-1476: {quote} > We can hide the sparse

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662339#action_12662339 ] Michael McCandless commented on LUCENE-1476: {quote} > Mmm. I think I might ha

[jira] Commented: (LUCENE-1479) TrecDocMaker skips over documents when "Date" is missing from documents

2009-01-09 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662338#action_12662338 ] Michael McCandless commented on LUCENE-1479: Ahh the last minute "trivial" cod