Solr-trunk - Build # 1389 - Failure

2011-01-25 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Solr-trunk/1389/

All tests passed

Build Log (for compile errors):
[...truncated 18778 lines...]






[jira] Commented: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986281#action_12986281
 ] 

Simon Willnauer commented on LUCENE-2868:
-

bq. Here's my take on the patch, including ability to cache weight objects.
I have a couple of comments here - first, I cannot apply your patch to the 
current trunk; can you update it?

* You keep a cache per IndexSearcher (btw. QueryDataCache is missing in the 
patch) which is used to cache several things across searches. This is very 
dangerous! While I don't know how it is implemented, I would guess you need to 
synchronize access to it, so it would slow down searches, eh? 

* Caching Scorers is going to break, since Scorers are stateful and might be 
advanced to different documents. Yet, I can see what you are trying to do here: 
doing work in a scorer is costly, so common TermQueries, for instance, should 
not need to load the same posting list twice. There are two things which come 
to my mind right away: 1. posting list caching - should be done on a codec 
level IMO; 2. building PerReaderTermState only once for a common TermQuery. 
While caching posting lists is going to be tricky and quite a task, reusing 
PerReaderTermState could work fine as far as I can see, provided you are in the 
same searcher. 

* Caching Weights is kind of weird - what is the reason for this again? The 
only thing you really save here is setup costs, which are generally very low.

Overall I don't like that this tightly couples something to Weight / Query 
etc. for a single purpose that could be solved with some kind of query 
optimization phase, similar to what I had in my last patch and what Earwin has 
proposed. I think we should not tightly couple things like that into Lucene. 
This is really extremely application dependent in most cases, and we should 
only provide the infrastructure to do it. 

bq. Earwin - I think we should make a new issue and get something like that 
implemented in there which is more general than what I just sketched out. If 
you could share your code that would be awesome!
Earwin, any news on this - shall I open an issue for that?

bq. It occurs to me that the name of the common class that gets created in 
IndexSearcher and passed around should probably be named something more 
appropriate, like QueryContext. That way people will feel free to extend it to 
hold all sorts of query-local data, in time. Thoughts?
You refer to ScorerContext? This class was actually not intended to be 
extendable; it's public final until now. I am not sure if we should open that 
up though. 

 It should be easy to make use of TermState; rewritten queries should be 
 shared automatically
 

 Key: LUCENE-2868
 URL: https://issues.apache.org/jira/browse/LUCENE-2868
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Reporter: Karl Wright
 Attachments: lucene-2868.patch, query-rewriter.patch


 When you have the same query in a query hierarchy multiple times, tremendous 
 savings can now be had if the user knows enough to share the rewritten 
 queries in the hierarchy, due to the TermState addition.  But this is clumsy 
 and requires a lot of coding by the user to take advantage of.  Lucene should 
 be smart enough to share the rewritten queries automatically.
 This can be most readily (and powerfully) done by introducing a new method to 
 Query.java:
 Query rewriteUsingCache(IndexReader indexReader)
 ... and including a caching implementation right in Query.java which would 
 then work for all.  Of course, all callers would want to use this new method 
 rather than the current rewrite().
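 A minimal sketch of what such a method might look like (the cache field and 
 its synchronization policy are illustrative assumptions, not part of any 
 attached patch):
 {code}
 // Hypothetical sketch: memoize rewrite() per reader inside Query.java.
 private final Map<IndexReader, Query> rewriteCache =
     Collections.synchronizedMap(new WeakHashMap<IndexReader, Query>());

 public Query rewriteUsingCache(IndexReader indexReader) throws IOException {
   Query rewritten = rewriteCache.get(indexReader);
   if (rewritten == null) {
     rewritten = rewrite(indexReader);        // the existing rewrite path
     rewriteCache.put(indexReader, rewritten); // shared on the next call
   }
   return rewritten;
 }
 {code}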




RE: Solr-trunk - Build # 1389 - Failure

2011-01-25 Thread Uwe Schindler
F**ck! I am posting a comment, the stack trace looks different!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Apache Hudson Server [mailto:hud...@hudson.apache.org]
 Sent: Tuesday, January 25, 2011 9:53 AM
 To: dev@lucene.apache.org
 Subject: Solr-trunk - Build # 1389 - Failure
 
 Build: https://hudson.apache.org/hudson/job/Solr-trunk/1389/
 
 All tests passed
 
 Build Log (for compile errors):
 [...truncated 18778 lines...]
 
 
 






[jira] Commented: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2011-01-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986346#action_12986346
 ] 

Michael McCandless commented on LUCENE-2010:


bq. Do you want to fix the rest of the tests and remove the test-only 
keepAllSegments method?

It's actually only the QueryUtils test class that uses this... it makes an 
empty index by adding N docs and then deleting them all.  So the test-only 
API needs to be public (QueryUtils is in oal.search).  I'll mark it as 
lucene.internal...

 Remove segments with all documents deleted in commit/flush/close of 
 IndexWriter instead of waiting until a merge occurs.
 

 Key: LUCENE-2010
 URL: https://issues.apache.org/jira/browse/LUCENE-2010
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2010.patch


 I do not know if this is a bug in 2.9.0, but it seems that segments with all 
 documents deleted are not automatically removed:
 {noformat}
 4 of 14: name=_dlo docCount=5
   compound=true
   hasProx=true
   numFiles=2
   size (MB)=0.059
   diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 
 2009-09-21 10:25:09, os=SunOS,
  os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, 
 source=flush}
   has deletions [delFileName=_dlo_1.del]
   test: open reader.OK [5 deleted docs]
   test: fields..OK [136 fields]
   test: field norms.OK [136 fields]
   test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
   test: stored fields...OK [0 total field count; avg ? fields per doc]
   test: term vectorsOK [0 total vector count; avg ? term/freq vector 
 fields per doc]
 {noformat}
 Shouldn't such segments be removed automatically during the next 
 commit/close of IndexWriter?
 *Mike McCandless:*
 Lucene doesn't actually short-circuit this case, ie, if every single doc in a 
 given segment has been deleted, it will still merge it [away] like normal, 
 rather than simply dropping it immediately from the index, which I agree 
 would be a simple optimization. Can you open a new issue? I would think IW 
 can drop such a segment immediately (ie not wait for a merge or optimize) on 
 flushing new deletes.
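 A hypothetical sketch of the optimization Mike describes (method and member 
 names are invented for illustration, not the committed change):
 {code}
 // On flushing new deletes: drop any segment whose docs are all deleted,
 // instead of waiting for a merge to prune it.
 void dropFullyDeletedSegments(SegmentInfos segmentInfos) throws IOException {
   Iterator<SegmentInfo> it = segmentInfos.iterator();
   while (it.hasNext()) {
     SegmentInfo info = it.next();
     if (info.getDelCount() == info.docCount) {
       it.remove(); // no live docs remain; files go away with the next commit
     }
   }
 }
 {code}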




[jira] Updated: (LUCENE-2868) It should be easy to make use of TermState; rewritten queries should be shared automatically

2011-01-25 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated LUCENE-2868:


Attachment: lucene-2868.patch

Oops, forgot to add a key file.
Seriously, the weight caching is of minor utility.  The scorer caching is not 
enabled.  So all that this patch does differently is try to define a broader 
concept of query context, rather than the narrow fix Simon proposes.

 It should be easy to make use of TermState; rewritten queries should be 
 shared automatically
 

 Key: LUCENE-2868
 URL: https://issues.apache.org/jira/browse/LUCENE-2868
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Reporter: Karl Wright
 Attachments: lucene-2868.patch, lucene-2868.patch, 
 query-rewriter.patch


 When you have the same query in a query hierarchy multiple times, tremendous 
 savings can now be had if the user knows enough to share the rewritten 
 queries in the hierarchy, due to the TermState addition.  But this is clumsy 
 and requires a lot of coding by the user to take advantage of.  Lucene should 
 be smart enough to share the rewritten queries automatically.
 This can be most readily (and powerfully) done by introducing a new method to 
 Query.java:
 Query rewriteUsingCache(IndexReader indexReader)
 ... and including a caching implementation right in Query.java which would 
 then work for all.  Of course, all callers would want to use this new method 
 rather than the current rewrite().




Re: [jira] Created: (LUCENE-2886) Adaptive Frame Of Reference

2011-01-25 Thread Renaud Delbru

Hi Paul,

This is a good question. The two methods, i.e., VSE and AFOR, are very 
similar. Both can be considered extensions of FOR that make it less sensitive 
to outliers by adapting the encoding to the value distribution. To achieve 
this, the two methods encode a list of values by
- partitioning it into frames (i.e., sequences of consecutive integers) of 
variable lengths,
- encoding each frame using a different bit frame (the minimum number of bits 
required to encode any integer in the frame, and still be able to distinguish 
them), and
- relying on algorithms to automatically find a good list partitioning.

Apart from the minor differences in the implementation design (which I will 
discuss later), the main difference is that VSE is optimised for achieving a 
high compression rate and fast decompression but disregards the efficiency of 
compression, while AFOR is optimised for achieving a high compression rate and 
fast decompression, but also a fast compression speed. VSE uses a Dynamic 
Programming method to find the *optimal partitioning* of a list (optimal in 
terms of compression rate). While this approach provides a higher compression 
rate than the one proposed in AFOR, the complexity of such a partitioning 
algorithm is O(n * k), with the term n being the number of values and the term 
k the size of the largest frame, which might greatly impact the compression 
performance. In AFOR, we instead use a local optimisation algorithm that is 
less effective in terms of compression rate but faster to compute.


In terms of implementation details, there are a few differences.
1) VSE allows frames of length 1, 2, 4, 6, 8, 12, 16 and 32. The current 
implementation of AFOR restricts the length of a frame to a multiple of 8, to 
be aligned with the start and end of a byte boundary (and also to minimise the 
number of loop-unrolled, highly-optimised routines). More precisely, AFOR-2 
uses three frame lengths: 8, 16 and 32.
2) To allow the *optimal partitioning* of a list, the original implementation 
of VSE needs to operate on the full list. On the contrary, AFOR has been 
developed to operate on small subsets of the list, so that AFOR can be applied 
during incremental construction of the compressed list (it does not require 
the full list, but works on small blocks of 32 or more integers). However, we 
could apply VSE on small subsets, as in AFOR. In that case, VSE does not 
compute the optimal partition of the whole list, but only the optimal 
partition of each subset.


VSE and AFOR encode a frame in a similar way: first a header (1 byte) which 
provides the bit frame and the frame length, then the encoded frame.
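To make the frame layout concrete, here is an illustrative Java sketch (not 
taken from the AFOR sources) that encodes one 8-integer frame with a 1-byte 
header carrying the bit frame:
{code}
// Illustrative only: pack 8 non-negative ints with the minimal bit width.
static int writeFrame(int[] values, int off, byte[] out, int outOff) {
  int max = 0;
  for (int i = 0; i < 8; i++) max |= values[off + i];
  int bits = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
  out[outOff++] = (byte) bits; // header: bit frame (frame length of 8 implied)
  long buffer = 0;
  int filled = 0;
  for (int i = 0; i < 8; i++) {
    buffer = (buffer << bits) | (values[off + i] & ((1L << bits) - 1));
    filled += bits;
    while (filled >= 8) {      // flush whole bytes
      filled -= 8;
      out[outOff++] = (byte) (buffer >>> filled);
    }
  }
  if (filled > 0) out[outOff++] = (byte) (buffer << (8 - filled));
  return outOff;               // bytes written so far
}
{code}
A real implementation would also encode the frame length in the header and use 
loop-unrolled routines per bit width, as described above.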


So, as you can see, in essence the two models are very similar. For 
background, I know Fabrizio Silvestri (co-author of VSE) well; he was my PhD 
thesis examiner (the AFOR compression scheme is a chapter of my thesis). The 
funny thing is that we came up with these two models at the same time, this 
summer, without knowing we were working on something similar ;o). However, he 
was luckier than I was and published his findings before me.


I hope this answers your question.
Feel free to ask if you have any other questions,
Regards,
--
Renaud Delbru

On 24/01/11 22:02, Paul Elschot wrote:

Any idea on how this compares to the vector split encoding here:
http://puma.isti.cnr.it/publichtml/section_cnr_isti/cnr_isti_2010-TR-016.html
?

Regards,
Paul Elschot

On Monday 24 January 2011 19:32:44 Renaud Delbru (JIRA) wrote:

Adaptive Frame Of Reference


  Key: LUCENE-2886
  URL: https://issues.apache.org/jira/browse/LUCENE-2886
  Project: Lucene - Java
   Issue Type: New Feature
   Components: Codecs
 Reporter: Renaud Delbru
  Fix For: 4.0


We could test the implementation of the Adaptive Frame Of Reference [1] on the 
lucene-4.0 branch.
I am providing the source code of its implementation. Some work needs to be 
done, as this implementation works against the old lucene-1458 branch.
I will attach a tarball containing a running version (with tests) of the AFOR 
implementation, as well as the implementations of PFOR and of Simple64 (a 
Simple-family codec working on 64-bit words) that were used in the experiments 
in [1].

[1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf










[jira] Commented: (LUCENE-2887) Remove/deprecate IndexReader.undeleteAll

2011-01-25 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986364#action_12986364
 ] 

Doron Cohen commented on LUCENE-2887:
-

I think it is correct to say that if the result of ir.numDeletedDocs() is N, 
then calling ir.undeleteAll() will undelete exactly N documents... or am I 
missing something? 

Because if a merge was invoked for the segments seen by this reader, I see two 
options:
# A merge is ongoing, or the merge is done but not yet committed.
   This means that an index writer has a lock on the index, hence 
ir.undeleteAll() will fail to get the lock.
# The merge was already committed.
   This means that the index reader will fail to get write permission for being 
stale.

So I think this method behaves deterministically - perhaps its jdoc should say 
something like: 
*Undeletes all #numDeletedDocs() documents currently marked as deleted in this 
index.* ?
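
For instance, a minimal sketch of the 3.x usage being discussed (directory 
setup omitted):
{code}
// undeleteAll() requires a writable reader; it throws if the write lock
// is held or the reader is stale, per the two cases above.
IndexReader ir = IndexReader.open(dir, false); // read-write reader
int n = ir.numDeletedDocs();
ir.undeleteAll(); // brings back exactly those n documents
ir.close();
{code}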

 Remove/deprecate IndexReader.undeleteAll
 

 Key: LUCENE-2887
 URL: https://issues.apache.org/jira/browse/LUCENE-2887
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1


 This API is rather dangerous in that it's best effort, since it can only 
 un-delete docs that have not yet been merged away or dropped (as of 
 LUCENE-2010).
 Given that it exposes impl details of how Lucene prunes deleted docs, I think 
 we should remove this API.
 Are there legitimate use cases?




[jira] Issue Comment Edited: (LUCENE-2887) Remove/deprecate IndexReader.undeleteAll

2011-01-25 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986364#action_12986364
 ] 

Doron Cohen edited comment on LUCENE-2887 at 1/25/11 7:49 AM:
--

I think it is correct to say that if the result of ir.numDeletedDocs() is *N*, 
then calling ir.undeleteAll() will undelete exactly *N* documents... or am I 
missing something? 

Because if a merge was invoked for the segments seen by this reader, I see two 
options:
# A merge is ongoing, or the merge is done but not yet committed.
   This means that an index writer has a lock on the index, hence 
ir.undeleteAll() will fail to get the lock.
# The merge was already committed.
   This means that the index reader will fail to get write permission for being 
stale.

So I think this method behaves deterministically - perhaps its jdoc should say 
something like: 
*Undeletes all #numDeletedDocs() documents currently marked as deleted in this 
index.* ?

  was (Author: doronc):
I think it is correct to say that if the result of ir.numDeletedDocs() is 
N, then calling ir.undeleteAll() will delete exactly N documents... or am I 
missing it? 

Because if a merge was invoked for the segments seen by this reader, I see two 
options:
# A merge is on going, or the merge is done but uncommitted yet.
   This means that an index writer has a lock on the index, hence 
ir.undeleteAll() will fail to get the lock.
# The a merge was already committed.
   This means that the index reader will fail to get write permission for being 
Stale.

So I think this method behaves deterministically - perhaps its jdoc should say 
something like: 
*Undeletes all #numDeletedDocs() documents currently marked as deleted in this 
index.* ?
  
 Remove/deprecate IndexReader.undeleteAll
 

 Key: LUCENE-2887
 URL: https://issues.apache.org/jira/browse/LUCENE-2887
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1


 This API is rather dangerous in that it's best effort, since it can only 
 un-delete docs that have not yet been merged away or dropped (as of 
 LUCENE-2010).
 Given that it exposes impl details of how Lucene prunes deleted docs, I think 
 we should remove this API.
 Are there legitimate use cases?




[jira] Closed: (LUCENE-942) TopDocCollector.topDocs throws ArrayIndexOutOfBoundsException when called twice

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-942.
-

Resolution: Not A Problem

TopDocsCollector documents that you cannot call topDocs() more than once for 
each search execution.
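
A short sketch of the intended usage (collector parameters are illustrative):
{code}
TopDocsCollector<ScoreDoc> collector = TopScoreDocCollector.create(10, true);
searcher.search(query, collector);
TopDocs top = collector.topDocs(); // call once per search and reuse the result
{code}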

 TopDocCollector.topDocs throws ArrayIndexOutOfBoundsException when called 
 twice
 ---

 Key: LUCENE-942
 URL: https://issues.apache.org/jira/browse/LUCENE-942
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.2
Reporter: Aaron Isotton
Priority: Minor

 Here's the implementation of TopDocCollector.topDocs():
   public TopDocs topDocs() {
 ScoreDoc[] scoreDocs = new ScoreDoc[hq.size()];
 for (int i = hq.size()-1; i >= 0; i--)  // put docs in array
   scoreDocs[i] = (ScoreDoc)hq.pop();
   
 float maxScore = (totalHits==0)
   ? Float.NEGATIVE_INFINITY
   : scoreDocs[0].score;
 
 return new TopDocs(totalHits, scoreDocs, maxScore);
   }
 When you call topDocs(), hq gets emptied. Thus the second time you call it, 
 scoreDocs.length will be 0 and scoreDocs[0] will throw an 
 ArrayIndexOutOfBoundsException.
 I don't know whether this 'call only once' semantics is intended behavior or 
 not; if not, it should be fixed, if yes it should be documented.
 Thanks a lot for an absolutely fantastic product,
 Aaron




[jira] Closed: (LUCENE-423) thread pool implementation of parallel queries

2011-01-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir closed LUCENE-423.
--

   Resolution: Fixed
Fix Version/s: 3.1
 Assignee: (was: Lucene Developers)

You can provide an ExecutorService now, so I think this one is resolved.
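
For example, a minimal sketch (assuming the 3.1 constructor that takes an 
executor):
{code}
ExecutorService pool = Executors.newFixedThreadPool(4);
IndexSearcher searcher = new IndexSearcher(reader, pool); // searches segments in parallel
{code}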


 thread pool implementation of parallel queries
 --

 Key: LUCENE-423
 URL: https://issues.apache.org/jira/browse/LUCENE-423
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 1.4
 Environment: Operating System: other
 Platform: Other
Reporter: Randy Puttick
Priority: Minor
 Fix For: 3.1

 Attachments: ConcurrentMultiSearcher.java


 This component is a replacement for ParallelMultiQuery that runs a thread pool
 with a queue instead of starting threads for every query execution (so its
 performance is better).




[jira] Resolved: (LUCENE-522) SpanFuzzyQuery

2011-01-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-522.


Resolution: Duplicate

This is fixed in LUCENE-2754; you can use any MultiTermQuery in spans.
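
For example, a minimal sketch with the wrapper from LUCENE-2754 (field and 
term are made up):
{code}
SpanQuery q = new SpanMultiTermQueryWrapper<FuzzyQuery>(
    new FuzzyQuery(new Term("body", "lucene")));
{code}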


 SpanFuzzyQuery
 --

 Key: LUCENE-522
 URL: https://issues.apache.org/jira/browse/LUCENE-522
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 1.9
Reporter: Karl Wettin
Priority: Minor

 This is my SpanFuzzyQuery. It is released under the Apache license. Just 
 paste it in.
 package se.snigel.lucene;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.*;
 import org.apache.lucene.search.spans.SpanOrQuery;
 import org.apache.lucene.search.spans.SpanQuery;
 import org.apache.lucene.search.spans.SpanTermQuery;
 import org.apache.lucene.search.spans.Spans;
 import java.io.IOException;
 import java.util.Collection;
 import java.util.LinkedList;
 /**
  * @author Karl Wettin ka...@snigel.net
  */
 public class SpanFuzzyQuery extends SpanQuery {
 public final static float defaultMinSimilarity = 0.7f;
 public final static int defaultPrefixLength = 0;
 private final Term term;
 private final float minimumSimilarity;
 private final int prefixLength;
 private BooleanQuery rewrittenFuzzyQuery;
 public SpanFuzzyQuery(Term term) {
 this(term, defaultMinSimilarity, defaultPrefixLength);
 }
 public SpanFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
 this.term = term;
 this.minimumSimilarity = minimumSimilarity;
 this.prefixLength = prefixLength;
 if (minimumSimilarity >= 1.0f) {
 throw new IllegalArgumentException("minimumSimilarity >= 1");
 } else if (minimumSimilarity < 0.0f) {
 throw new IllegalArgumentException("minimumSimilarity < 0");
 }
 if (prefixLength < 0) {
 throw new IllegalArgumentException("prefixLength < 0");
 }
 }
 public Query rewrite(IndexReader reader) throws IOException {
 FuzzyQuery fuzzyQuery = new FuzzyQuery(term, minimumSimilarity, prefixLength);
 rewrittenFuzzyQuery = (BooleanQuery) fuzzyQuery.rewrite(reader);
 BooleanClause[] clauses = rewrittenFuzzyQuery.getClauses();
 SpanQuery[] spanQueries = new SpanQuery[clauses.length];
 for (int i = 0; i < clauses.length; i++) {
 BooleanClause clause = clauses[i];
 TermQuery termQuery = (TermQuery) clause.getQuery();
 spanQueries[i] = new SpanTermQuery(termQuery.getTerm());
 spanQueries[i].setBoost(termQuery.getBoost());
 }
 SpanOrQuery query = new SpanOrQuery(spanQueries);
 query.setBoost(fuzzyQuery.getBoost());
 return query;
 }
 /** Expert: Returns the matches for this query in an index.  Used internally
  * to search for spans. */
 public Spans getSpans(IndexReader reader) throws IOException {
 throw new UnsupportedOperationException("Query should have been rewritten");
 }
 /** Returns the name of the field matched by this query. */
 public String getField() {
 return term.field();
 }
 /** Returns a collection of all terms matched by this query. */
 public Collection getTerms() {
 if (rewrittenFuzzyQuery == null) {
 throw new RuntimeException("Query must be rewritten prior to calling getTerms()!");
 } else {
 LinkedList<Term> terms = new LinkedList<Term>();
 BooleanClause[] clauses = rewrittenFuzzyQuery.getClauses();
 for (int i = 0; i < clauses.length; i++) {
 BooleanClause clause = clauses[i];
 TermQuery termQuery = (TermQuery) clause.getQuery();
 terms.add(termQuery.getTerm());
 }
 return terms;
 }
 }
 /** Prints a query to a string, with <code>field</code> as the default field
  * for terms.  <p>The representation used is one that is supposed to be readable
  * by {@link org.apache.lucene.queryParser.QueryParser QueryParser}. However,
  * there are the following limitations:
  * <ul>
  *  <li>If the query was created by the parser, the printed
  *  representation may not be exactly what was parsed. For example,
  *  characters that need to be escaped will be represented without
  *  the required backslash.</li>
  * <li>Some of the more complicated queries (e.g. span queries)
  *  don't have a representation that can be parsed by QueryParser.</li>
  * </ul>
  */
 public String toString(String field) {
 return "spans(" + rewrittenFuzzyQuery.toString() + ")";
 }
 }


[jira] Resolved: (LUCENE-538) Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause

2011-01-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-538.


   Resolution: Fixed
Fix Version/s: 3.1

This is now fixed by Mike's cleanup to MultiSearcher etc., which fixes this 
combine/rewrite bug.

 Using WildcardQuery with MultiSearcher, and Boolean MUST_NOT clause
 ---

 Key: LUCENE-538
 URL: https://issues.apache.org/jira/browse/LUCENE-538
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 1.9
 Environment: Ubuntu Linux, java version 1.5.0_04
Reporter: Helen Warren
Priority: Minor
 Fix For: 3.1

 Attachments: TestMultiSearchWildCard.java


 We are searching across multiple indices using a MultiSearcher. There seems 
 to be a problem when we use a WildcardQuery to exclude documents from the 
 result set. I attach a set of unit tests illustrating the problem.
 In these tests, we have two indices. Each index contains a set of documents 
 with fields for 'title',  'section' and 'index'. The final aim is to do a 
 keyword search, across both indices, on the title field and be able to 
 exclude documents from certain sections (and their subsections) using a
 WildcardQuery on the section field.
  
  e.g. return documents from both indices which have the string 'xyzpqr' in 
 their title but which do not lie
  in the news section or its subsections (section = /news/*).
  
 The first unit test (testExcludeSectionsWildCard) fails trying to do this.
  If we relax any of the constraints made above, the tests pass:
  
 * Don't use WildcardQuery, but pass in the news section and its child 
 sections to exclude explicitly (testExcludeSectionsExplicit)
 * Exclude results from just one section, not its children too, i.e. don't use 
 WildcardQuery (testExcludeSingleSection)
 * Do use WildcardQuery, and exclude a section and its children, but just use 
 one index, thereby using the simple IndexReader and IndexSearcher objects 
 (testExcludeSectionsOneIndex).
 * Try the boolean MUST clause rather than MUST_NOT using the WildcardQuery, 
 i.e. only include results from the /news/ section and its children.




[jira] Assigned: (LUCENE-1250) Some equals methods do not check for null argument

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera reassigned LUCENE-1250:
--

Assignee: Shai Erera

 Some equals methods do not check for null argument
 --

 Key: LUCENE-1250
 URL: https://issues.apache.org/jira/browse/LUCENE-1250
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index, Search
Reporter: David Dillard
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0


 The equals methods in the following classes do not check for a null argument 
 and thus would incorrectly fail with a null pointer exception if passed null:
 - org.apache.lucene.index.SegmentInfo
 - org.apache.lucene.search.function.CustomScoreQuery
 - org.apache.lucene.search.function.OrdFieldSource
 - org.apache.lucene.search.function.ReverseOrdFieldSource
 - org.apache.lucene.search.function.ValueSourceQuery
 If a null parameter is passed to equals() then false should be returned.
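 A minimal sketch of the usual null-safe pattern (illustrative, not the 
 committed patch):
 {code}
 @Override
 public boolean equals(Object o) {
   if (this == o) return true;
   if (o == null || getClass() != o.getClass()) return false;
   // ... compare this class's own fields here ...
   return true;
 }
 {code}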




[jira] Updated: (LUCENE-1250) Some equals methods do not check for null argument

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1250:
---

Lucene Fields: [New, Patch Available]  (was: [New])
Affects Version/s: (was: 2.3.2)
   (was: 2.3.1)
Fix Version/s: 4.0
   3.1

This is now only applicable to OrdFieldSource and ReverseOrdFieldSource. I'll 
fix both of them.

 Some equals methods do not check for null argument
 --

 Key: LUCENE-1250
 URL: https://issues.apache.org/jira/browse/LUCENE-1250
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index, Search
Reporter: David Dillard
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0


 The equals methods in the following classes do not check for a null argument 
 and thus would incorrectly fail with a null pointer exception if passed null:
 - org.apache.lucene.index.SegmentInfo
 - org.apache.lucene.search.function.CustomScoreQuery
 - org.apache.lucene.search.function.OrdFieldSource
 - org.apache.lucene.search.function.ReverseOrdFieldSource
 - org.apache.lucene.search.function.ValueSourceQuery
 If a null parameter is passed to equals() then false should be returned.




[jira] Resolved: (LUCENE-901) DefaultSimilarity.queryNorm() should never return Infinity

2011-01-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-901.


   Resolution: Fixed
Fix Version/s: 3.1

This one is fixed (there is a NaN/Inf check in queryNorm, added fairly recently).


 DefaultSimilarity.queryNorm() should never return Infinity
 --

 Key: LUCENE-901
 URL: https://issues.apache.org/jira/browse/LUCENE-901
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Michael Busch
Priority: Trivial
 Fix For: 3.1


 Currently DefaultSimilarity.queryNorm() returns Infinity if 
 sumOfSquaredWeights=0.
 This can result in a score of NaN (e. g. in TermScorer) if boost=0.0f.
 A simple fix would be to return 1.0f in case zero is passed in.
 See LUCENE-698 for discussions about this.
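 A minimal sketch of the suggested guard (illustrative, not the committed 
 change):
 {code}
 public float queryNorm(float sumOfSquaredWeights) {
   if (sumOfSquaredWeights == 0.0f) return 1.0f; // avoid Infinity, and NaN scores downstream
   return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
 }
 {code}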




[jira] Resolved: (LUCENE-1148) Create a new sub-class of SpanQuery to enable use of a RangeQuery within a SpanQuery

2011-01-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-1148.
-

   Resolution: Fixed
Fix Version/s: 3.1

This one is fixed by LUCENE-2754; you can just wrap a RangeQuery (or any other 
MultiTermQuery) as a SpanQuery.
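
For example, a minimal sketch (field and bounds are made up):
{code}
SpanQuery range = new SpanMultiTermQueryWrapper<TermRangeQuery>(
    new TermRangeQuery("freq", "2.0", "2.75", true, true));
{code}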

 Create a new sub-class of SpanQuery to enable use of a RangeQuery within a 
 SpanQuery
 

 Key: LUCENE-1148
 URL: https://issues.apache.org/jira/browse/LUCENE-1148
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4
Reporter: Michael Goddard
Priority: Minor
 Fix For: 3.1

 Attachments: span_range_query_01.24.2008.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Our users express queries using a syntax which enables them to embed various 
 query types within SpanQuery instances.  One feature they've been asking for 
 is the ability to embed a numeric range query so they could, for example, 
 find documents matching [2.0 2.75]MHz.  The attached patch adds the 
 capability and I hope others will find it useful.




[jira] Updated: (LUCENE-522) SpanFuzzyQuery

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-522:
-

Fix Version/s: 3.1

 SpanFuzzyQuery
 --

 Key: LUCENE-522
 URL: https://issues.apache.org/jira/browse/LUCENE-522
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 1.9
Reporter: Karl Wettin
Priority: Minor
 Fix For: 3.1


 This is my SpanFuzzyQuery. It is released under the Apache license. Just 
 paste it in.
 package se.snigel.lucene;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.search.*;
 import org.apache.lucene.search.spans.SpanOrQuery;
 import org.apache.lucene.search.spans.SpanQuery;
 import org.apache.lucene.search.spans.SpanTermQuery;
 import org.apache.lucene.search.spans.Spans;
 import java.io.IOException;
 import java.util.Collection;
 import java.util.LinkedList;
 /**
  * @author Karl Wettin ka...@snigel.net
  */
 public class SpanFuzzyQuery extends SpanQuery {
 public final static float defaultMinSimilarity = 0.7f;
 public final static int defaultPrefixLength = 0;
 private final Term term;
 private final float minimumSimilarity;
 private final int prefixLength;
 private BooleanQuery rewrittenFuzzyQuery;
 public SpanFuzzyQuery(Term term) {
 this(term, defaultMinSimilarity, defaultPrefixLength);
 }
 public SpanFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
 this.term = term;
 this.minimumSimilarity = minimumSimilarity;
 this.prefixLength = prefixLength;
 if (minimumSimilarity >= 1.0f) {
 throw new IllegalArgumentException("minimumSimilarity >= 1");
 } else if (minimumSimilarity < 0.0f) {
 throw new IllegalArgumentException("minimumSimilarity < 0");
 }
 if (prefixLength < 0) {
 throw new IllegalArgumentException("prefixLength < 0");
 }
 }
 public Query rewrite(IndexReader reader) throws IOException {
 FuzzyQuery fuzzyQuery = new FuzzyQuery(term, minimumSimilarity, prefixLength);
 rewrittenFuzzyQuery = (BooleanQuery) fuzzyQuery.rewrite(reader);
 BooleanClause[] clauses = rewrittenFuzzyQuery.getClauses();
 SpanQuery[] spanQueries = new SpanQuery[clauses.length];
 for (int i = 0; i < clauses.length; i++) {
 BooleanClause clause = clauses[i];
 TermQuery termQuery = (TermQuery) clause.getQuery();
 spanQueries[i] = new SpanTermQuery(termQuery.getTerm());
 spanQueries[i].setBoost(termQuery.getBoost());
 }
 SpanOrQuery query = new SpanOrQuery(spanQueries);
 query.setBoost(fuzzyQuery.getBoost());
 return query;
 }
 /** Expert: Returns the matches for this query in an index.  Used internally
  * to search for spans. */
 public Spans getSpans(IndexReader reader) throws IOException {
 throw new UnsupportedOperationException("Query should have been rewritten");
 }
 /** Returns the name of the field matched by this query. */
 public String getField() {
 return term.field();
 }
 /** Returns a collection of all terms matched by this query. */
 public Collection getTerms() {
 if (rewrittenFuzzyQuery == null) {
 throw new RuntimeException("Query must be rewritten prior to calling getTerms()!");
 } else {
 LinkedList<Term> terms = new LinkedList<Term>();
 BooleanClause[] clauses = rewrittenFuzzyQuery.getClauses();
 for (int i = 0; i < clauses.length; i++) {
 BooleanClause clause = clauses[i];
 TermQuery termQuery = (TermQuery) clause.getQuery();
 terms.add(termQuery.getTerm());
 }
 return terms;
 }
 }
 /** Prints a query to a string, with <code>field</code> as the default field
  * for terms.  <p>The representation used is one that is supposed to be readable
  * by {@link org.apache.lucene.queryParser.QueryParser QueryParser}. However,
  * there are the following limitations:
  * <ul>
  *  <li>If the query was created by the parser, the printed
  *  representation may not be exactly what was parsed. For example,
  *  characters that need to be escaped will be represented without
  *  the required backslash.</li>
  * <li>Some of the more complicated queries (e.g. span queries)
  *  don't have a representation that can be parsed by QueryParser.</li>
  * </ul>
  */
 public String toString(String field) {
 return "spans(" + rewrittenFuzzyQuery.toString() + ")";
 }
 }


[jira] Resolved: (LUCENE-943) ComparatorKey in Locale based sorting

2011-01-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-943.


Resolution: Fixed

This one is available as CollationKeyAnalyzer/ICUCollationKeyAnalyzer.
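
For example, a minimal sketch (the locale is made up):
{code}
Collator collator = Collator.getInstance(new Locale("sv", "SE"));
Analyzer analyzer = new CollationKeyAnalyzer(collator);
{code}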


 ComparatorKey in Locale based sorting
 -

 Key: LUCENE-943
 URL: https://issues.apache.org/jira/browse/LUCENE-943
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Ronnie Kolehmainen
Priority: Minor
 Attachments: LocaleBasedSortComparator.diff


 This is a reply/follow-up to Chris Hostetter's message on the Lucene 
 developers list (Aug 2006):
 http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200608.mbox/%3cpine.lnx.4.58.0608211050330.5...@hal.rescomp.berkeley.edu%3e
  perhaps it would be worthwhile for comparatorStringLocale to convert the 
  String[] it gets back from FieldCache.DEFAULT.getStrings to a new 
  CollationKey[]? or maybe even for FieldCache.DEFAULT.getStrings to be 
  deprecated, and replaced with a 
  FieldCache.DEFAULT.getCollationKeys(reader,field,Collator)?
 I think the best is to keep the default behavior as it is today. There is a 
 cost of building caches for sort fields which I think not everyone wants. 
 However for some international production environments there are indeed 
 possible performance gains in comparing precalculated keys instead of 
 comparing strings with rulebased collators.
 Since Lucene's Sort architecture is pluggable it is easy to create a custom 
 locale-based comparator, which utilizes the built-in caching/warming 
 mechanism of FieldCache, and may be used in SortField constructor.
 I'm not sure whether there should be classes for this in Lucene core or not, 
 but it could be nice to have the option of performance vs. memory consumption 
 in localized sorting without having to use additional jars.




[jira] Commented: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10

2011-01-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986378#action_12986378
 ] 

Robert Muir commented on LUCENE-1360:
-

Now that we have custom norm encoders, is this one obsolete? 
You can just use SmallFloat.floatToByte52 to encode/decode your norms?
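
For example, a minimal sketch (assuming trunk's overridable norm codec 
methods):
{code}
public class ShortFieldNormSimilarity extends DefaultSimilarity {
  @Override
  public byte encodeNormValue(float f) {
    return SmallFloat.floatToByte52(f); // SmallFloat's 5.2 encoding
  }
  @Override
  public float decodeNormValue(byte b) {
    return SmallFloat.byte52ToFloat(b);
  }
}
{code}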

 A Similarity class which has unique length norms for numTerms <= 10
 ---

 Key: LUCENE-1360
 URL: https://issues.apache.org/jira/browse/LUCENE-1360
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Reporter: Sean Timm
Assignee: Otis Gospodnetic
Priority: Trivial
 Attachments: LUCENE-1380 visualization.pdf, 
 ShortFieldNormSimilarity.java


 A Similarity class which extends DefaultSimilarity and simply overrides 
 lengthNorm.  lengthNorm is implemented as a lookup for numTerms <= 10, else 
 as {{1/sqrt(numTerms)}}. This is to avoid term counts below 11 from having 
 the same lengthNorm after being stored as a single byte in the index.
 This is useful if your search is only on short fields such as titles or 
 product descriptions.
 See mailing list discussion: 
 http://www.nabble.com/How-to-boost-the-score-higher-in-case-user-query-matches-entire-field-value-than-just-some-words-within-a-field-td19079221.html




[jira] Resolved: (LUCENE-1250) Some equals methods do not check for null argument

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-1250.


Resolution: Fixed

Committed revision 1063271 (3x).
Committed revision 1063272 (trunk).

Thanks David !

 Some equals methods do not check for null argument
 --

 Key: LUCENE-1250
 URL: https://issues.apache.org/jira/browse/LUCENE-1250
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index, Search
Reporter: David Dillard
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1, 4.0


 The equals methods in the following classes do not check for a null argument 
 and thus would incorrectly fail with a null pointer exception if passed null:
 - org.apache.lucene.index.SegmentInfo
 - org.apache.lucene.search.function.CustomScoreQuery
 - org.apache.lucene.search.function.OrdFieldSource
 - org.apache.lucene.search.function.ReverseOrdFieldSource
 - org.apache.lucene.search.function.ValueSourceQuery
 If a null parameter is passed to equals() then false should be returned.




[jira] Commented: (LUCENE-1165) Reduce exposure of nightly build documentation

2011-01-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986385#action_12986385
 ] 

Uwe Schindler commented on LUCENE-1165:
---

This was once fixed by adding a robots.txt to Hudson. But since the move of 
Hudson to new machines, this is an issue again.
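
For example, a robots.txt along these lines (paths are illustrative, not the 
file that was actually deployed):
{code}
User-agent: *
Disallow: /hudson/job/Lucene-trunk/javadoc/
Disallow: /hudson/job/Solr-trunk/javadoc/
{code}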

 Reduce exposure of nightly build documentation
 --

 Key: LUCENE-1165
 URL: https://issues.apache.org/jira/browse/LUCENE-1165
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Website
Reporter: Doron Cohen
Assignee: Uwe Schindler
Priority: Minor

 From LUCENE-1157  -
  ..the nightly build documentation is too prominent. A search for 
 "indexwriter api" on Google or Yahoo! returns nightly documentation before 
 released documentation.
 (https://issues.apache.org/jira/browse/LUCENE-1157?focusedCommentId=12565820#action_12565820)




[jira] Updated: (LUCENE-1165) Reduce exposure of nightly build documentation

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1165:
--

Component/s: (was: Javadocs)
 Website
   Assignee: Uwe Schindler

 Reduce exposure of nightly build documentation
 --

 Key: LUCENE-1165
 URL: https://issues.apache.org/jira/browse/LUCENE-1165
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Website
Reporter: Doron Cohen
Assignee: Uwe Schindler
Priority: Minor

 From LUCENE-1157  -
  ..the nightly build documentation is too prominent. A search for 
 "indexwriter api" on Google or Yahoo! returns nightly documentation before 
 released documentation.
 (https://issues.apache.org/jira/browse/LUCENE-1157?focusedCommentId=12565820#action_12565820)




[jira] Closed: (LUCENE-83) ESCAPING BUG \(abc\) and \(a*c\) in v1.2

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-83.


Resolution: Not A Problem
  Assignee: (was: Lucene Developers)

I verified on both 3x and trunk: queries like \\(a?c\\) and \\(a*c\\) work 
(return the correct result). I guess the problem was already fixed in the 
QueryParser at some point.

 ESCAPING BUG \(abc\) and \(a*c\) in v1.2
 

 Key: LUCENE-83
 URL: https://issues.apache.org/jira/browse/LUCENE-83
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 1.2
 Environment: Operating System: Windows XP
 Platform: All
Reporter: Lukas Zapletal
Priority: Minor

 PLEASE TEST THIS CODE:
 --
 import junit.framework.*;
 import org.apache.lucene.index.*;
 import org.apache.lucene.analysis.*;
 import org.apache.lucene.analysis.standard.*;
 import org.apache.lucene.store.*;
 import org.apache.lucene.document.*;
 import org.apache.lucene.search.*;
 import org.apache.lucene.queryParser.*;
 /**
  * Escape bug (now with same analyzers). By l...@root.cz.
  * Here is the description:
  *
  * When searching for \(abc\) everything is ok. But let's search for: \(a?c\)
  * YES! Nothing found! It's the same with \" and maybe other escaped characters. 
  *
  * User: Lukas Zapletal
  * Date: Feb 1, 2003
  *
  * JUnit test case follows:
  */
 public class juEscapeBug extends TestCase {
 Directory dir = new RAMDirectory();
 String testText = "This is a test. (abc) Is there a bug OR not? \"Question\"!";
 public juEscapeBug(String tn) {
 super(tn);
 }
 protected void setUp() throws Exception {
 IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
 Document doc = new Document();
 doc.add(Field.Text("contents", testText));
 writer.addDocument(doc);
 writer.optimize();
 writer.close();
 }
 private boolean doQuery(String queryString) throws Exception {
 Searcher searcher = new IndexSearcher(dir);
 Analyzer analyzer = new StandardAnalyzer();
 Query query = QueryParser.parse(queryString, "contents", analyzer);
 Hits hits = searcher.search(query);
 searcher.close();
 return (hits.length() == 1);
 }
 public void testBugOk1() throws Exception {
 assertTrue(doQuery("Test"));
 }
 public void testBugOk2() throws Exception {
 assertFalse(doQuery("This is not there"));
 }
 public void testBugOk3() throws Exception {
 assertTrue(doQuery("abc"));
 }
 public void testBugOk4() throws Exception {
 assertTrue(doQuery("\\(abc\\)"));
 }
 public void testBugHere1() throws Exception {
 assertTrue(doQuery("\\(a?c\\)")); // BUG HERE !!!
 }
 public void testBugHere2() throws Exception {
 assertTrue(doQuery("\\(a*\\)")); // BUG HERE !!!
 }
 public void testBugHere3() throws Exception {
 assertTrue(doQuery("\\\"qu*on\\\"")); // BUG HERE !!!
 }
 }




[jira] Closed: (LUCENE-507) CLONE -[PATCH] remove unused variables

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-507.
-

Resolution: Not A Problem
  Assignee: (was: Lucene Developers)

This is not a problem. First, many of the mentions in the patch file are no 
longer relevant, because this issue is old. Second, we do this sort of cleanup 
from time to time, and those unused variables will keep popping up, and we'll 
keep cleaning them. So I see no reason to keep this issue open anymore.

 CLONE -[PATCH] remove unused variables
 --

 Key: LUCENE-507
 URL: https://issues.apache.org/jira/browse/LUCENE-507
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Steven Tamm
Priority: Minor
 Attachments: Unused.patch


 Seems I'm the only person who has the unused variable warning turned on in 
 Eclipse :-) This patch removes those unused variables and imports (for now 
 only in the search package). This doesn't introduce changes in 
 functionality, but it should be reviewed anyway: there might be cases where 
 the variables *should* be used, but they are not because of a bug.




[jira] Closed: (LUCENE-1074) Workaround in Searcher.java for gcj bug#15411 no longer needed

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-1074.
--

Resolution: Not A Problem

Searcher is removed from trunk and deprecated in 3x. Also, I see the comment 
was removed from 3x, and the methods are still there. Given that this class is 
going away, and that this issue is way too old, I'll close it.

 Workaround in Searcher.java for gcj bug#15411 no longer needed
 --

 Key: LUCENE-1074
 URL: https://issues.apache.org/jira/browse/LUCENE-1074
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Paul Elschot
Priority: Minor
 Attachments: LUCENE-1074.patch


 This gcj bug has meanwhile been fixed, see:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15411




[jira] Commented: (LUCENE-1165) Reduce exposure of nightly build documentation

2011-01-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986396#action_12986396
 ] 

Uwe Schindler commented on LUCENE-1165:
---

I opened INFRA-3389.

 Reduce exposure of nightly build documentation
 --

 Key: LUCENE-1165
 URL: https://issues.apache.org/jira/browse/LUCENE-1165
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Website
Reporter: Doron Cohen
Assignee: Uwe Schindler
Priority: Minor

 From LUCENE-1157  -
  ..the nightly build documentation is too prominent. A search for 
 indexwriter api on Google or Yahoo! returns nightly documentation before 
 released documentation.
 (https://issues.apache.org/jira/browse/LUCENE-1157?focusedCommentId=12565820#action_12565820)




[jira] Updated: (LUCENE-1391) Token type and flags values get lost when using ShingleMatrixFilter

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1391:
--

Affects Version/s: 2.9
   3.0

This issue is still valid: ShingleMatrixFilter still sets its class name as the 
type attribute for all tokens and resets flags to 0.

Furthermore, ShingleMatrixFilter does not respect custom/new attributes at all 
(like KeywordAttribute).
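
For comparison, a filter that plays nicely with the attribute API carries the 
incoming values through instead of overwriting them. A minimal sketch against 
the 3.x analysis API (the class and the rewriting step here are illustrative, 
not ShingleMatrixFilter's actual code):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
    import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

    public final class AttributePreservingFilter extends TokenFilter {
        private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);
        private final FlagsAttribute flagsAtt = addAttribute(FlagsAttribute.class);

        public AttributePreservingFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false;
            }
            final String type = typeAtt.type();    // remember the incoming type
            final int flags = flagsAtt.getFlags(); // ... and the incoming flags
            // ... rewrite the token here ...
            typeAtt.setType(type);    // restore instead of writing a class name
            flagsAtt.setFlags(flags); // restore instead of resetting to 0
            return true;
        }
    }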

 Token type and flags values get lost when using ShingleMatrixFilter
 ---

 Key: LUCENE-1391
 URL: https://issues.apache.org/jira/browse/LUCENE-1391
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.4, 2.9, 3.0
Reporter: Wouter Heijke
Assignee: Karl Wettin
 Fix For: 3.1, 4.0


 While using the new ShingleMatrixFilter I noticed that a token's type and 
 flags get lost. ShingleFilter does respect these values, like the other 
 filters I know.




[jira] Updated: (LUCENE-1391) Token type and flags values get lost when using ShingleMatrixFilter

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1391:
--

Fix Version/s: 4.0
   3.1

 Token type and flags values get lost when using ShingleMatrixFilter
 ---

 Key: LUCENE-1391
 URL: https://issues.apache.org/jira/browse/LUCENE-1391
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.4, 2.9, 3.0
Reporter: Wouter Heijke
Assignee: Karl Wettin
 Fix For: 3.1, 4.0


 While using the new ShingleMatrixFilter I noticed that a token's type and 
 flags get lost. ShingleFilter does respect these values, like the other 
 filters I know.




[jira] Assigned: (LUCENE-1391) Token type and flags values get lost when using ShingleMatrixFilter

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-1391:
-

Assignee: Uwe Schindler  (was: Karl Wettin)

 Token type and flags values get lost when using ShingleMatrixFilter
 ---

 Key: LUCENE-1391
 URL: https://issues.apache.org/jira/browse/LUCENE-1391
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.4, 2.9, 3.0
Reporter: Wouter Heijke
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0


 While using the new ShingleMatrixFilter I noticed that a token's type and 
 flags get lost. ShingleFilter does respect these values, like the other 
 filters I know.




[jira] Commented: (SOLR-2326) Replication command indexversion fails to return index version

2011-01-25 Thread Eric Pugh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986402#action_12986402
 ] 

Eric Pugh commented on SOLR-2326:
-

So I did discover one odd thing. If I don't have a /update update 
requestHandler listed in the solrconfig.xml, then the commitPoint is ALWAYS 
null; it's almost as if having that in the stack causes the commitPoint to be 
set.

My other data point, which I think but haven't verified, is that if you don't 
have replicate-on-startup set, then it *seems* to give that result as well.

One question I have is: why is there that race condition? I mean, if 
command=details works, then shouldn't indexversion work the same, or raise an 
error, versus returning a rather unuseful 0? Maybe just logging "no 
commitPoint found" would help.



 Replication command indexversion fails to return index version
 --

 Key: SOLR-2326
 URL: https://issues.apache.org/jira/browse/SOLR-2326
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
 Environment: Branch 3x latest
Reporter: Eric Pugh
Assignee: Mark Miller
 Fix For: 3.1


 To test this, I took the /example/multicore/core0 solrconfig and added a 
 simple replication handler:
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
     <str name="confFiles">schema.xml</str>
   </lst>
 </requestHandler>
 When I query the handler for details I get back the indexVersion that I 
 expect: 
 http://localhost:8983/solr/core0/replication?command=details&wt=json&indent=true
 But when I ask for just the indexVersion I get back a 0, which prevents the 
 slaves from pulling updates: 
 http://localhost:8983/solr/core0/replication?command=indexversion&wt=json&indent=true




[jira] Closed: (LUCENE-1263) NullPointerException in java.util.Hashtable from executing a Query

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-1263.
--

Resolution: Cannot Reproduce

This problem could not be reproduced, and the person who reported it has not 
provided any information on how to reproduce it since Nov 2008. Closing.

 NullPointerException in java.util.Hashtable from executing a Query
 --

 Key: LUCENE-1263
 URL: https://issues.apache.org/jira/browse/LUCENE-1263
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
Reporter: Benjamin Pasero
Priority: Minor

 Lately we are seeing this stacktrace showing up when executing a Query. Any 
 ideas?
 java.lang.NullPointerException at java.util.Hashtable.get(Hashtable.java:482)
 at org.apache.lucene.index.MultiReader.norms(MultiReader.java:167)
 at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:72)
 at 
 org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:131)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:130)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:100)
 at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:192)
 at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:66)
 at org.apache.lucene.search.Hits.<init>(Hits.java:45)
 at org.apache.lucene.search.Searcher.search(Searcher.java:45)
 at org.apache.lucene.search.Searcher.search(Searcher.java:37)




[jira] Closed: (LUCENE-487) Database as a lucene index target

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-487.
-

Resolution: Not A Problem

Not active since 2006 and we already have DBDirectory.

 Database as a lucene index target
 -

 Key: LUCENE-487
 URL: https://issues.apache.org/jira/browse/LUCENE-487
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 1.9
 Environment: MySQL (version 4.1 and up), Oracle (version 8.1.7 and up)
Reporter: Amir Kibbar
Priority: Minor
 Attachments: files.zip


 I've written an extension of the Directory class, called DBDirectory, that 
 allows you to read and write a Lucene index to a database instead of a file 
 system.
 This is done using blobs. Each blob represents a file. Also, each blob has 
 a name which is equivalent to the filename and a prefix, which is equivalent 
 to a directory on a file system. This allows you to create multiple Lucene 
 indexes in a single database schema.
 The solution uses two tables:
 LUCENE_INDEX - which holds the index files as blobs
 LUCENE_LOCK - holds the different locks
 Attached is my proposed solution. This solution is still very basic, but it 
 does the job.
 The solution supports Oracle and MySQL
 To use this solution:
 1. Place the files:
 - DBDirectory in src/java/org/apache/lucene/store
 - TestDBIndex in src/test/org/apache/lucene/index
 - objects-mysql.sql in src/db
 - objects-oracle.sql in src/db
 2. Edit the parameters for the database connection in TestDBIndex
 3. Create the database tables using the objects-mysql.sql script (assuming 
 you're using mysql)
 4. Build Lucene
 5. Run TestDBIndex with the database driver in the classpath
 I've tested the solution on mysql, but it *should* work on Oracle, I will 
 test that in a few days.
 Amir




[jira] Commented: (LUCENE-1263) NullPointerException in java.util.Hashtable from executing a Query

2011-01-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986419#action_12986419
 ] 

Uwe Schindler commented on LUCENE-1263:
---

The issue is definitely fixed:
- since 2.9 we do per-segment searches, so MultiReader's norms cache is no 
longer used;
- and even before 2.9, at some point we changed the Hashtable to a HashMap that 
allowed null keys and null values.

 NullPointerException in java.util.Hashtable from executing a Query
 --

 Key: LUCENE-1263
 URL: https://issues.apache.org/jira/browse/LUCENE-1263
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
Reporter: Benjamin Pasero
Priority: Minor

 Lately we are seeing this stacktrace showing up when executing a Query. Any 
 ideas?
 java.lang.NullPointerException at java.util.Hashtable.get(Hashtable.java:482)
 at org.apache.lucene.index.MultiReader.norms(MultiReader.java:167)
 at org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:72)
 at 
 org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:131)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:130)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:100)
 at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:192)
 at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:66)
 at org.apache.lucene.search.Hits.<init>(Hits.java:45)
 at org.apache.lucene.search.Searcher.search(Searcher.java:45)
 at org.apache.lucene.search.Searcher.search(Searcher.java:37)




[jira] Resolved: (LUCENE-856) Optimize segment merging

2011-01-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-856.
---

Resolution: Not A Problem

We've already made good improvements here, with stored fields & term vectors 
being bulk merged.

Postings are still costly to merge -- even on a fast machine I see merging 
become CPU bound.  It's possible a codec could bulk-copy the postings, e.g. if 
there are no (or not too many) deletions.  I think we can open separate issues 
in the future for that...

 Optimize segment merging
 

 Key: LUCENE-856
 URL: https://issues.apache.org/jira/browse/LUCENE-856
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor

 With LUCENE-843, the time spent indexing documents has been
 substantially reduced and now the time spent merging is a sizable
 portion of indexing time.
 I ran a test using the patch for LUCENE-843, building an index of 10
 million docs, each with ~5,500 bytes of plain text, with term vectors
 (positions + offsets) on and with 2 small stored fields per document.
 RAM buffer size was 32 MB.  I didn't optimize the index in the end,
 though optimize speed would also improve if we optimize segment
 merging.  Index size is 86 GB.
 Total time to build the index was 8 hrs 38 minutes, 5 hrs 40 minutes
 of which was spent merging.  That's 65.6% of the time!
 Most of this time is presumably IO which probably can't be reduced
 much unless we improve overall merge policy and experiment with values
 for mergeFactor / buffer size.
 These tests were run on a Mac Pro with 2 dual-core Intel CPUs.  The IO
 system is RAID 0 of 4 drives, so, these times are probably better than
 the more common case of a single hard drive which would likely be
 slower IO.
 I think there are some simple things we could do to speed up merging:
   * Experiment with buffer sizes -- maybe larger buffers for the
 IndexInputs used during merging could help?  Because at a default
 mergeFactor of 10, the disk heads must do a lot of seeking back and
 forth between these 10 files (and then to the 11th file where we
 are writing).
   * Use byte copying when possible, e.g. if there are no deletions on a
 segment we can almost (I think?) just copy things like prox
 postings, stored fields, term vectors, instead of full parsing to
 Java objects and then re-serializing them.
   * Experiment with mergeFactor / different merge policies.  For
 example I think LUCENE-854 would reduce time spent merging for a
 given index size.
 This is currently just a place to list ideas for optimizing segment
 merges.  I don't plan on working on this until after LUCENE-843.
 Note that for autoCommit=false, this optimization is somewhat less
 important, depending on how often you actually close/open a new
 IndexWriter.  In the extreme case, if you open a writer, add 100 MM
 docs, close the writer, then no segment merges happen at all.
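
For anyone wanting to run the buffer/mergeFactor experiments listed above, the 
knobs are plain setters in the 3.0-era API; a rough sketch (the path and the 
values are placeholders, not a recommendation):

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    Directory dir = FSDirectory.open(new File("/tmp/merge-test")); // hypothetical path
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
            true, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setRAMBufferSizeMB(32.0); // larger buffer => fewer, larger flushes
    writer.setMergeFactor(10);       // merge width: more input files merged at once => more seeking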




[jira] Closed: (LUCENE-401) [PATCH] fixes for gcj target.

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-401.


Resolution: Fixed
  Assignee: (was: Lucene Developers)

Closing, because we no longer support GCJ.

 [PATCH] fixes for gcj target.
 -

 Key: LUCENE-401
 URL: https://issues.apache.org/jira/browse/LUCENE-401
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: unspecified
 Environment: Operating System: Linux
 Platform: Other
Reporter: Robert Newson
Priority: Minor
 Attachments: gcj.patch


 I've modified the Makefile so that it compiles with GCJ-4.0.
 This involved fixing the CORE_OBJ macro to match the generated jar file as 
 well
 as excluding FieldCacheImpl from being used from its .java source (GCJ has
 problems with anonymous inner classes, I guess).
 Also, I changed the behaviour of FieldInfos.fieldInfo(int). It depended on
 catching an IndexOutOfBoundsException. I've modified it to test the
 bounds first, returning -1 in that case. This helps with gcj since we build
 with -fno-bounds-check.
 I compiled with;
 GCJ=gcj-4.0 GCJH=gcjh-4.0 GPLUSPLUS=g++-4.0 ant clean gcj
 patch to follow.




Re: [DISCUSSION] Trunk and Stable release strategy

2011-01-25 Thread Grant Ingersoll
+1  Makes sense to me.

On Jan 24, 2011, at 4:07 AM, Shai Erera wrote:

 Hi
 
 A few days ago Robert and I discussed this matter over IRC and thought it's 
 something we should bring forward to the list. This issue arose due to the 
 recent index format change introduced in LUCENE-2720, and the interesting 
 question was: if we say 4.0 is required to read all 3.x indexes, how would 4.0 
 support a future version of 3.x that did not even exist when 4.0 was released?
 
 Trunk means the 'unstable' branch (today's 4.0) and Stable is today's 3.0, 
 but the same issue will arise after we make 4.0 Stable and 5.0 Trunk.
 
 After some discussion we came to a solution that we would like to propose to 
 the list: we continue to release 3x until we stabilize trunk. When we're 
 happy with trunk, we release it, say 4.0, and the last 3x release becomes the 
 bug fix release for 3x and from that point we maintain 4.0 (new features and 
 all, while maintaining API back-compat) and Trunk becomes the next big thing 
 (5.0).
 
 There won't be interleaved 4.0 and 3.x releases, and we won't reach the 
 situation where we've released 4.0 and then release 3.2 with, say, an index 
 format change (like the one we just had to make).
 
 While we could say 3.x can be released after 4.0 with no index format changes 
 whatsoever, we think this proposal makes sense. There's no point maintaining 
 2 stable branches (3x and 4x) and an unstable Trunk.
 
 This will allow us to release 3.x as frequently as we want, hold off on trunk 
 as long as we want, and at some point cut over to 4.0 and think about the next 
 big things we'd like to bring to Lucene.
 
 What do you think?
 
 Shai






[jira] Closed: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-505.


   Resolution: Fixed
Fix Version/s: 2.9

Since Lucene 2.9 we search on each segment separately, so MultiReader's norms 
cache would never be used, except in custom code that calls norms() on the 
MultiReader/DirectoryReader. Since Lucene 4.0 this is also no longer allowed; 
non-atomic readers don't support norms. If you still need to get global norms, 
you can use MultiNorms, but that is discouraged.

See also: LUCENE-2771
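
To illustrate the two options (per-segment access vs. the discouraged global 
view), a rough sketch; `reader` is assumed to be a composite reader and the 
field name is made up:

    // preferred since 2.9: per-segment access, no merged array
    for (IndexReader segment : reader.getSequentialSubReaders()) {
        byte[] segmentNorms = segment.norms("body");
        // ... use segmentNorms for this segment's docs ...
    }

    // discouraged: materializes one array spanning all segments
    byte[] globalNorms = MultiNorms.norms(reader, "body");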

 MultiReader.norm() takes up too much memory: norms byte[] should be made into 
 an Object
 ---

 Key: LUCENE-505
 URL: https://issues.apache.org/jira/browse/LUCENE-505
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.0.0
 Environment: Patch is against Lucene 1.9 trunk (as of Mar 1 06)
Reporter: Steven Tamm
Priority: Minor
 Fix For: 2.9

 Attachments: LazyNorms.patch, NormFactors.patch, NormFactors.patch, 
 NormFactors20.patch


 MultiReader.norms() is very inefficient: it has to construct a byte array 
 that's as long as all the documents in every segment.  This doubles the 
 memory requirement for scoring MultiReaders vs. Segment Readers.  Although 
 this is cached, it's still a baseline of memory that is unnecessary.
 The problem is that the Normalization Factors are passed around as a byte[].  
 If it were instead replaced with an Object, you could perform a whole host of 
 optimizations
 a.  When reading, you wouldn't have to construct a fakeNorms array of all 
 1.0fs.  You could instead return a singleton object that would just return 
 1.0f.
 b.  MultiReader could use an object that could delegate to NormFactors of the 
 subreaders
 c.  You could write an implementation that could use mmap to access the norm 
 factors.  Or if the index isn't long lived, you could use an implementation 
 that reads directly from the disk.
 The patch provided here replaces the use of byte[] with a new abstract class 
 called NormFactors.  
 NormFactors has two methods on it
 public abstract byte getByte(int doc) throws IOException;  // Returns the 
 byte[doc]
 public float getFactor(int doc) throws IOException;// Calls 
 Similarity.decodeNorm(getByte(doc))
 There are four implementations of this abstract class
 1.  NormFactors.EmptyNormFactors - This replaces the fakeNorms with a 
 singleton that only returns 1.0
 2.  NormFactors.ByteNormFactors - Converts a byte[] to a NormFactors for 
 backwards compatibility in constructors.
 3.  MultiNormFactors - Multiplexes the NormFactors in MultiReader to prevent 
 the need to construct the gigantic norms array.
 4.  SegmentReader.Norm - Same class, but now extends NormFactors to provide 
 the same access.
 In addition, many of the Query and Scorer classes were changed to pass around 
 NormFactors instead of byte[], and to call getFactor() instead of using the 
 byte[].  I have kept around IndexReader.norms(String) for backwards 
 compatibility, but marked it as deprecated.  I believe that the use of 
 ByteNormFactors in IndexReader.getNormFactors() will keep backward 
 compatibility with other IndexReader implementations, but I don't know how to 
 test that.




[jira] Resolved: (LUCENE-406) sort missing string fields last

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-406.
--

   Resolution: Fixed
Fix Version/s: 4.0

This is resolved / is being resolved by the new FieldCache deleted docs 
support: LUCENE-2671, LUCENE-2649

 sort missing string fields last
 ---

 Key: LUCENE-406
 URL: https://issues.apache.org/jira/browse/LUCENE-406
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 1.4
 Environment: Operating System: All
 Platform: All
Reporter: Yonik Seeley
Assignee: Hoss Man
Priority: Minor
 Fix For: 4.0

 Attachments: MissingStringLastComparatorSource.java, 
 MissingStringLastComparatorSource.java, 
 TestMissingStringLastComparatorSource.java


 A SortComparatorSource for string fields that orders documents with the sort
 field missing after documents with the field.  This is the reverse of the
 default Lucene implementation.
 The concept and first-pass implementation was done by Chris Hostetter.




[jira] Commented: (LUCENE-770) CfsExtractor tool

2011-01-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986438#action_12986438
 ] 

Uwe Schindler commented on LUCENE-770:
--

In my opinion, this tool is not needed and does not really help, because it 
would not de-compound an index successfully.

The correct way to decompound is:
Create a new IndexWriter on an empty directory, set CFS to off, and then use 
addIndexes(IndexReader...) to force a merge over to the new dir.
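
In code, the approach described looks roughly like this (3.0-era API; 
compoundDir, plainDir and analyzer are placeholders):

    IndexReader source = IndexReader.open(compoundDir, true);  // read-only source
    IndexWriter writer = new IndexWriter(plainDir, analyzer,
            true, IndexWriter.MaxFieldLength.UNLIMITED);       // fresh, empty target
    writer.setUseCompoundFile(false);                          // CFS off
    writer.addIndexes(new IndexReader[] { source });           // merge over to plainDir
    writer.optimize();
    writer.close();
    source.close();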

 CfsExtractor tool
 -

 Key: LUCENE-770
 URL: https://issues.apache.org/jira/browse/LUCENE-770
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.1
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: LUCENE-770.patch


 A tool for extracting the content of a CFS file, in order to go from a 
 compound index to a multi-file index.
 This may be handy for people who want to go back to multi-file index format 
 now that field norms are in a single file - LUCENE-756.
 Most of this code already existed and was hiding in IndexReader.main.
 I'll commit tomorrow, unless I hear otherwise.  I think I should also remove 
 IndexReader.main then.  Ja?




[jira] Resolved: (LUCENE-770) CfsExtractor tool

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-770.
--

Resolution: Not A Problem

 CfsExtractor tool
 -

 Key: LUCENE-770
 URL: https://issues.apache.org/jira/browse/LUCENE-770
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.1
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: LUCENE-770.patch


 A tool for extracting the content of a CFS file, in order to go from a 
 compound index to a multi-file index.
 This may be handy for people who want to go back to multi-file index format 
 now that field norms are in a single file - LUCENE-756.
 Most of this code already existed and was hiding in IndexReader.main.
 I'll commit tomorrow, unless I hear otherwise.  I think I should also remove 
 IndexReader.main then.  Ja?




[jira] Closed: (LUCENE-1418) QueryParser can throw NullPointerException during parsing of some queries in case if default field passed to constructor is null

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-1418.
--

Resolution: Not A Problem

I don't think QP should support 'null' passed as the default field, and I doubt 
people really pass null as the default field. True, we could add a null check 
to the ctor, but given the long inactivity, I think it's not a problem people 
hit, so closing.

 QueryParser can throw NullPointerException during parsing of some queries in 
 case if default field passed to constructor is null
 

 Key: LUCENE-1418
 URL: https://issues.apache.org/jira/browse/LUCENE-1418
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 2.4
 Environment: CentOS 5.2 (probably any applies)
Reporter: Alexei Dets
Priority: Minor

 If QueryParser was constructed using the QueryParser(String f, Analyzer a) 
 constructor and f is null, then QueryParser can fail with a 
 NullPointerException while parsing some queries that _do_ contain a field 
 name but have unbalanced parentheses.
 Example 1:
 Query:  field:(expr1) expr2)
 Result:
 java.lang.NullPointerException
   at org.apache.lucene.index.Term.<init>(Term.java:50)
   at org.apache.lucene.index.Term.<init>(Term.java:36)
   at 
 org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
   at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1324)
   at 
 org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
   at 
 org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1168)
   at 
 org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)
 Example2:
 Query:  field:(expr1) expr2)
 Result:
 java.lang.NullPointerException
   at org.apache.lucene.index.Term.<init>(Term.java:50)
   at org.apache.lucene.index.Term.<init>(Term.java:36)
   at 
 org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:543)
   at 
 org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:612)
   at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1459)
   at 
 org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1211)
   at 
 org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1168)
   at 
 org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1128)
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:170)
 Workaround: pass an empty string to the constructor as the default field name 
 - in this case the QueryParser.parse method will throw a ParseException (the 
 expected result, because the query string is wrong) instead of a 
 NullPointerException.
 It is not obvious to me how to fix this, so I'll describe my use case; maybe 
 I'm doing something completely wrong.
 Basically I have a set of per-field queries entered by the user and need to 
 programmatically construct (after some preprocessing) one real Lucene query 
 combined from these user-entered per-field subqueries.
 To achieve this I basically do the following (simplified a bit):
 QueryParser parser = new QueryParser(null, analyzer); // I'll always provide 
 a field name in the query string, as it is different each time and I don't 
 have any default
 BooleanQuery query = new BooleanQuery();
 Query subQuery1 = parser.parse(field1 + ":(" + queryString1 + ')');
 query.add(subQuery1, operator1); // operator = BooleanClause.Occur.MUST, 
 BooleanClause.Occur.MUST_NOT or BooleanClause.Occur.SHOULD
 Query subQuery2 = parser.parse(field2 + ":(" + queryString2 + ')');
 query.add(subQuery2, operator2); 
 Query subQuery3 = parser.parse(field3 + ":(" + queryString3 + ')');
 query.add(subQuery3, operator3); 
 ...
 IMHO either the QueryParser constructor should be changed to throw a 
 NullPointerException/IllegalArgumentException when a null field is passed 
 (and the API documentation updated), or QueryParser.parse behavior should be 
 fixed to correctly throw a ParseException instead of a NullPointerException. 
 Also, IMHO _public_ setField/getField methods on QueryParser (that set/get 
 the field) could be of great help in use cases like mine:
 QueryParser parser = new QueryParser(null, analyzer); // or add a constructor 
 with the analyzer _only_ for such cases
 BooleanQuery query = new BooleanQuery();
 parser.setField(field1);
 Query subQuery1 = parser.parse(queryString1);
 query.add(subQuery1, operator1);
 parser.setField(field2);
 Query subQuery2 = parser.parse(queryString2);
 query.add(subQuery2, operator2); 
 ...


[jira] Commented: (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2011-01-25 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986442#action_12986442
 ] 

Grant Ingersoll commented on LUCENE-2878:
-

I haven't looked at the patch, but one of the biggest issues with Spans is the 
duality within the spans themselves.  The whole point of spans is that you care 
about position information.  However, in order to get both the search results 
and the positions, you have to, effectively, execute the query twice: once to 
get the results and once to get the positions.  A Collector-like interface, 
IMO, would be ideal because it would allow applications to leverage position 
information as the queries are being scored and hits being collected.  In other 
words, if we are rethinking how we handle position-based queries, let's get it 
right this time and make it so it is actually useful for people who need the 
functionality.

As for PayloadSpanUtil, I think that was primarily put in to help w/ 
highlighting at the time, but if it has outlived its usefulness, then dump it. 
If we are consolidating all queries to support positions and payloads, then it 
shouldn't be needed, right?

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: Bulk Postings branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Attachments: LUCENE-2878.patch, LUCENE-2878.patch


 Currently we have two somewhat separate types of queries: the ones which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do, and at the end 
 of the day they duplicate a lot of code all over Lucene. Span*Queries are 
 also limited to other Span*Query instances, such that you cannot use a 
 TermQuery or a BooleanQuery with SpanNear or anything like that. 
 Besides the Span*Query limitation, other queries lack a quite interesting 
 feature: they cannot score based on term proximity, since scorers don't 
 expose any positional information. All those problems bugged me for a while 
 now, so I started working on that using the bulkpostings API. I would have 
 done that first cut on trunk, but TermScorer there works on a BlockReader 
 that does not expose positions, while the one in this branch does. I started 
 adding a new Positions class which users can pull from a scorer; to prevent 
 unnecessary positions enums I added ScorerContext#needsPositions and 
 eventually Scorer#needsPayloads to create the corresponding enum on demand. 
 Yet, currently only TermQuery / TermScorer implements this API and others 
 simply return null instead. 
 To show that the API really works and our BulkPostings work fine with 
 positions too, I cut over TermSpanQuery to use a TermScorer under the hood 
 and nuked TermSpans entirely. A nice side effect of this was that the 
 Position BulkReading implementation got some exercise, which now all works 
 with positions :), while Payloads for bulk reading are kind of experimental 
 in the patch and only work with the Standard codec. 
 So all spans now work on top of TermScorer (I truly hate spans since today), 
 including the ones that need Payloads (StandardCodec ONLY)!!  I didn't bother 
 to implement the other codecs yet since I want to get feedback on the API and 
 on this first cut before I go on with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk 
 first, but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails, but I didn't 
 look into the MemoryIndex BulkPostings API yet).




[jira] Closed: (LUCENE-663) New feature rich higlighter for Lucene.

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-663.


   Resolution: Fixed
Fix Version/s: 2.9

Since Lucene 2.9 we have FastVectorHighlighter, which uses TermVectors to 
highlight. Also, the conventional Highlighter was extended to support more 
query types.

 New feature rich higlighter for Lucene.
 ---

 Key: LUCENE-663
 URL: https://issues.apache.org/jira/browse/LUCENE-663
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Karel Tejnora
Priority: Minor
 Fix For: 2.9

 Attachments: lucene-hlt-src.jar


 Well, I refactored (took) some code from two previous highlighters.
 This highlighter:
 + use TermPositionVector where available
 + use Analyzer if no TermPositionVector found or is forced to use it.
 + support for all lucene queries (Term, Phrase with slops, Prefix, Wildcard, 
 Range) except Fuzzy Query (can be implemented easily)
 - has no support for scoring (yet)
 - use same prefix,postfix for accepted terms (yet)
 ? It's written in Java5
 In the next release I'd like to add support for Fuzzy, coloring (e.g. a 
 different color for terms between phrase terms (slops)), and scoring of 
 fragments.
 It's Apache licensed - I hope so :-) I put the license statement in every file.




[jira] Created: (LUCENE-2888) Several DocsEnum / DocsAndPositionsEnum return wrong docID when next() / advance(int) return NO_MORE_DOCS

2011-01-25 Thread Simon Willnauer (JIRA)
Several DocsEnum / DocsAndPositionsEnum return wrong docID when next() / 
advance(int) return NO_MORE_DOCS
-

 Key: LUCENE-2888
 URL: https://issues.apache.org/jira/browse/LUCENE-2888
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0


During work on LUCENE-2878 I found some minor problems in the PreFlex and 
Pulsing codecs - they return the last docID instead of NO_MORE_DOCS from 
DocsEnum#docID() after next() or advance(int) has returned NO_MORE_DOCS. The 
JavaDoc clearly says that it should return NO_MORE_DOCS.
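
A tiny sketch of the contract in question (`docs` stands for any DocsEnum under 
test):

    int doc;
    while ((doc = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        // ... consume doc ...
    }
    // the JavaDoc contract: once exhausted, docID() must also report NO_MORE_DOCS;
    // the buggy enums kept returning the last real docID here instead
    assert docs.docID() == DocIdSetIterator.NO_MORE_DOCS;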




[jira] Resolved: (LUCENE-753) Use NIO positional read to avoid synchronization in FSIndexInput

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-753.
--

Resolution: Fixed

This issue was resolved a long time ago, but was left open because of the 
stupid Windows Sun JRE bug, which was never fixed. With Lucene 3.x and trunk we 
have better defaults (e.g. MMapDirectory is used on 64-bit Windows).

Users should default to FSDirectory.open() and use the returned directory for 
best performance.
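
For illustration, the recommended pattern (the index path is a placeholder):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // FSDirectory.open picks the best implementation for the platform
    // (e.g. MMapDirectory on 64-bit Windows) instead of hardcoding one.
    Directory dir = FSDirectory.open(new File("/path/to/index"));
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir, true));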

 Use NIO positional read to avoid synchronization in FSIndexInput
 

 Key: LUCENE-753
 URL: https://issues.apache.org/jira/browse/LUCENE-753
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Attachments: FileReadTest.java, FileReadTest.java, FileReadTest.java, 
 FileReadTest.java, FileReadTest.java, FileReadTest.java, FileReadTest.java, 
 FileReadTest.java, FSDirectoryPool.patch, FSIndexInput.patch, 
 FSIndexInput.patch, LUCENE-753.patch, LUCENE-753.patch, LUCENE-753.patch, 
 LUCENE-753.patch, LUCENE-753.patch, lucene-753.patch, lucene-753.patch


 As suggested by Doug, we could use NIO pread to avoid synchronization on the 
 underlying file.
 This could mitigate any MT performance drop caused by reducing the number of 
 files in the index format.




[jira] Closed: (LUCENE-72) [PATCH] Query parser inconsistency when using terms to exclude.

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-72.


Resolution: Won't Fix
  Assignee: (was: Lucene Developers)

As per the discussion, this should have been closed a long time ago.

 [PATCH] Query parser inconsistency when using terms to exclude.
 ---

 Key: LUCENE-72
 URL: https://issues.apache.org/jira/browse/LUCENE-72
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 1.2
 Environment: Operating System: All
 Platform: PC
Reporter: Carlos
Priority: Minor
 Attachments: patch6.txt, patch7.txt, TestRegressionLucene72.java, 
 TestRegressionLucene72.java


 Hi.
 The problem I am having occurs when using the query parser and also when 
 building the query using the API.
 Assume that we want to look for documents about fruits or vegetables but 
 excluding tomatoes and bananas. I suppose the right query should be:
 +(fruits vegetables) AND (-tomatoes -bananas)
 which I think is equivalent to (if you parse it and then print the 
 query.toString() result, that is what you get):
 +(fruits vegetables) +(-tomatoes -bananas)
 But the query doesn't work as expected; in fact the query that works is
 +(fruits vegetables) -(-tomatoes -bananas)
 which doesn't really make much sense, because the second part seems to say: 
 "all documents where the condition 'tomatoes is not present and bananas is 
 not present' is false", which means the opposite.
 In fact, the second query works like (even if they look quite opposite):
 +(fruits vegetables) -tomatoes -bananas
 Hope someone could help, thanks
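
For the record, the query that behaves as intended, built programmatically 
(the field name "f" is made up):

    // +(fruits vegetables) -tomatoes -bananas
    BooleanQuery q = new BooleanQuery();
    BooleanQuery include = new BooleanQuery();
    include.add(new TermQuery(new Term("f", "fruits")), BooleanClause.Occur.SHOULD);
    include.add(new TermQuery(new Term("f", "vegetables")), BooleanClause.Occur.SHOULD);
    q.add(include, BooleanClause.Occur.MUST); // at least one positive term must match
    q.add(new TermQuery(new Term("f", "tomatoes")), BooleanClause.Occur.MUST_NOT);
    q.add(new TermQuery(new Term("f", "bananas")), BooleanClause.Occur.MUST_NOT);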




[jira] Updated: (LUCENE-2888) Several DocsEnum / DocsAndPositionsEnum return wrong docID when next() / advance(int) return NO_MORE_DOCS

2011-01-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2888:


Attachment: LUCENE-2888.patch

Here is a patch including the ported test case from LUCENE-2878.

 Several DocsEnum / DocsAndPositionsEnum return wrong docID when next() / 
 advance(int) return NO_MORE_DOCS
 -

 Key: LUCENE-2888
 URL: https://issues.apache.org/jira/browse/LUCENE-2888
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2888.patch


 During work on LUCENE-2878 I found some minor problems in the PreFlex and 
 Pulsing codecs - they return the last docID instead of NO_MORE_DOCS from 
 DocsEnum#docID() after next() or advance(int) has returned NO_MORE_DOCS. The 
 JavaDoc clearly says that it should return NO_MORE_DOCS.




[jira] Assigned: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2011-01-25 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-445:


Assignee: Grant Ingersoll  (was: Erick Erickson)

 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Grant Ingersoll
 Fix For: Next

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, solr-445.xml, SOLR-445_3x.patch


 Has anyone run into the problem of handling bad documents / failures 
 mid-batch?  I.e.:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now Solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory, while Option 2 would require 
 more information to come back from the API.  I'm about to dig into this, but 
 I thought I'd ask to see if anyone had any suggestions, thoughts or comments.




[jira] Closed: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-855.
-

Resolution: Duplicate

We already have FieldCacheRangeFilter (introduced in LUCENE-1461), so closing 
as duplicate.

 MemoryCachedRangeFilter to boost performance of Range queries
 -

 Key: LUCENE-855
 URL: https://issues.apache.org/jira/browse/LUCENE-855
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.1
Reporter: Andy Liu
 Attachments: contrib-filters.tar.gz, FieldCacheRangeFilter.patch, 
 FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, 
 FieldCacheRangeFilter.patch, FieldCacheRangeFilter.patch, 
 FieldCacheRangeFilter.patch, FieldCacheRangeFilter_Lucene_2.3.0.patch, 
 MemoryCachedRangeFilter.patch, MemoryCachedRangeFilter_1.4.patch, 
 TestRangeFilterPerformanceComparison.java, 
 TestRangeFilterPerformanceComparison.java


 Currently RangeFilter uses TermEnum and TermDocs to find documents that fall 
 within the specified range.  This requires iterating through every single 
 term in the index and can get rather slow for large document sets.
 MemoryCachedRangeFilter reads all (docId, value) pairs of a given field, 
 sorts by value, and stores them in a SortedFieldCache.  During bits(), binary 
 searches are used to find the start and end indices of the lower and upper 
 bound values.  The BitSet is populated by all the docId values that fall in 
 between the start and end indices.
 TestMemoryCachedRangeFilterPerformance creates a 100K RAMDirectory-backed 
 index with random date values within a 5 year range.  Executing bits() 1000 
 times on standard RangeQuery using random date intervals took 63904ms.  Using 
 MemoryCachedRangeFilter, it took 876ms.  The performance increase is less 
 dramatic when you have fewer unique terms in a field or fewer documents.
 Currently MemoryCachedRangeFilter only works with numeric values (values are 
 stored in a long[] array) but it can easily be changed to support Strings.  A 
 side benefit of storing the values as longs is that there's no 
 longer a need to make the values lexicographically comparable, i.e. padding 
 numeric values with zeros.
 The downside of using MemoryCachedRangeFilter is there's a fairly significant 
 memory requirement.  So it's designed to be used in situations where range 
 filter performance is critical and memory consumption is not an issue.  The 
 memory requirements are: (sizeof(int) + sizeof(long)) * numDocs.  
 MemoryCachedRangeFilter also requires a warmup step which can take a while to 
 run in large datasets (it took 40s to run on a 3M document corpus).  Warmup 
 can be called explicitly or is automatically called the first time 
 MemoryCachedRangeFilter is applied using a given field.
 So in summary, MemoryCachedRangeFilter can be useful when:
 - Performance is critical
 - Memory is not an issue
 - Field contains many unique numeric values
 - Index contains a large number of documents
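
The binary-search step described above, as a hedged sketch (the parallel-array 
layout and helper are assumptions for illustration, not the patch's actual 
code):

    import java.util.Arrays;
    import java.util.BitSet;

    // docIds[] and values[] are parallel arrays sorted ascending by value.
    static BitSet rangeBits(int[] docIds, long[] values, int maxDoc,
                            long lower, long upper) {
        int lo = Arrays.binarySearch(values, lower);
        if (lo < 0) lo = -lo - 1;                            // first value >= lower
        else while (lo > 0 && values[lo - 1] == lower) lo--; // step back over duplicates
        int hi = Arrays.binarySearch(values, upper);
        if (hi < 0) hi = -hi - 2;                            // last value <= upper
        else while (hi + 1 < values.length && values[hi + 1] == upper) hi++;
        BitSet bits = new BitSet(maxDoc);
        for (int i = lo; i <= hi; i++) {
            bits.set(docIds[i]); // mark every doc whose value falls in [lower, upper]
        }
        return bits;
    }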




[jira] Closed: (LUCENE-320) [PATCH] Increases visibility of methods/classes from protected/package level to public

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-320.
-

Resolution: Not A Problem
  Assignee: (was: Lucene Developers)

This API is already public, so I don't think there's a problem anymore.

 [PATCH] Increases visibility of methods/classes from protected/package level 
 to public
 --

 Key: LUCENE-320
 URL: https://issues.apache.org/jira/browse/LUCENE-320
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: CVS Nightly - Specify date in submission
 Environment: Operating System: All
 Platform: All
Reporter: Alexey Panchenko
Priority: Minor
 Attachments: lucene-more-public.patch


 I am building a Query implementation which should match documents that are
 matched by specified number of subqueries. It works very much the same as
 BooleanQuery, but checks the number of matched subqueries which should be
 greater than or equal to the specified value.
 The patch is needed to allow access to these classes/members from other
 packages, not just org.apache.lucene.search.




[jira] Closed: (LUCENE-988) Benchmarker tasks for the TPB data collection

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-988.
-

Resolution: Not A Problem

Closing because I'm not sure what the license of The Pirate Bay DB is, and I'm 
also not sure that we want to have such a DB in Lucene. Benchmark's API 
allows someone to write a ContentSource which reads whatever source they want 
and converts it to DocData that is later fed to and indexed by DocMaker.
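
For readers unfamiliar with that extension point, a bare-bones sketch of a 
custom ContentSource; the class, doc count and field values are invented, and 
the exact DocData setters are assumed from the contrib/benchmark API of this 
era:

    import java.io.IOException;
    import org.apache.lucene.benchmark.byTask.feeds.ContentSource;
    import org.apache.lucene.benchmark.byTask.feeds.DocData;
    import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException;

    public class SyntheticContentSource extends ContentSource {
        private int count = 0;

        @Override
        public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
            if (count >= 1000) {
                throw new NoMoreDataException(); // feed exhausted
            }
            docData.clear();
            docData.setName("doc-" + count);
            docData.setBody("body text for document " + count);
            count++;
            return docData;
        }

        @Override
        public void close() throws IOException {
            // nothing to release in this sketch
        }
    }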

 Benchmarker tasks for the TPB data collection
 -

 Key: LUCENE-988
 URL: https://issues.apache.org/jira/browse/LUCENE-988
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/benchmark
Affects Versions: 2.3
Reporter: Karl Wettin
Priority: Trivial
 Attachments: LUCENE-988.txt


 Very simple DocMaker and QueryMaker for the TPB data collection (~150,000 
 content items, ~500,000 comments to the contents and ~3,700,000 user queries).
 URL to dataset:
 http://thepiratebay.org/tor/3783572/db_dump_and_query_log_from_piratebay.org__summer_of_2006




[jira] Commented: (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2011-01-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986457#action_12986457
 ] 

Simon Willnauer commented on LUCENE-2878:
-

{quote}
I haven't looked at the patch, but one of the biggest issues with Spans is the 
duality within the spans themselves. The whole point of spans is that you care 
about position information. However, in order to get both the search results 
and the positions, you have to, effectively, execute the query twice: once to 
get the results and once to get the positions. A Collector-like interface, IMO, 
would be ideal because it would allow applications to leverage position 
information as the queries are being scored and hits being collected. In other 
words, if we are rethinking how we handle position-based queries, let's get it 
right this time and make it so it is actually useful for people who need the 
functionality.
{quote}

Grant, I completely agree! Any help here is very much welcome. I am busy fixing 
all the BulkEnums and spin-offs from this issue, but I hope to have a first 
sketch of how I think this should work by the end of the week!


bq. As for PayloadSpanUtil, I think that was primarily put in to help w/ 
highlighting at the time, but if it has outlived its usefulness, then dump it. 
If we are consolidating all queries to support positions and payloads, then it 
shouldn't be needed, right?

Yeah!



 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: Bulk Postings branch
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Attachments: LUCENE-2878.patch, LUCENE-2878.patch


 Currently we have two somewhat separate types of queries: the ones which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do, and at the end 
 of the day they duplicate a lot of code all over Lucene. Span*Queries are 
 also limited to other Span*Query instances, such that you cannot use a 
 TermQuery or a BooleanQuery with SpanNear or anything like that. 
 Besides the Span*Query limitation, other queries lack a quite interesting 
 feature: they cannot score based on term proximity, since Scorers don't 
 expose any positional information. All those problems bugged me for a while 
 now, so I started working on this using the bulk postings API. I would have 
 done a first cut on trunk, but TermScorer there works on BlockReaders that do 
 not expose positions, while the one in this branch does. I started adding a 
 new Positions class which users can pull from a Scorer; to prevent 
 unnecessary positions enums I added ScorerContext#needsPositions and 
 eventually Scorer#needsPayloads to create the corresponding enum on demand. 
 Yet, currently only TermQuery / TermScorer implements this API and others 
 simply return null instead. 
 To show that the API really works, and that our BulkPostings work fine with 
 positions too, I cut TermSpanQuery over to use a TermScorer under the hood 
 and nuked TermSpans entirely. A nice side effect of this was that the 
 Position BulkReading implementation got some exercise, which now :) all 
 works with positions, while payloads for bulk reading are kind of 
 experimental in the patch and only work with the Standard codec. 
 So all spans now work on top of TermScorer (I truly hate spans since today), 
 including the ones that need payloads (StandardCodec ONLY)!! I didn't bother 
 to implement the other codecs yet, since I want to get feedback on the API 
 and on this first cut before I go on with it. I will upload the 
 corresponding patch in a minute. 
 I also had to cut SpanQuery.getSpans(IR) over to 
 SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk 
 first, but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails, but I 
 didn't look into the MemoryIndex BulkPostings API yet)
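
For readers skimming this, a minimal sketch of the one-pass consumption pattern the description implies. Every type and method name below is a hypothetical stand-in, not the patch's actual API:

{code}
// Hypothetical stand-ins sketching the shape described above: a
// positions enum pulled from the scorer on demand, in the same pass
// that produces scores (no second query execution as with Spans).
interface PositionsEnum {         // hypothetical
  int nextPosition();             // returns -1 when exhausted
}

interface PositionAwareScorer {   // hypothetical
  int nextDoc();                  // returns Integer.MAX_VALUE when done
  float score();
  PositionsEnum positions();      // only created if requested up front
}

class OnePassCollector {
  void collect(PositionAwareScorer scorer) {
    while (scorer.nextDoc() != Integer.MAX_VALUE) {
      float score = scorer.score();
      PositionsEnum pos = scorer.positions();
      for (int p = pos.nextPosition(); p != -1; p = pos.nextPosition()) {
        // the score and each position are available together here,
        // which is exactly what the Collector-like interface is after
      }
    }
  }
}
{code}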

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-666) TERM1 OR NOT TERM2 does not perform as expected

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-666.
--

Resolution: Not A Problem

This is not a problem of QueryParser; it's more a problem of the combination 
of SHOULD and MUST_NOT clauses in a single BooleanQuery. The first clause must 
be required to have the wanted effect.

To prevent such a thing at all, I would tend to disallow MUST/SHOULD clauses 
in a BooleanQuery. No need to add ParseExceptions to QueryParser, as the same 
problem would also happen to users constructing a BooleanQuery 
programmatically.
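
For illustration, both constructions with the trunk-era BooleanQuery API (field and term names are placeholders):

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// "a OR NOT b" naively becomes SHOULD + MUST_NOT; a MUST_NOT clause
// only prohibits, it never selects documents by itself.
BooleanQuery naive = new BooleanQuery();
naive.add(new TermQuery(new Term("c", "a")), Occur.SHOULD);
naive.add(new TermQuery(new Term("c", "b")), Occur.MUST_NOT);

// With the first clause required the query reads "+a -b": all docs
// that contain a and do not contain b.
BooleanQuery fixed = new BooleanQuery();
fixed.add(new TermQuery(new Term("c", "a")), Occur.MUST);
fixed.add(new TermQuery(new Term("c", "b")), Occur.MUST_NOT);
{code}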

 TERM1 OR NOT TERM2 does not perform as expected
 ---

 Key: LUCENE-666
 URL: https://issues.apache.org/jira/browse/LUCENE-666
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 2.0.0
 Environment: Windows XP, JavaCC 4.0, JDK 1.5
Reporter: Dejan Nenov
 Attachments: TestAornotB.java


 test:
 [junit] Testsuite: org.apache.lucene.search.TestAornotB
 [junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 0.39 sec
 [junit] - Standard Output ---
 [junit] Doc1 = A B C
 [junit] Doc2 = A B C D
 [junit] Doc3 = A   C D
 [junit] Doc4 =   B C D
 [junit] Doc5 = C D
 [junit] -
 [junit] With query A OR NOT B we expect to hit
 [junit] all documents EXCEPT Doc4, instead we only match on Doc3.
 [junit] While LUCENE currently explicitly does not support queries of
 [junit] the type find docs that do not contain TERM - this explains
 [junit] not finding Doc5, but does not justify eliminating Doc1 and Doc2
 [junit] -
 [junit]  the fix should likely require a modification to QueryParser.jj
 [junit]  around the method:
 [junit]  protected void addClause(Vector clauses, int conj, int mods, 
 Query q)
 [junit] Query:c:a -c:b hits.length=1
 [junit] Query Found:Doc[0]= A C D
 [junit] 0.0 = (NON-MATCH) Failure to meet condition(s) of 
 required/prohibited clause(s)
 [junit]   0.6115718 = (MATCH) fieldWeight(c:a in 1), product of:
 [junit] 1.0 = tf(termFreq(c:a)=1)
 [junit] 1.2231436 = idf(docFreq=3)
 [junit] 0.5 = fieldNorm(field=c, doc=1)
 [junit]   0.0 = match on prohibited clause (c:b)
 [junit] 0.6115718 = (MATCH) fieldWeight(c:b in 1), product of:
 [junit]   1.0 = tf(termFreq(c:b)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=1)
 [junit] 0.6115718 = (MATCH) sum of:
 [junit]   0.6115718 = (MATCH) fieldWeight(c:a in 2), product of:
 [junit] 1.0 = tf(termFreq(c:a)=1)
 [junit] 1.2231436 = idf(docFreq=3)
 [junit] 0.5 = fieldNorm(field=c, doc=2)
 [junit] 0.0 = (NON-MATCH) Failure to meet condition(s) of 
 required/prohibited clause(s)
 [junit]   0.0 = match on prohibited clause (c:b)
 [junit] 0.6115718 = (MATCH) fieldWeight(c:b in 3), product of:
 [junit]   1.0 = tf(termFreq(c:b)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=3)
 [junit] Query:c:a (-c:b) hits.length=3
 [junit] Query Found:Doc[0]= A B C
 [junit] Query Found:Doc[1]= A B C D
 [junit] Query Found:Doc[2]= A C D
 [junit] 0.3057859 = (MATCH) product of:
 [junit]   0.6115718 = (MATCH) sum of:
 [junit] 0.6115718 = (MATCH) fieldWeight(c:a in 1), product of:
 [junit]   1.0 = tf(termFreq(c:a)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=1)
 [junit]   0.5 = coord(1/2)
 [junit] 0.3057859 = (MATCH) product of:
 [junit]   0.6115718 = (MATCH) sum of:
 [junit] 0.6115718 = (MATCH) fieldWeight(c:a in 2), product of:
 [junit]   1.0 = tf(termFreq(c:a)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=2)
 [junit]   0.5 = coord(1/2)
 [junit] 0.0 = (NON-MATCH) product of:
 [junit]   0.0 = (NON-MATCH) sum of:
 [junit]   0.0 = coord(0/2)
 [junit] -  ---
 [junit] Testcase: testFAIL(org.apache.lucene.search.TestAornotB):   FAILED
 [junit] resultDocs =A C D expected:<3> but was:<1>
 [junit] junit.framework.AssertionFailedError: resultDocs =A C D 
 expected:<3> but was:<1>
 [junit] at 
 org.apache.lucene.search.TestAornotB.testFAIL(TestAornotB.java:137)
 [junit] Test org.apache.lucene.search.TestAornotB FAILED

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Updated: (LUCENE-666) TERM1 OR NOT TERM2 does not perform as expected

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-666:
-

Comment: was deleted

(was: This is not a problem of QueryParser; it's more a problem of the 
combination of SHOULD and MUST_NOT clauses in a single BooleanQuery. The 
first clause must be required to have the wanted effect.

To prevent such a thing at all, I would tend to disallow MUST/SHOULD clauses 
in a BooleanQuery. No need to add ParseExceptions to QueryParser, as the same 
problem would also happen to users constructing a BooleanQuery 
programmatically.)

 TERM1 OR NOT TERM2 does not perform as expected
 ---

 Key: LUCENE-666
 URL: https://issues.apache.org/jira/browse/LUCENE-666
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 2.0.0
 Environment: Windows XP, JavaCC 4.0, JDK 1.5
Reporter: Dejan Nenov
 Attachments: TestAornotB.java


 test:
 [junit] Testsuite: org.apache.lucene.search.TestAornotB
 [junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 0.39 sec
 [junit] - Standard Output ---
 [junit] Doc1 = A B C
 [junit] Doc2 = A B C D
 [junit] Doc3 = A   C D
 [junit] Doc4 =   B C D
 [junit] Doc5 = C D
 [junit] -
 [junit] With query A OR NOT B we expect to hit
 [junit] all documents EXCEPT Doc4, instead we only match on Doc3.
 [junit] While LUCENE currently explicitly does not support queries of
 [junit] the type find docs that do not contain TERM - this explains
 [junit] not finding Doc5, but does not justify eliminating Doc1 and Doc2
 [junit] -
 [junit]  the fix should likely require a modification to QueryParser.jj
 [junit]  around the method:
 [junit]  protected void addClause(Vector clauses, int conj, int mods, 
 Query q)
 [junit] Query:c:a -c:b hits.length=1
 [junit] Query Found:Doc[0]= A C D
 [junit] 0.0 = (NON-MATCH) Failure to meet condition(s) of 
 required/prohibited clause(s)
 [junit]   0.6115718 = (MATCH) fieldWeight(c:a in 1), product of:
 [junit] 1.0 = tf(termFreq(c:a)=1)
 [junit] 1.2231436 = idf(docFreq=3)
 [junit] 0.5 = fieldNorm(field=c, doc=1)
 [junit]   0.0 = match on prohibited clause (c:b)
 [junit] 0.6115718 = (MATCH) fieldWeight(c:b in 1), product of:
 [junit]   1.0 = tf(termFreq(c:b)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=1)
 [junit] 0.6115718 = (MATCH) sum of:
 [junit]   0.6115718 = (MATCH) fieldWeight(c:a in 2), product of:
 [junit] 1.0 = tf(termFreq(c:a)=1)
 [junit] 1.2231436 = idf(docFreq=3)
 [junit] 0.5 = fieldNorm(field=c, doc=2)
 [junit] 0.0 = (NON-MATCH) Failure to meet condition(s) of 
 required/prohibited clause(s)
 [junit]   0.0 = match on prohibited clause (c:b)
 [junit] 0.6115718 = (MATCH) fieldWeight(c:b in 3), product of:
 [junit]   1.0 = tf(termFreq(c:b)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=3)
 [junit] Query:c:a (-c:b) hits.length=3
 [junit] Query Found:Doc[0]= A B C
 [junit] Query Found:Doc[1]= A B C D
 [junit] Query Found:Doc[2]= A C D
 [junit] 0.3057859 = (MATCH) product of:
 [junit]   0.6115718 = (MATCH) sum of:
 [junit] 0.6115718 = (MATCH) fieldWeight(c:a in 1), product of:
 [junit]   1.0 = tf(termFreq(c:a)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=1)
 [junit]   0.5 = coord(1/2)
 [junit] 0.3057859 = (MATCH) product of:
 [junit]   0.6115718 = (MATCH) sum of:
 [junit] 0.6115718 = (MATCH) fieldWeight(c:a in 2), product of:
 [junit]   1.0 = tf(termFreq(c:a)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=2)
 [junit]   0.5 = coord(1/2)
 [junit] 0.0 = (NON-MATCH) product of:
 [junit]   0.0 = (NON-MATCH) sum of:
 [junit]   0.0 = coord(0/2)
 [junit] -  ---
 [junit] Testcase: testFAIL(org.apache.lucene.search.TestAornotB):   FAILED
 [junit] resultDocs =A C D expected:<3> but was:<1>
 [junit] junit.framework.AssertionFailedError: resultDocs =A C D 
 expected:<3> but was:<1>
 [junit] at 
 org.apache.lucene.search.TestAornotB.testFAIL(TestAornotB.java:137)
 [junit] Test org.apache.lucene.search.TestAornotB FAILED

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Reopened: (LUCENE-666) TERM1 OR NOT TERM2 does not perform as expected

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-666:
--


Sorry, misunderstood the issue!

 TERM1 OR NOT TERM2 does not perform as expected
 ---

 Key: LUCENE-666
 URL: https://issues.apache.org/jira/browse/LUCENE-666
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 2.0.0
 Environment: Windows XP, JavaCC 4.0, JDK 1.5
Reporter: Dejan Nenov
 Attachments: TestAornotB.java


 test:
 [junit] Testsuite: org.apache.lucene.search.TestAornotB
 [junit] Tests run: 3, Failures: 1, Errors: 0, Time elapsed: 0.39 sec
 [junit] - Standard Output ---
 [junit] Doc1 = A B C
 [junit] Doc2 = A B C D
 [junit] Doc3 = A   C D
 [junit] Doc4 =   B C D
 [junit] Doc5 = C D
 [junit] -
 [junit] With query A OR NOT B we expect to hit
 [junit] all documents EXCEPT Doc4, instead we only match on Doc3.
 [junit] While LUCENE currently explicitly does not support queries of
 [junit] the type find docs that do not contain TERM - this explains
 [junit] not finding Doc5, but does not justify eliminating Doc1 and Doc2
 [junit] -
 [junit]  the fix should likely require a modification to QueryParser.jj
 [junit]  around the method:
 [junit]  protected void addClause(Vector clauses, int conj, int mods, 
 Query q)
 [junit] Query:c:a -c:b hits.length=1
 [junit] Query Found:Doc[0]= A C D
 [junit] 0.0 = (NON-MATCH) Failure to meet condition(s) of 
 required/prohibited clause(s)
 [junit]   0.6115718 = (MATCH) fieldWeight(c:a in 1), product of:
 [junit] 1.0 = tf(termFreq(c:a)=1)
 [junit] 1.2231436 = idf(docFreq=3)
 [junit] 0.5 = fieldNorm(field=c, doc=1)
 [junit]   0.0 = match on prohibited clause (c:b)
 [junit] 0.6115718 = (MATCH) fieldWeight(c:b in 1), product of:
 [junit]   1.0 = tf(termFreq(c:b)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=1)
 [junit] 0.6115718 = (MATCH) sum of:
 [junit]   0.6115718 = (MATCH) fieldWeight(c:a in 2), product of:
 [junit] 1.0 = tf(termFreq(c:a)=1)
 [junit] 1.2231436 = idf(docFreq=3)
 [junit] 0.5 = fieldNorm(field=c, doc=2)
 [junit] 0.0 = (NON-MATCH) Failure to meet condition(s) of 
 required/prohibited clause(s)
 [junit]   0.0 = match on prohibited clause (c:b)
 [junit] 0.6115718 = (MATCH) fieldWeight(c:b in 3), product of:
 [junit]   1.0 = tf(termFreq(c:b)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=3)
 [junit] Query:c:a (-c:b) hits.length=3
 [junit] Query Found:Doc[0]= A B C
 [junit] Query Found:Doc[1]= A B C D
 [junit] Query Found:Doc[2]= A C D
 [junit] 0.3057859 = (MATCH) product of:
 [junit]   0.6115718 = (MATCH) sum of:
 [junit] 0.6115718 = (MATCH) fieldWeight(c:a in 1), product of:
 [junit]   1.0 = tf(termFreq(c:a)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=1)
 [junit]   0.5 = coord(1/2)
 [junit] 0.3057859 = (MATCH) product of:
 [junit]   0.6115718 = (MATCH) sum of:
 [junit] 0.6115718 = (MATCH) fieldWeight(c:a in 2), product of:
 [junit]   1.0 = tf(termFreq(c:a)=1)
 [junit]   1.2231436 = idf(docFreq=3)
 [junit]   0.5 = fieldNorm(field=c, doc=2)
 [junit]   0.5 = coord(1/2)
 [junit] 0.0 = (NON-MATCH) product of:
 [junit]   0.0 = (NON-MATCH) sum of:
 [junit]   0.0 = coord(0/2)
 [junit] -  ---
 [junit] Testcase: testFAIL(org.apache.lucene.search.TestAornotB):   FAILED
 [junit] resultDocs =A C D expected:<3> but was:<1>
 [junit] junit.framework.AssertionFailedError: resultDocs =A C D 
 expected:<3> but was:<1>
 [junit] at 
 org.apache.lucene.search.TestAornotB.testFAIL(TestAornotB.java:137)
 [junit] Test org.apache.lucene.search.TestAornotB FAILED

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-547) Directory implementation for Applets

2011-01-25 Thread Andre Schild (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986465#action_12986465
 ] 

Andre Schild commented on LUCENE-547:
-

The reason for this implementation is the following:

We have built a QM documentation system which generates static PDF and HTML 
pages, with a tree navigation.

It also generates a full-text Lucene index to be able to do a full-text 
search.

We don't require a server to deliver the content; instead we can just start 
the documentation system from a local hard disk, or even a CD-ROM drive.

So, since we don't have a server on hand, we can't use REST.

 Directory implementation for Applets
 

 Key: LUCENE-547
 URL: https://issues.apache.org/jira/browse/LUCENE-547
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 1.9
 Environment: Applets
Reporter: Andre Schild
Priority: Minor
 Attachments: AppletDirectory.zip


 This directory implementation can be used inside of applets, where the index 
 files are located on the server.
 Also the applet is not required to be signed, as no calls to 
 System.getProperty are made.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1155) BoostingTermQuery#defaultTermBoost

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-1155.
--

Resolution: Won't Fix

We don't have BoostingTermQuery anymore, and there was never consensus here 
to fix it within Lucene vs. e.g. the workarounds Grant proposed. Given that, 
the fact that the issue has been inactive since Sep 2008, and that today we 
provide enough API for someone to write this sort of capability in their own 
application, I'm closing the issue.

 BoostingTermQuery#defaultTermBoost
 --

 Key: LUCENE-1155
 URL: https://issues.apache.org/jira/browse/LUCENE-1155
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Karl Wettin
Priority: Trivial

 This patch allows a null payload to mean something different than 1f.
 (I have this use case where 99% of my tokens share the same rather large 
 token position payload boost.)
 {code}
 Index: src/java/org/apache/lucene/search/payloads/BoostingTermQuery.java
 ===
 --- src/java/org/apache/lucene/search/payloads/BoostingTermQuery.java   
 (revision 615215)
 +++ src/java/org/apache/lucene/search/payloads/BoostingTermQuery.java   
 (working copy)
 @@ -41,11 +41,16 @@
   */
  public class BoostingTermQuery extends SpanTermQuery{
  
 +  private Float defaultTermBoost = null;
  
public BoostingTermQuery(Term term) {
  super(term);
}
  
 +  public BoostingTermQuery(Term term, Float defaultTermBoost) {
 +super(term);
 +this.defaultTermBoost = defaultTermBoost;
 +  }
  
protected Weight createWeight(Searcher searcher) throws IOException {
  return new BoostingTermWeight(this, searcher);
 @@ -107,7 +112,9 @@
payload = positions.getPayload(payload, 0);
payloadScore += similarity.scorePayload(term.field(), payload, 0, 
 positions.getPayloadLength());
payloadsSeen++;
 -
 +} else if (defaultTermBoost != null) {
 +  payloadScore += defaultTermBoost;
 +  payloadsSeen++;
  } else {
//zero out the payload?
  }
 @@ -146,7 +153,14 @@
  
}
  
 +  public Float getDefaultTermBoost() {
 +return defaultTermBoost;
 +  }
  
 +  public void setDefaultTermBoost(Float defaultTermBoost) {
 +this.defaultTermBoost = defaultTermBoost;
 +  }
 +
public boolean equals(Object o) {
  if (!(o instanceof BoostingTermQuery))
return false;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-547) Directory implementation for Applets

2011-01-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986471#action_12986471
 ] 

Shai Erera commented on LUCENE-547:
---

If you don't have a server, where does the Directory take its files from? If 
it's from the local hard disk, you can use RAMDirectory to load the files 
from an FSDirectory.
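
For example (3.x-era API; the path is a placeholder):

{code}
import java.io.File;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

// Load an on-disk index fully into memory (throws IOException):
Directory disk = FSDirectory.open(new File("/path/to/index"));
Directory ram = new RAMDirectory(disk);  // copies all index files
disk.close();                            // the RAMDirectory is independent
{code}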

 Directory implementation for Applets
 

 Key: LUCENE-547
 URL: https://issues.apache.org/jira/browse/LUCENE-547
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 1.9
 Environment: Applets
Reporter: Andre Schild
Priority: Minor
 Attachments: AppletDirectory.zip


 This directory implementation can be used inside of applets, where the index 
 files are located on the server.
 Also the applet is not required to be signed, as no calls to 
 System.getProperty are made.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2888) Several DocsEnum / DocsAndPositionsEnum return wrong docID when next() / advance(int) return NO_MORE_DOCS

2011-01-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2888:


Attachment: LUCENE-2888.patch

There was a wrong assignment in the last patch... I will go ahead and commit 
this one soon.

 Several DocsEnum / DocsAndPositionsEnum return wrong docID when next() / 
 advance(int) return NO_MORE_DOCS
 -

 Key: LUCENE-2888
 URL: https://issues.apache.org/jira/browse/LUCENE-2888
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2888.patch, LUCENE-2888.patch


 During work on LUCENE-2878 I found some minor problems in the PreFlex and 
 Pulsing codecs - DocsEnum#docID() returns the last docID instead of 
 NO_MORE_DOCS after next() or advance(int) has returned NO_MORE_DOCS. The 
 JavaDoc clearly says that it should return NO_MORE_DOCS.
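
For reference, the contract in question, written as a check one could run against any flex-API DocsEnum (a sketch, not the committed test):

{code}
import java.io.IOException;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.search.DocIdSetIterator;

// Once nextDoc()/advance() have returned NO_MORE_DOCS, docID() must
// also return NO_MORE_DOCS -- not the last real docID.
void checkExhaustedContract(DocsEnum docs) throws IOException {
  while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    // consume postings...
  }
  assert docs.docID() == DocIdSetIterator.NO_MORE_DOCS
      : "exhausted enum but docID() returned " + docs.docID();
}
{code}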

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-798) Factory for RangeFilters that caches sections of ranges to reduce disk reads

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-798.


Resolution: Not A Problem

This patch does not apply anymore, as Filters no longer use BitSets but 
DocIdSets. Also, this issue is solved by NumericRangeQuery, 
NumericRangeFilter, and FieldCacheRangeFilter - one of these classes should 
meet your requirements.
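
For example, the numeric alternatives look like this (field name and bounds are placeholders):

{code}
import org.apache.lucene.search.NumericRangeFilter;
import org.apache.lucene.search.NumericRangeQuery;

// Inclusive int range over a field indexed with NumericField:
NumericRangeQuery<Integer> query =
    NumericRangeQuery.newIntRange("price", 10, 100, true, true);
NumericRangeFilter<Integer> filter =
    NumericRangeFilter.newIntRange("price", 10, 100, true, true);
{code}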

 Factory for RangeFilters that caches sections of ranges to reduce disk reads
 

 Key: LUCENE-798
 URL: https://issues.apache.org/jira/browse/LUCENE-798
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Mark Harwood
 Attachments: CachedRangesFilterFactory.java


 RangeFilters can be cached using CachingWrapperFilter but are only re-used if 
 a user happens to use *exactly* the same upper/lower bounds.
 This class demonstrates a caching approach where *sections* of ranges are 
 cached as bitsets and these are re-used/combined to construct large range 
 filters if they fall within the required range. This can improve the cache 
 hit ratio and avoid going to disk to read large lists of Doc ids from 
 TermDocs.
 This class needs some more work to add thread safety, but I'm making it 
 available to gather feedback on the design at this early stage before 
 making it robust.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2888) Several DocsEnum / DocsAndPositionsEnum return wrong docID when next() / advance(int) return NO_MORE_DOCS

2011-01-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-2888.
-

Resolution: Fixed

Committed revision 1063332.


 Several DocsEnum / DocsAndPositionsEnum return wrong docID when next() / 
 advance(int) return NO_MORE_DOCS
 -

 Key: LUCENE-2888
 URL: https://issues.apache.org/jira/browse/LUCENE-2888
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-2888.patch, LUCENE-2888.patch


 During work on LUCENE-2878 I found some minor problems in PreFlex and Pulsing 
 Codec - they are not returning NO_MORE_DOCS but the last docID instead from 
 DocsEnum#docID() when next() or advance(int) returned NO_MORE_DOCS. The 
 JavaDoc clearly says that it should return NO_MORE_DOCS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (SOLR-482) Error handling in CSVLoader

2011-01-25 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-482:
-

Attachment: SOLR-482.patch

Working through some old patches, this one is pretty tame and gives a little 
more info when an error is encountered than it used to.  Will commit shortly.

 Error handling in CSVLoader
 ---

 Key: SOLR-482
 URL: https://issues.apache.org/jira/browse/SOLR-482
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: SOLR-482.patch, SOLR-482.patch


 Sometimes the underlying CSV parser can't read a line and throws an 
 exception.  Solr currently just passes the exception out to the client.  
 Wrapping this in a SolrException allows us to pass out information about what 
 line failed (which isn't always in the CSV IOException thrown).
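
A hedged sketch of the wrapping idea (the helper name and message format are illustrative, not the committed patch):

{code}
import java.io.IOException;
import org.apache.solr.common.SolrException;

// Re-throw the parser's IOException with the failing line attached.
void rethrowWithLine(String source, int lineNumber, IOException e) {
  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
      "CSVLoader: input=" + source + ", line=" + lineNumber
          + ", can't read line: " + e.getMessage(), e);
}
{code}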

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-218) Query Parser flags clauses with explicit OR as required when followed by explicit AND.

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-218.
-

Resolution: Not A Problem
  Assignee: (was: Lucene Developers)

Note that the query has OR FIVE AND SIX and hence FIVE and SIX are 
required. Same for the last OR FOUR AND FIVE. If you want exact boolean 
ordering, you should group clauses with parentheses.
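
Illustrated with the 3.x QueryParser (analyzer and field name are placeholders; parse(...) throws ParseException):

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

QueryParser qp = new QueryParser(Version.LUCENE_30, "f",
    new StandardAnalyzer(Version.LUCENE_30));
// AND binds to its neighbors, so FIVE and SIX become required:
Query flat = qp.parse("ONE OR FIVE AND SIX");      // ONE +FIVE +SIX
// Parentheses give the exact boolean ordering the reporter expected:
Query grouped = qp.parse("ONE OR (FIVE AND SIX)"); // ONE (+FIVE +SIX)
{code}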

 Query Parser flags clauses with explicit OR as required when followed by 
 explicit AND.
 --

 Key: LUCENE-218
 URL: https://issues.apache.org/jira/browse/LUCENE-218
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Affects Versions: 1.0.2
 Environment: Operating System: other
 Platform: PC
Reporter: David Mabe
Priority: Minor

 When the following string is parsed:
 ONE NOT TWO OR THREE NOT FOUR OR FIVE AND SIX SEVEN OR THRE OR FIVEE OR FOUR 
 AND FIVE SIXX
 The following query is returned:
 +ONE -TWO THREE -FOUR +FIVE +SIX SEVEN THRE FIVEE +FOUR +FIVE +SIXX
 Note that the first FIVE is required when it should not be.
 Also note that the first THREE is calculated correctly with the explicit OR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-1159) jarify target gives misleading message when svnversion doesn't exist

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-1159.
--

Resolution: Not A Problem

This seems to be fixed already. From common-build.xml:

{noformat}
<!-- If possible, include the svnversion -->
<exec dir="." executable="${svnversion.exe}"
      outputproperty="svnversion" failifexecutionfails="false">
  <arg line="."/>
</exec>
{noformat}

 jarify target gives misleading message when svnversion doesn't exist
 

 Key: LUCENE-1159
 URL: https://issues.apache.org/jira/browse/LUCENE-1159
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Trivial

 The jarify command in common-build.xml seems to indicate failure when it 
 can't find svnversion, but this is, in fact, just a warning.  We should check 
 to see if svnversion exists before attempting the command at all, if possible.
 The message looks something like:
  [exec] Execute failed: java.io.IOException: java.io.IOException: svnversion: 
 not found
 Which is understandable, but it is not clear what the ramifications of this 
 missing tool are.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-547) Directory implementation for Applets

2011-01-25 Thread Andre Schild (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986490#action_12986490
 ] 

Andre Schild commented on LUCENE-547:
-

The problem with RAMDirectory is that it uses java.io.File.

When you work in an applet environment you don't have explicit java.io.File 
objects (for security reasons); instead you have to use java.net.URL to get 
access to the files.

So inheriting from RAMDirectory won't do.

But you can leave it closed, as I have a working implementation, and nobody 
seems to have had the need for it in over 5 years.

Thanks anyway for your work.
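
For anyone following along, a minimal sketch of the URL-based access this implies (this is not the attached AppletDirectory; the index/ path is a placeholder):

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Read one index file relative to the applet's codebase via URL --
// no java.io.File involved, so the applet needs no signing.
byte[] readIndexFile(URL codeBase, String name) throws IOException {
  InputStream in = new URL(codeBase, "index/" + name).openStream();
  try {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
    return out.toByteArray();
  } finally {
    in.close();
  }
}
{code}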

 Directory implementation for Applets
 

 Key: LUCENE-547
 URL: https://issues.apache.org/jira/browse/LUCENE-547
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: 1.9
 Environment: Applets
Reporter: Andre Schild
Priority: Minor
 Attachments: AppletDirectory.zip


 This directory implementation can be used inside of applets, where the index 
 files are located on the server.
 Also teh applet is not required to be signed, as no calls to the 
 System.getProperty are made.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-149) [PATCH] URLDirectory implementation

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-149.
-

Resolution: Not A Problem
  Assignee: (was: Lucene Developers)

This looks more like a tool to construct a Directory from a zipped file than 
a Directory implementation. The Directory extension is just forced here - 
unzipping the files and populating a RAMDirectory will achieve the same 
effect. Anyway, idle for too many years :).
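
A minimal sketch of that alternative, assuming a 3.x-era Directory API and a zip stream supplied by the caller:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.RAMDirectory;

// Unzip the index files straight into a RAMDirectory.
RAMDirectory unzipToRam(InputStream zipped) throws IOException {
  RAMDirectory dir = new RAMDirectory();
  ZipInputStream zip = new ZipInputStream(zipped);
  byte[] buf = new byte[4096];
  for (ZipEntry e; (e = zip.getNextEntry()) != null; ) {
    IndexOutput out = dir.createOutput(e.getName());
    for (int n; (n = zip.read(buf)) != -1; ) out.writeBytes(buf, 0, n);
    out.close();
  }
  zip.close();
  return dir;
}
{code}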

 [PATCH] URLDirectory implementation
 ---

 Key: LUCENE-149
 URL: https://issues.apache.org/jira/browse/LUCENE-149
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: URLDirectory.zip


 August 15th, 2003 contribution from Lukas Zapletal zaple...@inf.upol.cz
 Suitable for Lucene Sandbox contribution containing alternate Directory
 implementations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-150) [PATCH] DBDirectory implementation

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-150.
-

Resolution: Not A Problem
  Assignee: (was: Lucene Developers)

We have DBDirectory in contrib.

 [PATCH] DBDirectory implementation
 --

 Key: LUCENE-150
 URL: https://issues.apache.org/jira/browse/LUCENE-150
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: lucene-dbdirectory-1.0.zip


 Implementation of the Lucene Directory interface which stores data in a 
 JDBC-accessible database.
 June 2nd, 2003, a contribution from Anthony Eden m...@anthonyeden.com.
 Original email:
 Version 1.0 of the DBDirectory library, which implements a Directory
 which can store indices in a database, is now available for download.
 There are two versions:
Tar GZIP:
 http://www.anthonyeden.com/download/lucene-dbdirectory-1.0.tar.gz
ZIP: http://www.anthonyeden.com/download/lucene-dbdirectory-1.0.zip
 The source code is included.  Please read the README file for
 instructions on using DBDirectory.  I have only tested it with MySQL 
 but
 would be happy to add other database scripts if anyone would like to
 submit them.  Please post any questions here on the mailing list.
 Otis, is there anything left to do to get this into the sandbox?
 Additionally, how will I maintain the code if it is in the sandbox?
 Will I get write access to the part of the CVS repository which would
 house DBDirectory?  I currently have all of the code in my private CVS.
 Sincerely,
 Anthony Eden

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-151) [PATCH] Clonable RAMDirectory

2011-01-25 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera closed LUCENE-151.
-

Resolution: Not A Problem
  Assignee: (was: Lucene Developers)

RAMDirectory has a ctor which takes a Directory, which can be used for cloning.

 [PATCH] Clonable RAMDirectory
 -

 Key: LUCENE-151
 URL: https://issues.apache.org/jira/browse/LUCENE-151
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Affects Versions: unspecified
 Environment: Operating System: other
 Platform: Other
Reporter: Otis Gospodnetic
Priority: Minor
 Attachments: ramdir.diff, RamDirectory-clonable.patch


 A patch for RAMDirectory that makes it clonable.
 May 22nd, 2003 contribution from Nick Smith nick.sm...@techop.ch
 Original email:
 Hi Lucene Developers,
Thanks for a great product!

 I need to be able to 'snapshot' our in-memory indices (RAMDirectory
 instances).
 I have been using :
 RAMDirectory activeDir = new RAMDirectory();
 // many inserts, deletes etc
 RAMDirectory cloneDir = new RAMDirectory(activeDir);
 but unfortunately this is rather slow for large indices.
 I have a suggestion - implement java.lang.Cloneable interface
 in RAMDirectory.  I.e to be able to call :
 RAMDirectory cloneDir = (RAMDirectory)activeDir.clone();
 This bypasses the input/output stream handling of the
 copy constructor by cloning the underlying buffers that
 form the directory and is much faster. (Diff attached).
 Any comments?
 Regards,
 Nick

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-01-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986502#action_12986502
 ] 

Yonik Seeley commented on LUCENE-2883:
--

One issue here is the different purposes for lucene and solr function queries.
Solr's function queries have always evolved at a rapid pace (and are continuing 
to evolve) to support higher level features and interfaces in Solr.  They are 
able to evolve rapidly because they are seen more as an implementation detail 
rather than interface classes, and I'd hate to lose that.  So if we do try to 
make Solr's function queries more accessible to lucene users (again), it should 
be as a Solr module.  As we can see from history and usage, function queries 
are critically important to Solr, but are obviously not to Lucene.

 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1188) equals and hashCode implementation in org.apache.lucene.search.* package

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1188.
---

   Resolution: Fixed
Fix Version/s: 2.9

The equals and hashCode implementations in Query subclasses were already fixed 
to use getClass() and not instanceof in 2.9 by various other issues. Also the 
boost comparison was mostly removed by calling super.

 equals and hashCode implementation in org.apache.lucene.search.* package
 

 Key: LUCENE-1188
 URL: https://issues.apache.org/jira/browse/LUCENE-1188
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2, 2.3, 2.3.1
 Environment: All
Reporter: Chandan Raj Rupakheti
 Fix For: 2.9

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 I would like to talk about the implementation of the equals and hashCode 
 methods in the org.apache.lucene.search.* package. 
 Example One:
 org.apache.lucene.search.spans.SpanTermQuery (Super Class)
   - org.apache.lucene.search.payloads.BoostingTermQuery (Sub Class)
 Observation:
 * BoostingTermQuery defines equals but inherits hashCode from SpanTermQuery. 
 Definition of equals is a code clone of SpanTermQuery with a change in class 
 name. 
 Intention:
 I believe the intention of the equals redefinition in BoostingTermQuery is 
 not to make objects of SpanTermQuery and BoostingTermQuery comparable, i.e. 
 spanTermQuery.equals(boostingTermQuery) == false && 
 boostingTermQuery.equals(spanTermQuery) == false.
 Problem:
 With the current implementation, the intention might not be respected, as a 
 result of a symmetry violation of the equals contract, i.e.
 spanTermQuery.equals(boostingTermQuery) == true (can be) && 
 boostingTermQuery.equals(spanTermQuery) == false (always).
 (Note: provided their state variables are equal.)
 Solution:
 Change implementation of equals in SpanTermQuery from:
 {code:title=SpanTermQuery.java|borderStyle=solid}
   public boolean equals(Object o) {
     if (!(o instanceof SpanTermQuery))
       return false;
     SpanTermQuery other = (SpanTermQuery)o;
     return (this.getBoost() == other.getBoost())
         && this.term.equals(other.term);
   }
 {code}
 To:
 {code:title=SpanTermQuery.java|borderStyle=solid}
   public boolean equals(Object o) {
     if (o == this) return true;
     if (o == null || o.getClass() != this.getClass()) return false;
     //if (!(o instanceof SpanTermQuery))
     //  return false;
     SpanTermQuery other = (SpanTermQuery)o;
     return (this.getBoost() == other.getBoost())
         && this.term.equals(other.term);
   }
 {code}
 Advantage:
 * BoostingTermQuery.equals and BoostingTermQuery.hashCode are not needed, 
 while still preserving the same intention as before.
  
 * Any further subclassing that does not add new state variables in the 
 extended classes of SpanTermQuery, does not have to redefine equals and 
 hashCode. 
 * Even if a new state variable is added in a subclass, the symmetric property 
 of equals contract will still be respected irrespective of implementation 
 (i.e. instanceof / getClass) of equals and hashCode in the subclasses.
 Example Two:
 org.apache.lucene.search.CachingWrapperFilter (Super Class)
   - org.apache.lucene.search.CachingWrapperFilterHelper (Sub Class)
 Observation:
 Same as Example One.
 Problem:
 Same as Example one.
 Solution:
 Change equals in CachingWrapperFilter from:
 {code:title=CachingWrapperFilter.java|borderStyle=solid}
   public boolean equals(Object o) {
     if (!(o instanceof CachingWrapperFilter)) return false;
     return this.filter.equals(((CachingWrapperFilter)o).filter);
   }
 {code}
 To:
 {code:title=CachingWrapperFilter.java|borderStyle=solid}
   public boolean equals(Object o) {
     //if (!(o instanceof CachingWrapperFilter)) return false;
     if (o == this) return true;
     if (o == null || o.getClass() != this.getClass()) return false;
     return this.filter.equals(((CachingWrapperFilter)o).filter);
   }
 {code}
 Advantage:
 Same as Example One. Here, CachingWrapperFilterHelper.equals and 
 CachingWrapperFilterHelper.hashCode are not needed.
 Example Three:
 org.apache.lucene.search.MultiTermQuery (Abstract Parent)
   - org.apache.lucene.search.FuzzyQuery (Concrete Sub)
   - org.apache.lucene.search.WildcardQuery (Concrete Sub)
 Observation (Not a problem):
 * WildcardQuery defines equals but inherits hashCode from MultiTermQuery.
 Definition of equals contains just super.equals invocation. 
 * FuzzyQuery has few state variables added that are referenced in its equals 
 and hashCode.
 Intention:
 I believe the intention here is not to make objects of FuzzyQuery and 
 WildcardQuery comparable, i.e. fuzzyQuery.equals(wildCardQuery) == false && 
 wildCardQuery.equals(fuzzyQuery) == false.
 Proposed 

[jira] Resolved: (SOLR-2320) ReplicationHandler doesn't return master details unless it's also configured as a slave

2011-01-25 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2320.


Resolution: Fixed

Committed revision 1063339. - trunk
Committed revision 1063343. - 3x


 ReplicationHandler doesn't return master details unless it's also configured 
 as a slave
 ---

 Key: SOLR-2320
 URL: https://issues.apache.org/jira/browse/SOLR-2320
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4, 1.4.1
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 3.1, 4.0

 Attachments: SOLR-2320.patch, SOLR-2320.patch, SOLR-2320.patch


 While investigating SOLR-2314 I found a bug which seems to be the opposite 
 of the behavior described there -- so I'm filing a separate bug to track it.
 If ReplicationHandler is only configured as a master, command=details 
 requests won't include the master section; that section is only output if 
 it is also configured as a slave.
 The method responsible for the details command generates the master details 
 just fine, but the code to add it to the response seems to have erroneously 
 been nested inside an if that only evaluates to true if there is a non-null 
 SnapPuller (i.e. it's also a slave).
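
The shape of the bug, as a hedged sketch (all helper names here are hypothetical stand-ins, not Solr's actual code):

{code}
// Master details are computed either way...
NamedList<Object> master = getMasterDetails();   // hypothetical helper
SnapPuller puller = getSnapPuller();             // non-null only on slaves
if (puller != null) {
  details.add("slave", getSlaveDetails(puller)); // hypothetical helper
  details.add("master", master);                 // BUG: nested too deep
}
// Fix: add the master section outside the slave-only branch:
details.add("master", master);
{code}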

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2317) Slaves have leftover index.xxxxx directories, and leftover files in index/ directory

2011-01-25 Thread Jayendra Patil (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986514#action_12986514
 ] 

Jayendra Patil commented on SOLR-2317:
--

For the extra index.xxxxx directories, you can try the patch @ 
https://issues.apache.org/jira/browse/SOLR-2156

 Slaves have leftover index.xxxxx directories, and leftover files in index/ 
 directory
 

 Key: SOLR-2317
 URL: https://issues.apache.org/jira/browse/SOLR-2317
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.1
Reporter: Bill Bell

 When replicating, we are getting leftover files on slaves. Some slaves are 
 getting index.number directories with files left over. And more concerning, 
 the index/ directory has leftover files from previous replication runs.
 This is a pain to keep cleaning up.
 Bill

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-482) Error handling in CSVLoader

2011-01-25 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-482.
--

   Resolution: Fixed
Fix Version/s: 4.0
   3.1

committed on trunk and 3.x

 Error handling in CSVLoader
 ---

 Key: SOLR-482
 URL: https://issues.apache.org/jira/browse/SOLR-482
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: SOLR-482.patch, SOLR-482.patch


 Sometimes the underlying CSV parser can't read a line and throws an 
 exception.  Solr currently just passes the exception out to the client.  
 Wrapping this in a SolrException allows us to pass out information about what 
 line failed (which isn't always in the CSV IOException thrown).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

2011-01-25 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-2723:


Attachment: LUCENE-2723-BulkEnumWrapper.patch

This patch adds a BulkPostingsEnumWrapper that implements 
DocsAndPositionsEnum by using the bulk postings. I first just added this as a 
class to ease testing for PositionDeltaBulks, but it seems that this could be 
useful for more than just testing. Codecs that don't want to implement the 
DocsAndPositionsEnum API can just use this wrapper to provide the 
functionality.
I also added a testcase for MemoryIndex that uses this wrapper.
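
The adapter idea in miniature (the interfaces below are hypothetical stand-ins defined just for this sketch, not the branch's real BulkPostingsEnum API):

{code}
// Expose a per-doc iterator on top of a bulk, block-oriented source.
interface BulkSource {                  // hypothetical
  int fill(int[] docDeltas);            // #deltas written; 0 = exhausted
}

class PerDocAdapter {
  private final BulkSource bulk;
  private final int[] deltas = new int[64];
  private int count, upto, doc;

  PerDocAdapter(BulkSource bulk) { this.bulk = bulk; }

  int nextDoc() {                       // Integer.MAX_VALUE = no more docs
    if (upto == count) {
      count = bulk.fill(deltas);
      upto = 0;
      if (count == 0) return Integer.MAX_VALUE;
    }
    return doc += deltas[upto++];       // deltas -> absolute docIDs
  }
}
{code}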

 Speed up Lucene's low level bulk postings read API
 --

 Key: LUCENE-2723
 URL: https://issues.apache.org/jira/browse/LUCENE-2723
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2723-BulkEnumWrapper.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
 LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
 LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, 
 LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, 
 LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch


 Spinoff from LUCENE-1410.
 The flex DocsEnum has a simple bulk-read API that reads the next chunk
 of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
 (from LUCENE-1410).  This is not unlike sucking coffee through those
 tiny plastic coffee stirrers they hand out on airplanes that,
 surprisingly, also happen to function as a straw.
 As a result we see no perf gain from using FOR/PFOR.
 I had hacked up a fix for this, described in my blog post at
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
 I'm opening this issue to get that work to a committable point.
 So... I've worked out a new bulk-read API to address the performance
 bottleneck.  It has some big changes over the current bulk-read API:
   * You can now also bulk-read positions (but not payloads), but, I
  have yet to cut over positional queries.
   * The buffer contains doc deltas, not absolute values, for docIDs
 and positions (freqs are absolute).
   * Deleted docs are not filtered out.
   * The doc & freq buffers need not be aligned.  For fixed intblock
 codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
 Group varint, etc.) they won't be.
 It's still a work in progress...
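
Since the buffers now carry deltas, the consumer has to accumulate them itself; a self-contained sketch of that decoding step (the array is a made-up example, not the branch's enum state):

{code}
// Decode a delta-encoded doc buffer into absolute docIDs.
int[] docDeltas = {3, 1, 4, 2};   // encodes docs 3, 4, 8, 10
int doc = 0;
for (int i = 0; i < docDeltas.length; i++) {
  doc += docDeltas[i];
  // 'doc' is now the absolute docID; deleted docs are NOT filtered
  // out, so callers must still consult the deleted-docs bits.
}
{code}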

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-01-25 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986529#action_12986529
 ] 

Simon Willnauer commented on LUCENE-2883:
-

bq. One issue here is the different purposes for lucene and solr function 
queries.
Yonik, if that is your only issue then we are good to go. I don't think that 
moving stuff to modules changes anything about how we develop software. 
Modularization, decoupling, interfaces etc. - you know how to work with 
those, ey? So hey, what is really the point here? This modularization is a 
key point of merging development with Lucene, and every time somebody 
proposes something like this you fear that that monolithic thing under /solr 
could become more modular and decoupled. I don't know why this is the case, 
but we should and will move on with modularization. Folks will use it once 
it's there, that's for sure. The same is true for faceting, replication, 
queryparsers, functionparser... those are on the list!

 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2392) Enable flexible scoring

2011-01-25 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2392:


Attachment: LUCENE-2392_take2.patch

Here's a really, really rough take 2 at the problem.

The general idea is to take a smaller baby step, as Mike calls it, at the 
problem. Really we have been working our way towards this anyway: exposing 
additional statistics, making Similarity per-field, fixing up 
inconsistencies... and this is the way I prefer, as we get things actually 
committed and moving.

So whatever is in this patch (which is full of nocommits, but all tests pass 
and all queries work with it), we could possibly then split up into other 
issues and continue slowly proceeding, or maybe create a branch, whatever.

My problem with the other patch is that it requires a ton more work to make 
any progress on it... and things don't even compile with it, forget about 
tests.

The basics here are to:
# Split the matching and scoring calculations of Scorer. All responsibility 
for calculations belongs in the Similarity; the Scorer should be matching 
positions, working docsEnums, etc. (a rough sketch of this split follows 
after this list).
# Similarity as we know it now gets a more low-level API, and TFIDFSimilarity 
implements this API but exposes its customizations via the tf(), idf(), etc. 
we know now.
# Things like score caching and specialization of calculations are the 
responsibility of the Similarity, as these depend upon the formula being 
used. For TFIDFSimilarity, I added some optimizations here; for example, it 
specializes its norms == null case away to remove the per-doc if.
# Since all Weights create PerReaderTermState (<-- this one needs a new 
name) to separate the seeking/stats collection from the calculations, I also 
optimized PhraseQuery's Weight/Scorer construction to be single-pass.
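
A very rough sketch of the split in point 1 (all names are hypothetical, not the patch's API): the Scorer only matches, and a per-doc scorer owned by the Similarity does the math.

{code}
import java.io.IOException;
import org.apache.lucene.index.DocsEnum;

// Hypothetical: the Similarity hands out one of these per segment.
abstract class SimScorer {
  abstract float score(int doc, int freq);
}

// The Scorer walks postings (matching) and delegates all calculations.
class SketchTermScorer {
  private final DocsEnum docs;
  private final SimScorer sim;

  SketchTermScorer(DocsEnum docs, SimScorer sim) {
    this.docs = docs;
    this.sim = sim;
  }

  float scoreCurrent() throws IOException {
    return sim.score(docs.docID(), docs.freq());
  }
}
{code}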

Also I like to benchmark every step of the way, so we don't come up with
this design that won't be performant: here are the scores for lucene's default 
Sim with the patch:
||Query||QPS trunk||QPS patch||Pct diff||
|spanNear([unit, state], 10, true)|3.04|2.92|{color:red}-4.0%{color}|
|doctitle:.*[Uu]nited.*|4.00|3.99|{color:red}-0.1%{color}|
|+unit +state|8.11|8.12|{color:green}0.2%{color}|
|united~2.0|4.36|4.40|{color:green}1.0%{color}|
|united~1.0|18.70|18.93|{color:green}1.2%{color}|
|unit~2.0|8.54|8.71|{color:green}2.1%{color}|
|spanFirst(unit, 5)|11.35|11.59|{color:green}2.2%{color}|
|unit~1.0|8.69|8.91|{color:green}2.6%{color}|
|unit state|7.03|7.23|{color:green}2.8%{color}|
|unit state~3|3.74|3.86|{color:green}3.2%{color}|
|u*d|16.72|17.30|{color:green}3.5%{color}|
|state|19.24|20.04|{color:green}4.1%{color}|
|un*d|49.42|51.55|{color:green}4.3%{color}|
|unit state|5.99|6.31|{color:green}5.3%{color}|
|+nebraska +state|140.74|151.85|{color:green}7.9%{color}|
|uni*|10.66|11.55|{color:green}8.4%{color}|
|unit*|18.77|20.41|{color:green}8.7%{color}|
|doctimesecnum:[1 TO 6]|6.97|7.70|{color:green}10.4%{color}|

All Lucene/Solr tests pass, but there are lots of nocommits, especially:
# No Javadocs.
# Explains need to be fixed: in general the explanation of matching belongs 
where it is now, but the explanation of score calculations belongs in the 
Similarity.
# Need to refactor more out of Weight: currently we pass it to the docscorer, 
but it's the wrong object, as it can only hold a single float.

Anyway, it's gonna take some time to rough all this out, I'm sure, but I 
wanted to show some progress/invite ideas, and also show we can do this 
stuff without losing performance.

I have separate patches that need to be integrated/relevance-tested, e.g. 
for average doc length... maybe I'll do that next so we can get some 
concrete alternate sims in here before going any further.



 Enable flexible scoring
 ---

 Key: LUCENE-2392
 URL: https://issues.apache.org/jira/browse/LUCENE-2392
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2392.patch, LUCENE-2392.patch, 
 LUCENE-2392_take2.patch


 This is a first step (nowhere near committable!), implementing the
 design iterated to in the recent "Baby steps towards making Lucene's
 scoring more flexible" java-dev thread.
 The idea is (if you turn it on for your Field; it's off by default) to
 store full stats in the index, into a new _X.sts file, per doc (X
 field) in the index.
 And then have FieldSimilarityProvider impls that compute each doc's boost
 bytes (norms) from these stats.
 The patch is able to index the stats, merge them when segments are
 merged, and provides an iterator-only API.  It also has a starting point
 for per-field Sims that use the stats iterator API to compute boost
 bytes.  But it's not at all tied into actual searching!  There's still
 tons left to do, eg, how does 

[jira] Commented: (LUCENE-403) Alternate Lucene Query Highlighter

2011-01-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986542#action_12986542
 ] 

Uwe Schindler commented on LUCENE-403:
--

Mark Miller: What do you think, is this issue still relevant?

If not, we should close it and say: resolved by FastVectorHighlighter, or by 
the recent improvements in the standard highlighter?

 Alternate Lucene Query Highlighter
 --

 Key: LUCENE-403
 URL: https://issues.apache.org/jira/browse/LUCENE-403
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Affects Versions: 1.4
 Environment: Operating System: All
 Platform: All
Reporter: David Bohl
Priority: Minor
 Attachments: HighlighterTest.java, HighlighterTest.java, 
 QueryHighlighter.java, QueryHighlighter.java, QueryHighlighter.java, 
 QuerySpansExtractor.java


 I created a lucene query highlighter (borrowing some code from the one in
 the sandbox) that my company is using.  It better handles phrase queries,
 doesn't break HTML entities, and has the ability to either highlight terms
 in an entire document or to highlight fragments from the document.  I would 
 like to make it available to anyone who wants it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-403) Alternate Lucene Query Highlighter

2011-01-25 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12986544#action_12986544
 ] 

Mark Miller commented on LUCENE-403:


Yeah - I would totally close this. This work has been superseded - and it looks 
like highlighting may be able to take another leap forward soon. 

 Alternate Lucene Query Highlighter
 --

 Key: LUCENE-403
 URL: https://issues.apache.org/jira/browse/LUCENE-403
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Affects Versions: 1.4
 Environment: Operating System: All
 Platform: All
Reporter: David Bohl
Priority: Minor
 Attachments: HighlighterTest.java, HighlighterTest.java, 
 QueryHighlighter.java, QueryHighlighter.java, QueryHighlighter.java, 
 QuerySpansExtractor.java


 I created a lucene query highlighter (borrowing some code from the one in
 the sandbox) that my company is using.  It better handles phrase queries,
 doesn't break HTML entities, and has the ability to either highlight terms
 in an entire document or to highlight fragments from the document.  I would 
 like to make it available to anyone who wants it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-403) Alternate Lucene Query Highlighter

2011-01-25 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved LUCENE-403.


Resolution: Won't Fix
  Assignee: Mark Miller

Some of this work moved into other issues. Some of it is just too old now. I 
think this issue has served its purpose.

 Alternate Lucene Query Highlighter
 --

 Key: LUCENE-403
 URL: https://issues.apache.org/jira/browse/LUCENE-403
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Affects Versions: 1.4
 Environment: Operating System: All
 Platform: All
Reporter: David Bohl
Assignee: Mark Miller
Priority: Minor
 Attachments: HighlighterTest.java, HighlighterTest.java, 
 QueryHighlighter.java, QueryHighlighter.java, QueryHighlighter.java, 
 QuerySpansExtractor.java


 I created a lucene query highlighter (borrowing some code from the one in
 the sandbox) that my company is using.  It better handles phrase queries,
 doesn't break HTML entities, and has the ability to either highlight terms
 in an entire document or to highlight fragments from the document.  I would 
 like to make it available to anyone who wants it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-403) Alternate Lucene Query Highlighter

2011-01-25 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller closed LUCENE-403.
--


 Alternate Lucene Query Highlighter
 --

 Key: LUCENE-403
 URL: https://issues.apache.org/jira/browse/LUCENE-403
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Affects Versions: 1.4
 Environment: Operating System: All
 Platform: All
Reporter: David Bohl
Assignee: Mark Miller
Priority: Minor
 Attachments: HighlighterTest.java, HighlighterTest.java, 
 QueryHighlighter.java, QueryHighlighter.java, QueryHighlighter.java, 
 QuerySpansExtractor.java


 I created a lucene query highlighter (borrowing some code from the one in
 the sandbox) that my company is using.  It better handles phrase queries,
 doesn't break HTML entities, and has the ability to either highlight terms
 in an entire document or to highlight fragments from the document.  I would 
 like to make it available to anyone who wants it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-990) ParallelMultiSearcher.search with a custom HitCollector should run parallel

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler closed LUCENE-990.


Resolution: Won't Fix

ParallelMultiSearcher was dropped along with MultiSearcher in Lucene trunk 
(because of too many unsolvable scoring and deMorgan bugs). The replacement is 
a parallelized IndexSearcher on a MultiReader.

It's not possible to solve this even for the new one, as it would need 
Collector to be synchronized.
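
For reference, a minimal sketch of that replacement, assuming two existing 
indexes on disk (paths and the field name are placeholders; the IndexSearcher 
constructor taking an ExecutorService is the one on trunk/3.x):

{code}
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ParallelSearchSketch {
  public static void main(String[] args) throws Exception {
    IndexReader r1 = IndexReader.open(FSDirectory.open(new File("index1")), true);
    IndexReader r2 = IndexReader.open(FSDirectory.open(new File("index2")), true);
    MultiReader multi = new MultiReader(r1, r2);

    // With an ExecutorService, TopDocs-style searches run one task per
    // segment in parallel; Collector-based search stays sequential, as
    // noted above.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    IndexSearcher searcher = new IndexSearcher(multi, pool);

    TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);
    System.out.println("total hits: " + hits.totalHits);

    searcher.close();
    multi.close();
    pool.shutdown();
  }
}
{code}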

 ParallelMultiSearcher.search with a custom HitCollector should run parallel
 ---

 Key: LUCENE-990
 URL: https://issues.apache.org/jira/browse/LUCENE-990
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2, 2.3
Reporter: Jan-Pascal
Priority: Minor

 The ParallelMultiSearcher.search(Weight weight, Filter filter, final 
 HitCollector results) should search over its underlying Searchers in 
 parallel, like the TopDocs versions of the search() method. There's a @todo 
 for this in the method's Javadoc comment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1264) Use of IOException in analysis component method signatures leads to poor error management

2011-01-25 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-1264.
---

Resolution: Won't Fix

This issue is quite old and no response was given to Hoss' comment. In general 
this is not an issue, as you can always throw a RuntimeException instead. 
IOException appears in the throws clause only because it is unfortunately 
checked and needed by Tokenizer, which works on java.io.Reader.
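
A minimal sketch of the RuntimeException route, for a filter that drives some 
complex underlying facility (BackendTokenFilter and analyzeWithBackend are 
made-up names):

{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/** Filter over some third-party facility that throws checked exceptions;
 *  wrap the real cause unchecked instead of faking an IOException. */
public final class BackendTokenFilter extends TokenFilter {

  public BackendTokenFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    try {
      analyzeWithBackend(); // stand-in for the underlying facility
    } catch (Exception cause) {
      // the RuntimeException carries the real cause up the call stack
      throw new RuntimeException("backend analysis failed", cause);
    }
    return true;
  }

  private void analyzeWithBackend() throws Exception {
    // hypothetical call into the third-party analysis code
  }
}
{code}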

 Use of IOException in analysis component method signatures leads to poor 
 error management
 -

 Key: LUCENE-1264
 URL: https://issues.apache.org/jira/browse/LUCENE-1264
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/analyzers
Affects Versions: 2.3.1
Reporter: Benson Margulies

 Methods such as 'next' and 'reset' are defined to throw only IOException.
 IOException, as one of the older and dustier Java exceptions, lacks a 
 constructor taking a 'cause' exception.
 So, if a Tokenizer (for example) uses some complex underlying facility that 
 throws arbitrary exceptions, the coder has two bad choices: wrap an 
 IOException around some string derived from the real problem, or throw an 
 unchecked wrapper.
 Please consider adding a new checked exception to the signature of these 
 methods that implements the 'cause' pattern.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments

2011-01-25 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986556#action_12986556
 ] 

Jason Rutherglen commented on LUCENE-2324:
--

The compilation errors are gone; TestNRTThreads and TestStressIndexing2 are 
still failing.  I think we need to implement Mike's idea: 

https://issues.apache.org/jira/browse/LUCENE-2324?focusedCommentId=12984285&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12984285

then retest.

A test may be deadlocking somewhere; ant hasn't returned.

 Per thread DocumentsWriters that write their own private segments
 -

 Key: LUCENE-2324
 URL: https://issues.apache.org/jira/browse/LUCENE-2324
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, 
 LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch, lucene-2324.patch, 
 lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out


 See LUCENE-2293 for motivation and more details.
 I'm copying here the summary Mike posted on LUCENE-2293:
 Change the approach for how we buffer in RAM to a more isolated
 approach, whereby IW has N fully independent RAM segments
 in-process and when a doc needs to be indexed it's added to one of
 them. Each segment would also write its own doc stores and
 normal segment merging (not the inefficient merge we now do on
 flush) would merge them. This should be a good simplification in
 the chain (eg maybe we can remove the *PerThread classes). The
 segments can flush independently, letting us make much better
 concurrent use of IO & CPU.
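
Very roughly, the isolation idea looks like this; everything below is a 
conceptual sketch with made-up classes, not the actual 
IndexWriter/DocumentsWriter code:

{code}
import java.util.concurrent.atomic.AtomicInteger;

public final class PrivateSegmentsSketch {
  private final RamSegment[] segments;
  private final AtomicInteger next = new AtomicInteger();

  public PrivateSegmentsSketch(int n) {
    segments = new RamSegment[n];
    for (int i = 0; i < n; i++) {
      segments[i] = new RamSegment();
    }
  }

  /** Pick a segment round-robin; only that segment's lock is taken, so
   *  threads buffering into different segments never contend. */
  public void addDocument(String doc) {
    int i = Math.abs(next.getAndIncrement() % segments.length);
    RamSegment seg = segments[i];
    synchronized (seg) {
      seg.buffer(doc);
      if (seg.ramUsedBytes() > 16 << 20) {
        seg.flush(); // each segment flushes independently of the others
      }
    }
  }

  /** Stand-in for a private, self-contained in-memory segment. */
  static final class RamSegment {
    private final StringBuilder buf = new StringBuilder();
    void buffer(String doc) { buf.append(doc).append('\n'); }
    long ramUsedBytes() { return 2L * buf.length(); }  // chars are 2 bytes
    void flush() { buf.setLength(0); /* a real impl writes a segment here */ }
  }
}
{code}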

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-01-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986558#action_12986558
 ] 

Michael McCandless commented on LUCENE-2883:


Can't we consolidate them under a new top-level module?  modules/queries?

We can mark the classes as lucene.experimental?  Then we are free to iterate 
quickly.  Does that address your concern, Yonik?


 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-01-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986561#action_12986561
 ] 

Yonik Seeley commented on LUCENE-2883:
--

Not sure if I communicated the issue clearly: taking what is essentially 
implementation and trying to make it an interface clearly has a cost.
Function queries and the solr qparser architecture are constantly evolving, and 
wind all through solr.

If we attempt to make this easier to use for lucene users by moving it out to a 
module, then:
 - it should be a solr module... keep the solr package names and make it clear 
that its primary purpose is supporting higher level features in solr
 - we should make it such that java interface back compatibility is not a 
requirement, even for point releases

The other approach is to make a Lucene function query module (actually, we 
already have that), try to update it with stuff from solr, but make its 
primary purpose to support the Java interfaces.

 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

2011-01-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986566#action_12986566
 ] 

Michael McCandless commented on LUCENE-2666:


Hmmm --- given that exception, I would expect CheckIndex to have also seen this 
issue.

Searching at the same time as indexing shouldn't cause this.  Lucene doesn't 
cache postings, but does cache metadata for the term, though I can't see how 
that could lead to this exception.

Could this be a hardware issue?  Do you see the problem on more than one 
machine?

 ArrayIndexOutOfBoundsException when iterating over TermDocs
 ---

 Key: LUCENE-2666
 URL: https://issues.apache.org/jira/browse/LUCENE-2666
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.2
Reporter: Shay Banon
 Attachments: checkindex-out.txt


 A user got this very strange exception, and I managed to get the index that 
 it happens on. Basically, iterating over the TermDocs causes an AIOOBE. I 
 easily reproduced it using the FieldCache, which does exactly that (the field 
 in question is indexed as numeric). Here is the exception:
 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
   at 
 org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
   at 
 org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
   at 
 org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
   at 
 org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
   at TestMe.main(TestMe.java:56)
 It happens on the following segment: _26t docCount: 914 delCount: 1 
 delFileName: _26t_1.del
 And as you can see, it smells like a corner case (it fails for document 
 number 912, and the AIOOBE comes from the deleted docs). The code to recreate 
 it is simple:
 FSDirectory dir = FSDirectory.open(new File("index"));
 IndexReader reader = IndexReader.open(dir, true);
 IndexReader[] subReaders = reader.getSequentialSubReaders();
 for (IndexReader subReader : subReaders) {
     Field field =
         subReader.getClass().getSuperclass().getDeclaredField("si");
     field.setAccessible(true);
     SegmentInfo si = (SegmentInfo) field.get(subReader);
     System.out.println("-- " + si);
     if (si.getDocStoreSegment().contains("_26t")) {
         // this is the problematic one...
         System.out.println("problematic one...");
         FieldCache.DEFAULT.getLongs(subReader, "__documentdate",
             FieldCache.NUMERIC_UTILS_LONG_PARSER);
     }
 }
 Here is the result of a check index on that segment:
   8 of 10: name=_26t docCount=914
 compound=true
 hasProx=true
 numFiles=2
 size (MB)=1.641
 diagnostics = {optimize=false, mergeFactor=10, 
 os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, 
 lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, 
 os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
 has deletions [delFileName=_26t_1.del]
 test: open reader.OK [1 deleted docs]
 test: fields..OK [32 fields]
 test: field norms.OK [32 fields]
 test: terms, freq, prox...ERROR [114]
 java.lang.ArrayIndexOutOfBoundsException: 114
   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
   at 
 org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
   at 
 org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
   at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
   at TestMe.main(TestMe.java:47)
 test: stored fields...ERROR [114]
 java.lang.ArrayIndexOutOfBoundsException: 114
   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
   at 
 org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
   at 
 org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
   at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
   at TestMe.main(TestMe.java:47)
 test: term vectorsERROR [114]
 java.lang.ArrayIndexOutOfBoundsException: 114
   at org.apache.lucene.util.BitVector.get(BitVector.java:104)
   at 
 org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
   at 
 

[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

2011-01-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986568#action_12986568
 ] 

Robert Muir commented on LUCENE-2723:
-

Simon, just took a quick glance (not a serious review, all the bulkpostings 
stuff is heavy).

I agree with the idea that Codecs should only need to implement the bulk API at 
a minimum: if all the serious stuff (queries) is using these bulk APIs, then 
the friendly iterator methods can simply be a wrapper over it.

But separately, I know there are some performance degradations with the bulk 
APIs today versus trunk (with the same index). I see the same problems with 
other fixed-int codecs, so I don't think it's just Standard's implementation: 
pretty sure the issue is somewhere in advance()/jump().

I really wish we could debug whatever this performance problem is, just in case 
the bulk APIs themselves need changing... I'm a little concerned about them at 
the moment, that's all. Not sure it should stand in the way of your patch; I'm 
just saying I don't like the performance regression.
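
The iterator-as-wrapper idea can be sketched roughly like this (BulkPostings 
below is a made-up simplification, not the real flex bulk API):

{code}
/** Made-up simplification of a bulk postings read API. */
interface BulkPostings {
  /** Fills buffer with doc-ID deltas; returns how many were read, 0 at end. */
  int fill(int[] docDeltaBuffer);
}

/** Friendly one-doc-at-a-time iterator implemented over the bulk API. */
final class DocsEnumWrapper {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final BulkPostings bulk;
  private final int[] deltas = new int[128];
  private int count;  // valid entries in the buffer
  private int upto;   // next buffer position to consume
  private int doc;    // current absolute doc ID

  DocsEnumWrapper(BulkPostings bulk) {
    this.bulk = bulk;
  }

  int nextDoc() {
    if (upto == count) {          // buffer exhausted: refill in bulk
      count = bulk.fill(deltas);
      upto = 0;
      if (count == 0) {
        return doc = NO_MORE_DOCS;
      }
    }
    doc += deltas[upto++];        // buffer holds deltas, not absolute IDs
    return doc;
  }
}
{code}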



 Speed up Lucene's low level bulk postings read API
 --

 Key: LUCENE-2723
 URL: https://issues.apache.org/jira/browse/LUCENE-2723
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2723-BulkEnumWrapper.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, 
 LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
 LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
 LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, 
 LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, 
 LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch


 Spinoff from LUCENE-1410.
 The flex DocsEnum has a simple bulk-read API that reads the next chunk
 of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
 (from LUCENE-1410).  This is not unlike sucking coffee through those
 tiny plastic coffee stirrers they hand out on airplanes that,
 surprisingly, also happen to function as a straw.
 As a result we see no perf gain from using FOR/PFOR.
 I had hacked up a fix for this, described in my blog post at
 http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
 I'm opening this issue to get that work to a committable point.
 So... I've worked out a new bulk-read API to address the performance
 bottleneck.  It has some big changes over the current bulk-read API:
   * You can now also bulk-read positions (but not payloads), but, I
  have yet to cutover positional queries.
   * The buffer contains doc deltas, not absolute values, for docIDs
 and positions (freqs are absolute).
   * Deleted docs are not filtered out.
   * The doc & freq buffers need not be aligned.  For fixed intblock
 codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
 Group varint, etc.) they won't be.
 It's still a work in progress...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-01-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986569#action_12986569
 ] 

Robert Muir commented on LUCENE-2883:
-

Wait, why again did we merge lucene and solr? This is crazy-talk.

I don't see a single valid reason why queries should be in solr-only.


 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules

2011-01-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986571#action_12986571
 ] 

Yonik Seeley commented on LUCENE-2883:
--

bq. We can mark the classes as lucene.experimental?

If they remain experimental I suppose, but lucene.internal would be a more 
accurate description.

 Consolidate Solr & Lucene FunctionQuery into modules
 -

 Key: LUCENE-2883
 URL: https://issues.apache.org/jira/browse/LUCENE-2883
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.0


 Spin-off from the [dev list | 
 http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2011-01-25 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986581#action_12986581
 ] 

Grant Ingersoll commented on SOLR-445:
--

This patch looks pretty reasonable from the details of the implementation, but 
I don't think it's quite ready for commit yet.

First, we should be able to extend this to everything that implements 
ContentStreamLoader (JSONLoader, CSVLoader) if they want it (it doesn't make 
sense for the SolrCell stuff).

As I see it, we can do this by putting some base functionality into 
ContentStreamLoader which does what is done in this patch.
I think we need two methods: one that handles the immediate error (it takes in 
a StringBuilder and the info about the doc that failed) and decides whether to 
abort or buffer the error for later reporting, depending on the configuration 
setting.
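
Something like this, as a rough sketch of that base functionality (the names 
are made up for illustration, not taken from the patch):

{code}
public abstract class ErrorTolerantLoaderBase {
  private final boolean abortOnError;   // from config or request param
  private final StringBuilder errors = new StringBuilder();

  protected ErrorTolerantLoaderBase(boolean abortOnError) {
    this.abortOnError = abortOnError;
  }

  /** Handle one failed doc: abort the batch or buffer the error. */
  protected void handleDocError(String docId, Exception e) {
    if (abortOnError) {
      throw new RuntimeException("aborting batch at doc " + docId, e);
    }
    errors.append(docId).append(": ").append(e.getMessage()).append('\n');
  }

  /** Report whatever was buffered once the batch is done (null if clean). */
  protected String reportErrors() {
    return errors.length() == 0 ? null : errors.toString();
  }
}
{code}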

I don't think the configuration of the item belongs in the UpdateHandler.  Erik 
H. meant that it goes in the configuration of the /update RequestHandler in the 
config, not the DirectUpdateHandler2, as in 
{code}<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />{code}

This config could be a request param just like any other (such that one could 
even override it per request, or via the defaults, appends, and invariants).

Also, I know it is tempting to do so, but please don't reformat the code in the 
patch.  It slows down review significantly.  In general, I try to reformat 
right before committing, as do most committers.

 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Grant Ingersoll
 Fix For: Next

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, solr-445.xml, SOLR-445_3x.patch


 Has anyone run into the problem of handling bad documents / failures mid 
 batch?  I.e.:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-445) XmlUpdateRequestHandler bad documents mid batch aborts rest of batch

2011-01-25 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986584#action_12986584
 ] 

Grant Ingersoll commented on SOLR-445:
--

Oh, one other thing.  You don't need to produce a 3.x patch.  We can just do an 
SVN merge.

 XmlUpdateRequestHandler bad documents mid batch aborts rest of batch
 

 Key: SOLR-445
 URL: https://issues.apache.org/jira/browse/SOLR-445
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.3
Reporter: Will Johnson
Assignee: Grant Ingersoll
 Fix For: Next

 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, 
 SOLR-445.patch, solr-445.xml, SOLR-445_3x.patch


 Has anyone run into the problem of handling bad documents / failures mid 
 batch?  I.e.:
 <add>
   <doc>
     <field name="id">1</field>
   </doc>
   <doc>
     <field name="id">2</field>
     <field name="myDateField">I_AM_A_BAD_DATE</field>
   </doc>
   <doc>
     <field name="id">3</field>
   </doc>
 </add>
 Right now solr adds the first doc and then aborts.  It would seem like it 
 should either fail the entire batch or log a message/return a code and then 
 continue on to add doc 3.  Option 1 would seem to be much harder to 
 accomplish and possibly require more memory while Option 2 would require more 
 information to come back from the API.  I'm about to dig into this but I 
 thought I'd ask to see if anyone had any suggestions, thoughts or comments.   
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2171) Using stats feature over a function, Function returning as a field value

2011-01-25 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-2171.
---

Resolution: Duplicate

See SOLR-1298

 Using stats feature over a function, Function returning as a field value
 

 Key: SOLR-2171
 URL: https://issues.apache.org/jira/browse/SOLR-2171
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis, search
 Environment: All
Reporter: Tanguy Moal
Priority: Minor

 In order to take full advantage of the stats component, it would be 
 great to be able to define a function as a field.
 Returning the result of a function as a virtual field for each document, for 
 example, would enable a much more advanced use of the stats 
 component.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2010) Remove segments with all documents deleted in commit/flush/close of IndexWriter instead of waiting until a merge occurs.

2011-01-25 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2010.


Resolution: Fixed

 Remove segments with all documents deleted in commit/flush/close of 
 IndexWriter instead of waiting until a merge occurs.
 

 Key: LUCENE-2010
 URL: https://issues.apache.org/jira/browse/LUCENE-2010
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Michael McCandless
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2010.patch


 I do not know if this is a bug in 2.9.0, but it seems that segments with all 
 documents deleted are not automatically removed:
 {noformat}
 4 of 14: name=_dlo docCount=5
   compound=true
   hasProx=true
   numFiles=2
   size (MB)=0.059
   diagnostics = {java.version=1.5.0_21, lucene.version=2.9.0 817268P - 
 2009-09-21 10:25:09, os=SunOS,
  os.arch=amd64, java.vendor=Sun Microsystems Inc., os.version=5.10, 
 source=flush}
   has deletions [delFileName=_dlo_1.del]
   test: open reader.OK [5 deleted docs]
   test: fields..OK [136 fields]
   test: field norms.OK [136 fields]
   test: terms, freq, prox...OK [1698 terms; 4236 terms/docs pairs; 0 tokens]
   test: stored fields...OK [0 total field count; avg ? fields per doc]
   test: term vectorsOK [0 total vector count; avg ? term/freq vector 
 fields per doc]
 {noformat}
 Shouldn't such segments be removed automatically during the next 
 commit/close of IndexWriter?
 *Mike McCandless:*
 Lucene doesn't actually short-circuit this case, ie, if every single doc in a 
 given segment has been deleted, it will still merge it [away] like normal, 
 rather than simply dropping it immediately from the index, which I agree 
 would be a simple optimization. Can you open a new issue? I would think IW 
 can drop such a segment immediately (ie not wait for a merge or optimize) on 
 flushing new deletes.
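
In sketch form, the proposed short-circuit amounts to something like this 
(hypothetical helper, not IndexWriter's actual code):

{code}
import java.util.Iterator;
import java.util.List;

/** After flushing deletes, drop any segment whose delete count equals its
 *  doc count instead of waiting for a merge to eliminate it. */
final class FullyDeletedSegmentPruner {

  interface SegmentInfoView {
    int docCount();
    int deletedDocCount();
  }

  static void prune(List<? extends SegmentInfoView> segmentInfos) {
    for (Iterator<? extends SegmentInfoView> it = segmentInfos.iterator();
         it.hasNext();) {
      SegmentInfoView si = it.next();
      if (si.deletedDocCount() == si.docCount()) {
        it.remove(); // nothing live in the segment; no merge needed
      }
    }
  }
}
{code}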

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (SOLR-2177) Add More Facet demonstrations to the /browse example

2011-01-25 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-2177.
---

Resolution: Fixed

 Add More Facet demonstrations to the /browse example
 

 Key: SOLR-2177
 URL: https://issues.apache.org/jira/browse/SOLR-2177
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Trivial
 Attachments: SOLR-2177.patch, SOLR-2177.patch


 Demonstrate other faceting techniques in the /browse example: range, date, 
 pivot, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Closed: (LUCENE-805) New Lucene Demo

2011-01-25 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll closed LUCENE-805.
--

Resolution: Won't Fix

 New Lucene Demo
 ---

 Key: LUCENE-805
 URL: https://issues.apache.org/jira/browse/LUCENE-805
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Examples
Reporter: Grant Ingersoll
Priority: Minor

 The much maligned demo, while useful, could use a breath of fresh air.  This 
 issue is to start collecting requirements about what people would like to see 
 in a demo and what they don't like in the current one.
 Ideas (not necessarily in order of importance):
 1. More in-depth tutorial explaining indexing/searching
 2. Multilingual support/demonstration
 3. Better demonstration of querying capabilities: Spans, Phrases, Wildcards, 
 Filters, sorting, etc.
 4. Dealing with different content types and pointers to resources
 5. Wiki use cases links -- I think it would be cool to solicit people to 
 contribute use cases to the docs. 
 6. Demonstration of contrib packages, esp. Highlighter
 7. Performance issues/factors/tradeoffs.  Lucene lessons learned and best 
 practices
 Advanced tutorials:
 1. Hadoop + Lucene
 2. Writing custom analyzers/filters/tokenizers
 3. Changing Scoring
 4. Payloads (when they are committed)
 Please contribute what else you would like to see.  I may be able to address 
 some of these issues for my ApacheCon talk, but not all of them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Threading of JIRA e-mails in gmail?

2011-01-25 Thread Dawid Weiss
Hi everyone,

There's a fair bit of info on the internet about this: apparently
gmail groups by subject only, and JIRA includes varying content in an
issue's subject, depending on the action (comment, update, etc.). Did
anybody find a solution to thread ALL of an issue's messages into a
single thread (other than hacking through a proxy account and
rewriting message subjects)? :)

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Threading of JIRA e-mails in gmail?

2011-01-25 Thread Michael McCandless
This is an awful problem!

I made a Python script to work around this... it's kinda scary: it logs
in (over IMAP), finds the messages, removes the old ones, and puts
back new ones with the corrected subject line so that gmail groups
them properly.  If you want I can send the Python script... but it's
pretty scary.  If it has bugs it can delete your emails!  And it
requires you to put your IMAP credentials into a Python source file... etc.

I wish there were a cleaner solution :)

Mike
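
The Python script isn't attached here, but the same trick can be sketched in 
Java with JavaMail (host, credentials, and the subject pattern are 
placeholders; like the script described above, this deletes messages, so try 
it on a throwaway folder first):

{code}
import java.util.Properties;
import javax.mail.Flags;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Store;
import javax.mail.internet.MimeMessage;
import javax.mail.search.SubjectTerm;

public class JiraSubjectFixer {
  public static void main(String[] args) throws Exception {
    Session session = Session.getInstance(new Properties());
    Store store = session.getStore("imaps");
    // placeholders: fill in your own host and credentials
    store.connect("imap.gmail.com", "user@example.com", "app-password");

    Folder inbox = store.getFolder("INBOX");
    inbox.open(Folder.READ_WRITE);

    for (Message m : inbox.search(new SubjectTerm("LUCENE-"))) {
      String subject = m.getSubject();
      // strip the varying action prefix, e.g. "[jira] Commented: "
      String fixed = subject.replaceFirst("^\\[jira\\] [^:]+: ", "[jira] ");
      if (!fixed.equals(subject)) {
        MimeMessage copy = new MimeMessage((MimeMessage) m);
        copy.setSubject(fixed);
        inbox.appendMessages(new Message[] { copy }); // put back the fixed copy
        m.setFlag(Flags.Flag.DELETED, true);          // remove the original
      }
    }
    inbox.close(true); // expunge the deleted originals
    store.close();
  }
}
{code}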

On Tue, Jan 25, 2011 at 2:14 PM, Dawid Weiss dawid.we...@gmail.com wrote:
 Hi everyone,

 There's a fair bit of info on the internet about this: apparently
 gmail groups by subject only, and JIRA includes varying content in an
 issue's subject, depending on the action (comment, update, etc.). Did
 anybody find a solution to thread ALL of an issue's messages into a
 single thread (other than hacking through a proxy account and
 rewriting message subjects)? :)

 Dawid

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1574) PooledSegmentReader, pools SegmentReader underlying byte arrays

2011-01-25 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986630#action_12986630
 ] 

Michael McCandless commented on LUCENE-1574:


I've been testing on a 25M doc index (all of en Wikipedia, at least as of March 
2010).

Yes, I think the alloc of a big BitVector, the System.arraycopy, and destroying 
it are likely a fairly low cost compared to Lucene resolving the deleted term, 
indexing the doc, flushing the tiny segment, etc.

 PooledSegmentReader, pools SegmentReader underlying byte arrays
 ---

 Key: LUCENE-1574
 URL: https://issues.apache.org/jira/browse/LUCENE-1574
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-1574.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 PooledSegmentReader pools the underlying byte arrays of deleted docs and 
 norms for realtime search.  It is designed for use with IndexReader.clone, 
 which can create many copies of byte arrays of the same length for 
 a given segment.  When pooled they can be reused, which could save memory.  
 Do we want to benchmark the memory usage of PooledSegmentReader vs 
 GC?  Many times GC is enough for these smaller objects.
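
The pooling itself can be sketched minimally like this (hypothetical code, not 
the attached patch): keep free lists of byte[]s keyed by length and reuse them 
across clones:

{code}
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

final class ByteArrayPool {
  // free lists keyed by array length
  private final Map<Integer, LinkedList<byte[]>> free =
      new HashMap<Integer, LinkedList<byte[]>>();

  synchronized byte[] get(int length) {
    LinkedList<byte[]> q = free.get(length);
    return (q == null || q.isEmpty()) ? new byte[length] : q.removeFirst();
  }

  synchronized void release(byte[] buf) {
    LinkedList<byte[]> q = free.get(buf.length);
    if (q == null) {
      q = new LinkedList<byte[]>();
      free.put(buf.length, q);
    }
    q.addFirst(buf);
  }
}
{code}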

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


