[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for sub reader

2009-08-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744970#action_12744970
 ] 

Michael McCandless commented on LUCENE-1821:



BTW contrib/spatial has exactly this same problem.  It currently
builds up a cache, keyed on the top (MultiReader's) docID, of the
precise distance computed by its precise distance filters, to then be
used during sorting.  Right now it simply computes its own docBase and
increments it every time getDocIdSet() is called (which is messy).
Though I think it could (and should) switch to a per-segment cache.
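
The docBase bookkeeping described above amounts to something like the following sketch (hypothetical code, not the actual contrib/spatial filter; the class and method names are illustrative):

```java
// Hypothetical sketch of the docBase bookkeeping described above.
// Each call to getDocIdSet() advances the base by the segment's maxDoc(),
// so segment-local docids can be rebased to top-level docids for the cache.
import java.util.HashMap;
import java.util.Map;

class DistanceFilterSketch {
    private int nextDocBase = 0;                     // advanced once per segment
    private final Map<Integer, Double> distanceCache // keyed on top-level docid
            = new HashMap<Integer, Double>();

    // segmentMaxDoc stands in for IndexReader.maxDoc() of the segment
    void getDocIdSet(int segmentMaxDoc, int[] matchingSegmentDocs, double[] distances) {
        int docBase = nextDocBase;
        nextDocBase += segmentMaxDoc;                // ready for the next segment
        for (int i = 0; i < matchingSegmentDocs.length; i++) {
            // rebase the segment-local docid to a top-level docid before caching
            distanceCache.put(docBase + matchingSegmentDocs[i], distances[i]);
        }
    }

    Double cachedDistance(int topLevelDocId) {
        return distanceCache.get(topLevelDocId);
    }
}
```

A per-segment cache would instead key on the segment's IndexReader and store segment-local docids, which is the switch suggested above.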

I am torn.  On the one hand we don't want to encourage apps to be
using top docIDs anywhere down low (eg Weight/Scorer).  We'd like
all such per-segment switching to happen up high.

But on the other hand, this is quite a sudden change, and most
advanced apps will be using the top docIDs by definition (since
per-segment docIDs only becomes an [easy] option in 2.9), so it'd be
more friendly to offer up a cleaner migration path for such apps where
Weight/Scorer is told its docBase.

And, having to migrate an ord index from top to sub docIDs is
truly a nightmare, having gone through that with Mark in getting
String sorting to work per segment!


 Weight.scorer() not passed doc offset for sub reader
 --

 Key: LUCENE-1821
 URL: https://issues.apache.org/jira/browse/LUCENE-1821
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9
Reporter: Tim Smith

 Now that searching is done on a per segment basis, there is no way for a 
 Scorer to know the actual doc id for the documents it matches (only the 
 relative doc offset into the segment)
 If using caches in your scorer that are based on the entire index (all 
 segments), there is now no way to index into them properly from inside a 
 Scorer because the scorer is not passed the needed offset to calculate the 
 real docid
 suggest having the Weight.scorer() method also take an integer for the doc offset
 Abstract Weight class should have a constructor that takes this offset as 
 well as a method to get the offset
 All Weights that have sub weights must pass this offset down to created 
 sub weights

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers

2009-08-19 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-1794.
-

Resolution: Fixed

Committed revision 805766.

 implement reusableTokenStream for all contrib analyzers
 ---

 Key: LUCENE-1794
 URL: https://issues.apache.org/jira/browse/LUCENE-1794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1794-reusing-analyzer.patch, LUCENE-1794.patch, 
 LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, 
 LUCENE-1794.patch, LUCENE-1794_fix.patch, LUCENE-1794_fix2.txt


 Most contrib analyzers do not have an impl for reusableTokenStream.
 Regardless of how expensive the back-compat reflection is for indexing speed, 
 I think we should do this to mitigate any performance costs. Hey, overall it 
 might even be an improvement!
 The back-compat code for non-final analyzers is already in place, so this is 
 easy money in my opinion.
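
The reuse pattern being rolled out here is easy to sketch in isolation (generic stand-in types below; ReusableTokenizer is hypothetical, not the Lucene TokenStream API):

```java
// Sketch of the reusableTokenStream pattern: instead of allocating a new
// token stream per field/document, cache one (in Lucene, per thread via
// getPreviousTokenStream) and reset it against the new input.
import java.io.Reader;
import java.io.StringReader;

class ReusableAnalyzerSketch {
    // stand-in for a real Tokenizer that can be reset with new input
    static class ReusableTokenizer {
        Reader input;
        int resets = 0;
        ReusableTokenizer(Reader input) { this.input = input; }
        void reset(Reader newInput) { this.input = newInput; resets++; }
    }

    private ReusableTokenizer previous;  // in Lucene this is per-thread state

    ReusableTokenizer reusableTokenStream(Reader reader) {
        if (previous == null) {
            previous = new ReusableTokenizer(reader);  // first use: allocate
        } else {
            previous.reset(reader);                    // reuse: just reset
        }
        return previous;
    }
}
```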




[jira] Resolved: (LUCENE-1813) Add option to ReverseStringFilter to mark reversed tokens

2009-08-19 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-1813.
-

Resolution: Fixed

Committed revision 805769.

Thanks Andrzej, and also everyone who provided feedback.


 Add option to ReverseStringFilter to mark reversed tokens
 -

 Key: LUCENE-1813
 URL: https://issues.apache.org/jira/browse/LUCENE-1813
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Affects Versions: 2.9
Reporter: Andrzej Bialecki 
Assignee: Robert Muir
 Fix For: 2.9

 Attachments: LUCENE-1813.patch, LUCENE-1813.patch, LUCENE-1813.patch, 
 reverseMark-2.patch, reverseMark.patch


 This patch implements additional functionality in the filter to mark 
 reversed tokens with a special marker character (Unicode 0001). This is 
 useful when indexing both straight and reversed tokens (e.g. to implement 
 efficient leading wildcards search).
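
The marker scheme is simple to illustrate (a sketch, not the committed filter code; the method names are illustrative): reversed tokens get U+0001 prepended so a leading-wildcard query can be rewritten as a prefix query against the reversed tokens without colliding with ordinary terms.

```java
// Sketch of marking reversed tokens with U+0001 (illustrative, not the
// committed ReverseStringFilter code).
class ReverseMarkSketch {
    static final char MARKER = '\u0001';

    // index time: the reversed form of a token carries the marker
    static String reverseAndMark(String token) {
        return MARKER + new StringBuilder(token).reverse().toString();
    }

    // query time: a leading wildcard like "*bar" becomes the prefix
    // "\u0001rab*" against the reversed tokens
    static String leadingWildcardToPrefix(String wildcard) {
        String body = wildcard.substring(1);  // drop the leading '*'
        return MARKER + new StringBuilder(body).reverse().toString() + "*";
    }
}
```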




[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for sub reader

2009-08-19 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745036#action_12745036
 ] 

Tim Smith commented on LUCENE-1821:
---

Concerning the changelog, I feel the text below should be added to the "Changes in 
runtime behavior" section (it's kind of covered in "New features", however
it is also a rather substantial change in runtime behavior and should be 
called out explicitly there)

{code}
13. LUCENE-1483: When searching over multiple segments, a new Scorer is
    created for each segment. The Weight is created only once for the top
    level searcher. Each Scorer is passed the per-segment IndexReader. This
    will result in docids in the Scorer being internal to the per-segment
    IndexReader and there is currently no way to rebase these docids to the
    top level IndexReader. This results in any caches/filters that use docids
    over the top IndexReader to be broken.
{code}






[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for sub reader

2009-08-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745039#action_12745039
 ] 

Mark Miller commented on LUCENE-1821:
-

I think that's a good idea, but that last sentence needs a bit of work. Here 
is another attempt that I am still not quite happy with:

{code}
13. LUCENE-1483: When searching over multiple segments, a new Scorer is
    created for each segment. The Weight is created only once for the top
    level searcher. Each Scorer is passed the per-segment IndexReader. This
    will result in docids in the Scorer being internal to the per-segment
    IndexReader and there is currently no way to rebase these docids to the
    top level IndexReader. This will likely break any caches/filters in
    Scorers that rely on docids from the top level IndexReader, e.g. if you
    rely on the IndexReader to contain every doc id in the index.
{code}





[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2009-08-19 Thread Alex Vigdor (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vigdor updated LUCENE-1824:


Attachment: LUCENE-1824-test.patch

 FastVectorHighlighter truncates words at beginning and end of fragments
 ---

 Key: LUCENE-1824
 URL: https://issues.apache.org/jira/browse/LUCENE-1824
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
 Environment: any
Reporter: Alex Vigdor
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-1824-test.patch, LUCENE-1824.patch


 FastVectorHighlighter does not take word boundaries into consideration when 
 building fragments, so that in most cases the first and last word of a 
 fragment are truncated.  This makes the highlights less legible than they 
 should be.  I will attach a patch to BaseFragmentBuilder that resolves this 
 by expanding the start and end boundaries of the fragment to the first 
 whitespace character on either side of the fragment, or the beginning or end 
 of the source text, whichever comes first.  This significantly improves 
 legibility, at the cost of returning a slightly larger number of characters 
 than specified for the fragment size.
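
The expansion described above can be sketched as follows (illustrative code, not the patch itself): widen [start, end) outward to the nearest whitespace on each side, stopping at the ends of the source text.

```java
// Sketch of whitespace boundary expansion for highlight fragments:
// move start left and end right until whitespace (or the text ends).
class FragmentBoundarySketch {
    static int expandStart(String source, int start) {
        while (start > 0 && !Character.isWhitespace(source.charAt(start - 1))) {
            start--;
        }
        return start;
    }

    static int expandEnd(String source, int end) {
        while (end < source.length() && !Character.isWhitespace(source.charAt(end))) {
            end++;
        }
        return end;
    }

    // returns the fragment widened to whole words
    static String expand(String source, int start, int end) {
        return source.substring(expandStart(source, start), expandEnd(source, end));
    }
}
```

This is why the returned fragment can be slightly larger than the requested fragment size: both ends only ever move outward.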




[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2009-08-19 Thread Alex Vigdor (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vigdor updated LUCENE-1824:


Attachment: (was: LUCENE-1824-test.patch)





[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for sub reader

2009-08-19 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745041#action_12745041
 ] 

Tim Smith commented on LUCENE-1821:
---

One more pass:

{code}
13. LUCENE-1483: When searching over multiple segments, a new Scorer is
    created for each segment. The Weight is created only once for the top
    level searcher. Each Scorer is passed the per-segment IndexReader. This
    will result in docids in the Scorer being internal to the per-segment
    IndexReader. If a custom Scorer implementation uses any caches/filters
    based on the top level IndexReader/Searcher, it will need to be updated
    to use caches/filters on a per segment basis. There is currently no way
    provided to rebase the docids in the Scorer to the top level IndexReader.
    See LUCENE-1821 for discussion on workarounds for this.
{code}





[jira] Updated: (LUCENE-1821) Weight.scorer() not passed doc offset for sub reader

2009-08-19 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-1821:
--

Attachment: LUCENE-1821.patch

Here's a patch that adds getIndexReaderBase(IndexReader reader) to IndexSearcher.

Sadly, this cannot easily be added to MultiSearcher as well, since MultiSearcher 
uses Searchables; that would require adding this method to the Searchable 
interface.
I could work up another patch that adds this method to the Searchable 
interface, however that has some back-compat concerns.






[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for sub reader

2009-08-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745044#action_12745044
 ] 

Mark Miller commented on LUCENE-1821:
-

Looks great!

I still almost want to say rely on though:

bq. uses any caches/filters based on the top level IndexReader/Searcher

bq. uses any caches/filters that rely on being based on the top level 
IndexReader/Searcher

No? It seems like you could be based on a top level reader before, but not rely 
on the fact that it was a top level ...





[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for sub reader

2009-08-19 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745045#action_12745045
 ] 

Tim Smith commented on LUCENE-1821:
---

"rely on" it is.





[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2009-08-19 Thread Alex Vigdor (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vigdor updated LUCENE-1824:


Attachment: (was: LUCENE-1824.patch)





[jira] Updated: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2009-08-19 Thread Alex Vigdor (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Vigdor updated LUCENE-1824:


Attachment: (was: LUCENE-1824-test.patch)





[jira] Commented: (LUCENE-1824) FastVectorHighlighter truncates words at beginning and end of fragments

2009-08-19 Thread Alex Vigdor (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745048#action_12745048
 ] 

Alex Vigdor commented on LUCENE-1824:
-

The failing test was due to an extra whitespace character at the beginning of 
the output, which I think is insignificant.

However, I appreciate that the whitespace approach will not work for CJK, so I 
have moved my modifications to a new WhitespaceFragmentBuilder class and 
associated test class.  The updated patch now contains just these two new 
classes and no modifications to other code.

I don't want to hold up the release of 2.9, but anyone attempting to use the 
SimpleFragmentsBuilder with Latin languages, or others that use whitespace to 
delimit words, will be dismayed by the rampant truncation!





[jira] Updated: (LUCENE-1821) Weight.scorer() not passed doc offset for sub reader

2009-08-19 Thread Tim Smith (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Smith updated LUCENE-1821:
--

Description: 
Now that searching is done on a per segment basis, there is no way for a Scorer 
to know the actual doc id for the documents it matches (only the relative 
doc offset into the segment).

If using caches in your scorer that are based on the entire index (all 
segments), there is now no way to index into them properly from inside a Scorer, 
because the scorer is not passed the offset needed to calculate the real docid.

Suggest having the Weight.scorer() method also take an integer for the doc offset.

The abstract Weight class should have a constructor that takes this offset, as 
well as a method to get the offset.
All Weights that have sub weights must pass this offset down to created sub 
weights.


Details on the workaround:
In order to work around this, you must do the following:
* Subclass IndexSearcher
* Add an int getIndexReaderBase(IndexReader) method to your subclass
* During Weight creation, the Weight must hold onto a reference to the passed-in 
Searcher (cast to your subclass)
* During Scorer creation, the Scorer must be passed the result of 
YourSearcher.getIndexReaderBase(reader)
* The Scorer can now rebase any collected docids using this offset

Example implementation of getIndexReaderBase():
{code}
// NOTE: a more efficient implementation can be done if you cache the result
// of gatherSubReaders in your constructor
public int getIndexReaderBase(IndexReader reader) {
  if (reader == getReader()) {
    return 0;
  } else {
    List readers = new ArrayList();
    gatherSubReaders(readers);
    Iterator iter = readers.iterator();
    int maxDoc = 0;
    while (iter.hasNext()) {
      IndexReader r = (IndexReader) iter.next();
      if (r == reader) {
        return maxDoc;
      }
      maxDoc += r.maxDoc();
    }
  }
  return -1; // reader not in searcher
}
{code}

Notes:
* This workaround means you cannot serialize your custom Weight implementation
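
A Scorer wired up this way might look roughly like the following (a hypothetical sketch built on the getIndexReaderBase() idea above; the class name and cache shape are illustrative, not a real Lucene Scorer):

```java
// Sketch of the workaround in use: the Scorer rebases its segment-local
// docids with the base the subclassed searcher computed for its reader.
class RebasingScorerSketch {
    private final int docBase;              // from YourSearcher.getIndexReaderBase(reader)
    private final boolean[] topLevelCache;  // some cache indexed by top-level docid

    RebasingScorerSketch(int docBase, boolean[] topLevelCache) {
        this.docBase = docBase;
        this.topLevelCache = topLevelCache;
    }

    // segment-local docid in, top-level cache lookup out
    boolean accept(int segmentDocId) {
        return topLevelCache[docBase + segmentDocId];
    }
}
```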


  was:
Now that searching is done on a per segment basis, there is no way for a Scorer 
to know the actual doc id for the documents it matches (only the relative 
doc offset into the segment)

If using caches in your scorer that are based on the entire index (all 
segments), there is now no way to index into them properly from inside a Scorer 
because the scorer is not passed the needed offset to calculate the real docid

suggest having the Weight.scorer() method also take an integer for the doc offset

Abstract Weight class should have a constructor that takes this offset as well 
as a method to get the offset
All Weights that have sub weights must pass this offset down to created sub 
weights







Re: Finishing Lucene 2.9

2009-08-19 Thread Mark Miller

0 issues! Congrats everyone. 2.9 was quite a beast.

So looks like we should get a few things in order.

1. Anyone dying to be release manager? I think I could do it, but I'm 
kind of pressed for time ...


2. Let's start crawling all over this release - bugs/javadoc/packaging etc.

3. In regards to that - I'd like to suggest that we don't do the release 
branch early for 2.9. I know we normally make the release branch so that 
further dev can continue on trunk. In this case I don't think that is wise. 
I propose that we lock down trunk for a while, to force people to 
concentrate on *this* release. Otherwise we divide our limited forces into 
two - those working on the release, and those working on trunk and beyond. 
We can kind of enforce this by making the release branch last minute, I think.


4. I suggest we offer an early release candidate type build (very soon) 
- nothing official, nothing signed - just something easier for our user 
community to test with if they are not very familiar with building a 
release off of trunk.


--
- Mark

http://www.lucidimagination.com




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Finishing Lucene 2.9

2009-08-19 Thread Yonik Seeley
On Wed, Aug 19, 2009 at 10:49 AM, Mark Miller <markrmil...@gmail.com> wrote:
 3. In regards to that - I'd like to suggest that we don't do the release
 branch early for 2.9. I know we normally make the release
   branch so that further dev can continue on trunk. In this case I don't
 think that is wise. I propose that we lock down trunk for a   while, to
 force people to concentrate on *this* release. Otherwise we divide our
 limited forces into two - those working on release, and those working on
 trunk and beyond. We can kind of enforce this by making the release branch
 last minute I think.

+1

I've experienced the extra pain of having to merge every change from
branch up until the release (esp when the CHANGES.txt is different and
patch fails) - there's really no point - checkins for the next release
can normally wait.

 4. I suggest we offer an early release candidate type build (very soon) -
 nothing official, nothing signed - just something easier for our user
 community to test with if they are not very familiar with building a release
 off of trunk.

+1

I've also observed people bringing up release nits only *after* an
official vote for a package has started - that messes up stuff like
trying to post-date in CHANGES.  Developers should do ant package
*now* and bring up issues and objections while it's easy to fix - get
everything possible out of the way before the official VOTE thread.

A final note - AFAIK, the ReleaseTodo
http://wiki.apache.org/jakarta-lucene/ReleaseTodo is for the purpose
of helping people do releases - it's not an official release process
where every step must be followed... these are only guidelines.
There's also no reason why the release manager needs to be the one
to do all the items like run RAT, etc.  That can be done by anyone
interested - including other contributors who do not yet have commit
privileges.

-Yonik
http://www.lucidimagination.com




Re: Finishing Lucene 2.9

2009-08-19 Thread Yonik Seeley
On Wed, Aug 19, 2009 at 1:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
 the RM should follow the release procedure as specified.

Wiki documents are normally not official - anyone can modify them, and
people have been doing so with little/no discussion.  I'll admit that I can't
always follow java-dev, so I may have missed a vote to codify/upgrade
this release guideline as an official process that must be followed.

At least I know that's not the case in Solr-land though, and I've
updated the wiki to reflect that.

-Yonik
http://www.lucidimagination.com




Re: Finishing Lucene 2.9

2009-08-19 Thread Grant Ingersoll


On Aug 19, 2009, at 2:13 PM, Yonik Seeley wrote:

On Wed, Aug 19, 2009 at 1:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:

the RM should follow the release procedure as specified.


Wiki documents are normally not official - anyone can modify them, and
people have been doing so with little/no discussion.  I'll admit that I can't
always follow java-dev, so I may have missed a vote to codify/upgrade
this release guideline as an official process that must be followed.

At least I know that's not the case in Solr-land though, and I've
updated the wiki to reflect that.


I find it scary to think that one release might contain Maven  
artifacts, for instance, while another, done by a different person,  
might not, simply b/c the RM doesn't feel like it.  I don't agree  
here, and I don't agree for Solr.  Stable RM is as important as  
backward compatibility, if not more so.





Re: Finishing Lucene 2.9

2009-08-19 Thread Mark Miller

Okay, I can do the test/beta release dist and host on people.apache.org.

Anyone have any pref on what we call this? It's not really a release 
candidate per se, though I have no problem calling it that. We can go 
from rc1 to rc20 for all it matters.

--
- Mark

http://www.lucidimagination.com







Re: Finishing Lucene 2.9

2009-08-19 Thread Grant Ingersoll

So, are we under a code freeze now?  And only doing doc/breakers?

-Grant

On Aug 19, 2009, at 3:08 PM, Mark Miller wrote:

Okay, I can do the test/beta release dist and host on  
people.apache.org.


Anyone have any pref on what we call this? It's not really a release  
candidate per se, though I have no problem calling it that. We can go 
from rc1 to rc20 for all it matters.


--
- Mark

http://www.lucidimagination.com











Re: Finishing Lucene 2.9

2009-08-19 Thread Mark Miller
Not sure - though if not now, then extremely imminently.  I have no 
problem giving a bit of time for people to weigh in on that.


I'm trying to get a feel for what the community wants to do before 
actually putting
anything up or sending anything out to java-user. I'm prepped to go when 
it makes sense.


- Mark

Grant Ingersoll wrote:

So, are we under a code freeze now?  And only doing doc/breakers?

-Grant

On Aug 19, 2009, at 3:08 PM, Mark Miller wrote:


Okay, I can do the test/beta release dist and host on people.apache.org.

Anyone have any pref on what we call this? It's not really a release 
candidate per se, though I have no problem calling it that. We can go 
from rc1 to rc20 for all it matters.

--
- Mark

http://www.lucidimagination.com












--
- Mark

http://www.lucidimagination.com







Re: Finishing Lucene 2.9

2009-08-19 Thread Michael Busch

On 8/19/09 11:43 AM, Grant Ingersoll wrote:


On Aug 19, 2009, at 2:13 PM, Yonik Seeley wrote:

On Wed, Aug 19, 2009 at 1:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:

the RM should follow the release procedure as specified.


Wiki documents are normally not official - anyone can modify them, and
people have been doing so with little/no discussion.  I'll admit that I can't
always follow java-dev, so I may have missed a vote to codify/upgrade
this release guideline as an official process that must be followed.

At least I know that's not the case in Solr-land though, and I've
updated the wiki to reflect that.


I find it scary to think that one release might contain Maven 
artifacts, for instance, while another, done by a different person, 
might not, simply b/c the RM doesn't feel like it.  I don't agree 
here, and I don't agree for Solr.  Stable RM is as important as 
backward compatibility, if not more so.


+1. I too think that the RM should follow the guidelines.

 Michael




Re: Finishing Lucene 2.9

2009-08-19 Thread Michael Busch
When I was the RM I usually sent out a note in advance with a tentative 
schedule, i.e. code freeze date, length of code freeze period, release 
date (again, all tentative of course). Then the community could give 
feedback on that proposed schedule and could plan accordingly.


 Michael

On 8/19/09 1:19 PM, Mark Miller wrote:
Not sure - though if not now, then extremely imminently.  I have no 
problem giving a bit of time for people to weigh in on that.


I'm trying to get a feel for what the community wants to do before 
actually putting
anything up or sending anything out to java-user. I'm prepped to go 
when it makes sense.


- Mark

Grant Ingersoll wrote:

So, are we under a code freeze now?  And only doing doc/breakers?

-Grant

On Aug 19, 2009, at 3:08 PM, Mark Miller wrote:

Okay, I can do the test/beta release dist and host on 
people.apache.org.


Anyone have any pref on what we call this? It's not really a release 
candidate per se, though I have no problem calling it that. We can go 
from rc1 to rc20 for all it matters.

--
- Mark

http://www.lucidimagination.com


















Re: Finishing Lucene 2.9

2009-08-19 Thread Mark Miller
I hadn't settled on me being the RM yet ;) Though if no one else steps 
up, I will be.


I was suggesting a kind of earlier, looser test jar than what we have 
previously done as an RC - essentially a nightly of trunk (nightlies have 
been hard to find lately IME; the last one I got I had to dig through 
Hudson for) - just for users that haven't built from svn and wouldn't 
normally go through the hassle. The more users testing, and the earlier, 
the better. And that is what I was volunteering to do. However, looking 
at the Release TODOs, this still really fits the mold anyway. No need to 
do anything special I guess - just get to the RC step quickly, knowing 
that other RCs are likely to follow.



- Mark

Michael Busch wrote:
When I was the RM I usually sent out a note in advance with a 
tentative schedule, i.e. code freeze date, length of code freeze 
period, release date (again, all tentative of course). Then the 
community could give feedback on that proposed schedule and could plan 
accordingly.


 Michael

On 8/19/09 1:19 PM, Mark Miller wrote:
Not sure - though if not now, then extremely imminently.  I have no 
problem giving a bit of time for people to weigh in on that.


I'm trying to get a feel for what the community wants to do before 
actually putting
anything up or sending anything out to java-user. I'm prepped to go 
when it makes sense.


- Mark

Grant Ingersoll wrote:

So, are we under a code freeze now?  And only doing doc/breakers?

-Grant

On Aug 19, 2009, at 3:08 PM, Mark Miller wrote:

Okay, I can do the test/beta release dist and host on 
people.apache.org.


Anyone have any pref on what we call this? It's not really a release 
candidate per se, though I have no problem calling it that. We can go 
from rc1 to rc20 for all it matters.


--
- Mark

http://www.lucidimagination.com



















--
- Mark

http://www.lucidimagination.com







RE: Finishing Lucene 2.9

2009-08-19 Thread Uwe Schindler
 0 issues! Congrats everyone. 2.9 was quite a beast.
 
 So looks like we should get a few things in order.
 
 1. Anyone dying to be release manager? I think I could do it, but I'm
 kind of pressed for time ...
 
 2. Lets start crawling all over this release - bugs/javadoc/packaging etc.
 
 3. In regards to that - I'd like to suggest that we don't do the release
 branch early for 2.9. I know we normally make the release
 branch so that further dev can continue on trunk. In this case I
 don't think that is wise. I propose that we lock down trunk for a
 while, to force people to concentrate on *this* release. Otherwise we
 divide our limited forces into two - those working on release, and those
 working on trunk and beyond. We can kind of enforce this by making the
 release branch last minute I think.

I think 3.0 is a little bit special: We move to Java 1.5, so in my opinion,
we should not only remove deprecations, but also add Generics and remove
StringBuffer and so on. I have some patches for that available, e.g. the
casting currently needed for the Attributes API can be more elegantly solved
by using generics (something like T addAttribute(Class<T extends
Attribute>)). If we do not add generics to the public API in 3.0, we have
to wait one major release longer to add them.
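A rough sketch of what such a generified accessor could look like - names, storage, and instantiation strategy here are assumptions for illustration, not the actual committed Lucene API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative marker interface and source class, not Lucene's real types.
interface Attribute {}

public class AttributeSourceSketch {
    private final Map<Class<? extends Attribute>, Attribute> attributes =
        new HashMap<Class<? extends Attribute>, Attribute>();

    // The generic signature lets callers skip the cast that a 1.4-style
    // API (taking Class, returning Attribute) forces on them.
    public <T extends Attribute> T addAttribute(Class<T> attClass) {
        Attribute att = attributes.get(attClass);
        if (att == null) {
            try {
                att = attClass.newInstance(); // create on first request
            } catch (Exception e) {
                throw new IllegalArgumentException(e);
            }
            attributes.put(attClass, att);
        }
        return attClass.cast(att); // typed return, no caller-side cast
    }

    // Example attribute with a public no-arg constructor.
    public static class TermAttributeSketch implements Attribute {
        public String term;
    }

    public static void main(String[] args) {
        AttributeSourceSketch src = new AttributeSourceSketch();
        TermAttributeSketch term = src.addAttribute(TermAttributeSketch.class);
        term.term = "lucene";
        // repeated calls return the same instance
        System.out.println(src.addAttribute(TermAttributeSketch.class) == term); // true
    }
}
```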

To get the 3.0 release shortly after 2.9, we should branch now, so that the
generics commits could be done early. I would also help to do this (at least
for the parts I was working on last time).

 4. I suggest we offer an early release candidate type build (very soon)
 - nothing official, nothing signed - just something easier for our user
 community to test with if they are not very familiar with building a
 release off of trunk.

+1 Start the release process!

Uwe





Re: Finishing Lucene 2.9

2009-08-19 Thread Mark Miller
Uwe Schindler wrote:
 0 issues! Congrats everyone. 2.9 was quite a beast.

 So looks like we should get a few things in order.

 1. Anyone dying to be release manager? I think I could do it, but I'm
 kind of pressed for time ...

 2. Lets start crawling all over this release - bugs/javadoc/packaging etc.

 3. In regards to that - I'd like to suggest that we don't do the release
 branch early for 2.9. I know we normally make the release
 branch so that further dev can continue on trunk. In this case I
 don't think that is wise. I propose that we lock down trunk for a
 while, to force people to concentrate on *this* release. Otherwise we
 divide our limited forces into two - those working on release, and those
 working on trunk and beyond. We can kind of enforce this by making the
 release branch last minute I think.
 

 I think 3.0 is a little bit special: We move to Java 1.5, so in my opinion,
 we should not only remove deprecations, but also add Generics and remove
 StringBuffer and so on. I have some patches for that available, e.g. the
 casting currently needed for the Attributes API can be more elegantly solved
 by using generics (something like T addAttribute(Class<T extends
 Attribute>)). If we do not add generics to the public API in 3.0, we have
 to wait one major release longer to add them.

 To get the 3.0 release shortly after 2.9, we should branch now, that the
 generics commits could be done early. I would also help to do this (at least
 for the parts I was working on the last time).

   
I forgot about this oddity. It's so weird. It's like we are doing two
releases on top of each other - it just seems confusing.

Apache Lucene announces 2.9 - a lot of hard work and sweat - move to it

and five minutes later

Apache Lucene announces 3.0 - very little work, but different and
improved (generified anyway). No new features in 3.0. Hold the applause.
Now move to it.

I vote to make this more sane :)

-- 
- Mark

http://www.lucidimagination.com







[jira] Commented: (LUCENE-1768) NumericRange support for new query parser

2009-08-19 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745236#action_12745236
 ] 

Adriano Crestani commented on LUCENE-1768:
--

{quote}
we should rename RangeQueryNode to TermRangeQueryNode (to match lucene name)

I would not do this. RangeQueryNode is in the syntax tree and the syntax of 
numeric and term ranges is equal, so the query parser cannot know what type of 
query it is. When this issue is fixed in 3.1, this node will use the configuration 
of data types for field names (date, numeric, term) to create the correct range 
query.
{quote}

I think it's ok to rename. As far as I know, the standard.parser.SyntaxParser 
generates a ParametricRangeQueryNode from a range query, which has 2 
ParametricQueryNodes as children. So the range processor will need to convert the 
2 ParametricQueryNodes to the respective type, based on the user config: 
TermRangeQueryNode (renamed from RangeQueryNode) or NumericRangeQueryNode.

 NumericRange support for new query parser
 -

 Key: LUCENE-1768
 URL: https://issues.apache.org/jira/browse/LUCENE-1768
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


 It would be good to specify some type of schema for the query parser in 
 future, to automatically create NumericRangeQuery for different numeric 
 types? It would then be possible to index a numeric value 
 (double,float,long,int) using NumericField and then the query parser knows, 
 which type of field this is and so it correctly creates a NumericRangeQuery 
 for strings like [1.567..*] or (1.787..19.5].
 There is currently no way to determine from the index whether a field is numeric, so 
 the user will have to configure the FieldConfig objects in the ConfigHandler. 
 But if this is done, it will not be that difficult to implement the rest.
 The only difference from the current handling of RangeQuery is then the 
 instantiation of the correct Query type and the conversion of the entered numeric 
 values (a simple Number.valueOf(...) conversion of the user-entered numbers). 
 Everything else is identical; NumericRangeQuery also supports the MTQ 
 rewrite modes (as it is a MTQ).
 Another thing is a change in Date semantics. There are some strange flags in 
 the current parser that tell it how to handle dates.
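The Number.valueOf(...) conversion step the description mentions could look roughly like this - NumericFieldType and parseBound are hypothetical names for illustration, not classes from the query parser:

```java
// Hedged sketch: turn a user-entered range bound into the numeric type
// configured for the field, with "*" standing for an open-ended bound.
enum NumericFieldType { INT, LONG, FLOAT, DOUBLE }

public class RangeBoundSketch {
    static Number parseBound(String text, NumericFieldType type) {
        if ("*".equals(text)) {
            return null; // open-ended bound, e.g. [1.567 TO *]
        }
        switch (type) {
            case INT:    return Integer.valueOf(text);
            case LONG:   return Long.valueOf(text);
            case FLOAT:  return Float.valueOf(text);
            default:     return Double.valueOf(text);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBound("1.787", NumericFieldType.DOUBLE)); // 1.787
        System.out.println(parseBound("*", NumericFieldType.LONG));       // null
    }
}
```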

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





Re: Finishing Lucene 2.9

2009-08-19 Thread Michael Busch

On 8/19/09 3:16 PM, Uwe Schindler wrote:

0 issues! Congrats everyone. 2.9 was quite a beast.

So looks like we should get a few things in order.

1. Anyone dying to be release manager? I think I could do it, but I'm
kind of pressed for time ...

2. Lets start crawling all over this release - bugs/javadoc/packaging etc.

3. In regards to that - I'd like to suggest that we don't do the release
branch early for 2.9. I know we normally make the release
 branch so that further dev can continue on trunk. In this case I
don't think that is wise. I propose that we lock down trunk for a
while, to force people to concentrate on *this* release. Otherwise we
divide our limited forces into two - those working on release, and those
working on trunk and beyond. We can kind of enforce this by making the
release branch last minute I think.
 

I think 3.0 is a little bit special: We move to Java 1.5, so in my opinion,
we should not only remove deprecations, but also add Generics and remove
StringBuffer and so on. I have some patches for that available, e.g. the
casting currently needed for the Attributes API can be more elegantly solved
by using generics (something like T addAttribute(Class<T extends
Attribute>)). If we do not add generics to the public API in 3.0, we have
to wait one major release longer to add them.

   


Yes, I added that already in the very first AttributeSource patch - it's 
currently commented out
at the bottom of the class I think. Probably a bit out of date. I 
definitely want to do that to improve
readability of the attributes, it's much nicer with generics. That's how 
I started coding it and why
I started liking the syntax, before I needed to make it a bit ugly for 
JDK 1.4.


 Michael




[jira] Commented: (LUCENE-1823) QueryParser with new features for Lucene 3

2009-08-19 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745251#action_12745251
 ] 

Michael Busch commented on LUCENE-1823:
---

I think Solr has a feature similar to what I called 'Opaque terms': Nested 
Queries.

 QueryParser with new features for Lucene 3
 --

 Key: LUCENE-1823
 URL: https://issues.apache.org/jira/browse/LUCENE-1823
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 3.1


 I'd like to have a new QueryParser implementation in Lucene 3.1, ideally 
 based on the new QP framework in contrib. It should share as much code as 
 possible with the current StandardQueryParser implementation for easy 
 maintainability.
 Wish list (feel free to extend):
 1. *Operator precedence*: Support operator precedence for boolean operators
 2. *Opaque terms*: Ability to plugin an external parser for certain syntax 
 extensions, e.g. XML query terms
 3. *Improved RangeQuery syntax*: Use more intuitive <=, >=, = instead of [] 
 and {}
 4. *Support for trierange queries*: See LUCENE-1768
 5. *Complex phrases*: See LUCENE-1486
 6. *ANY operator*: E.g. (a b c d) ANY 3 should match if 3 of the 4 terms 
 occur in the same document
 7. *New syntax for Span queries*: I think the surround parser supports this?
 8. *Escaped wildcards*: See LUCENE-588


