[jira] Created: (LUCENE-2170) Thread starvation problems in some tests

2009-12-18 Thread Uwe Schindler (JIRA)
Thread starvation problems in some tests


 Key: LUCENE-2170
 URL: https://issues.apache.org/jira/browse/LUCENE-2170
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1


In some of the tests, a time limit is set and the tests run a "while (inTime)" 
loop. If thread creation under heavy load is too slow, the tasks are never 
executed. Most tests are only useful if the task runs at least once (most 
would even fail otherwise).

This issue changes the loops to do...while, so the task runs at least once.
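A minimal sketch of the change, in which a timed worker loop is converted from while to do...while so the task body runs at least once even if the deadline has already passed by the time the thread is scheduled (names here are illustrative, not taken from the actual patch):

```java
// Sketch: run a task until a deadline, but always at least once.
public class TimedLoop {
    /** Runs the task until the deadline; guaranteed to run it at least once. */
    static int runUntil(long deadlineMillis, Runnable task) {
        int iterations = 0;
        do {
            task.run();
            iterations++;
        } while (System.currentTimeMillis() < deadlineMillis);
        return iterations;
    }

    public static void main(String[] args) {
        // Deadline already in the past: a plain "while" loop would run
        // zero times; the do...while form still runs exactly once.
        int n = runUntil(System.currentTimeMillis() - 1, () -> {});
        System.out.println("iterations=" + n);
    }
}
```

With a plain `while (System.currentTimeMillis() < deadlineMillis)` loop, a thread created after the deadline (e.g. on a heavily loaded Hudson machine) would exit without doing any work, and the test's assertions on the work done would fail.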

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2170) Thread starvation problems in some tests

2009-12-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792402#action_12792402
 ] 

Uwe Schindler commented on LUCENE-2170:
---

This patch also fixes incorrect multi-threaded use of a boolean variable. I 
made it volatile.
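A minimal sketch of that kind of visibility fix: a failure flag written by worker threads and read by another thread should be volatile, otherwise the reading thread may never observe the write. The class and method names are illustrative, not taken from the actual test code:

```java
// Sketch of a shared failure flag between test threads.
public class FailureFlag {
    // volatile guarantees that a write by any thread is visible to
    // subsequent reads in every other thread.
    private static volatile boolean failed = false;

    static boolean runWorkerAndCheck() {
        try {
            Thread worker = new Thread(() -> failed = true);
            worker.start();
            worker.join();
            return failed; // sees the worker's write
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(runWorkerAndCheck());
    }
}
```

Note that `join()` by itself already establishes a happens-before edge; volatile matters for reads the main thread makes while the workers are still running, which is the situation in these tests.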

> Thread starvation problems in some tests
> 
>
> Key: LUCENE-2170
> URL: https://issues.apache.org/jira/browse/LUCENE-2170
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.1
>
> Attachments: LUCENE-2170.patch




[jira] Updated: (LUCENE-2170) Thread starvation problems in some tests

2009-12-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2170:
--

Attachment: LUCENE-2170.patch

Patch that fixes this issue. I will port it to the backwards branch, too.

> Thread starvation problems in some tests
> 
>
> Key: LUCENE-2170
> URL: https://issues.apache.org/jira/browse/LUCENE-2170
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.1
>
> Attachments: LUCENE-2170.patch




RE: Build failed in Hudson: Lucene-trunk #1034

2009-12-18 Thread Uwe Schindler
I opened https://issues.apache.org/jira/browse/LUCENE-2170 to fix this
timing issue.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Friday, December 18, 2009 8:29 AM
> To: java-dev@lucene.apache.org
> Subject: RE: Build failed in Hudson: Lucene-trunk #1034
> 
> Looking around, TestStressIndexing has the same problem (the time limit
> there is only 1.0 sec), and that test also does not use a volatile variable
> for its error boolean. Attached is a patch for that as well.
> 
> There are also other tests that may fail to do anything when the system is
> under heavy load. Search for "currentTimeMillis" in the tests :(
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
> > -Original Message-
> > From: Uwe Schindler [mailto:u...@thetaphi.de]
> > Sent: Friday, December 18, 2009 8:16 AM
> > To: java-dev@lucene.apache.org
> > Subject: RE: Build failed in Hudson: Lucene-trunk #1034
> >
> > Here the patch, Mike, does this look ok?
> >
> > When committed we can start a new Hudson run, if you like :-)
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> > > -Original Message-
> > > From: Uwe Schindler [mailto:u...@thetaphi.de]
> > > Sent: Friday, December 18, 2009 7:56 AM
> > > To: java-dev@lucene.apache.org
> > > Subject: RE: Build failed in Hudson: Lucene-trunk #1034
> > >
> > > > A new strange test failure:
> > > > [junit] Testcase:
> > > > testDuringAddDelete(org.apache.lucene.index.TestIndexWriterReader): FAILED
> > > > [junit] null
> > > > [junit] junit.framework.AssertionFailedError: null
> > > > [junit] at org.apache.lucene.index.TestIndexWriterReader.testDuringAddDelete(TestIndexWriterReader.java:835)
> > > > [junit] at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:208)
> > > > [junit]
> > > > [junit]
> > > > [junit] Test org.apache.lucene.index.TestIndexWriterReader FAILED
> > > >
> > > > (occurred in test-tag!)
> > > >
> > > > It's again a multi-threaded test: after joining all threads, the sum of
> > > > hits is not >0, so the added documents were not seen.
> > >
> > > The problem is the same as with the Benchmark test. The test spawns a
> > > number of threads. All these threads run for half a second and add
> > > documents during that time. The main thread reopens and searches, and
> > > after 0.5 secs it joins all threads. It seems that because of the clover
> > > analysis and the very slow, heavily loaded machine on Hudson, all
> > > threads exited right after starting, because merely creating the thread
> > > already exceeded the System.currentTimeMillis() limit.
> > >
> > > Two possibilities: raise the wait time, or better: replace the loop in
> > > each thread with a do-while, so each thread adds at least one document.
> > > In the main thread also wait 0.5 secs, but after joining all threads do
> > > the search again and add to the sum.
> > >
> > > Must be done in trunk and tag.
> > >
> > > Uwe



[jira] Commented: (LUCENE-2035) TokenSources.getTokenStream() does not assign positionIncrement

2009-12-18 Thread Christopher Morris (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792410#action_12792410
 ] 

Christopher Morris commented on LUCENE-2035:


Cheers Mark,

The custom collector was probably because I was learning the new API at the 
time.

The only changes I've made since the patch I submitted were to initialise the 
ArrayList with tpv.getTerms().length, because that represents the minimum size 
the list will grow to, and to replace the List and Iterator fields with an 
array (derived from the list) and an integer pointer. Both are probably 
unnecessary.

The tests could be improved - the first case could be fixed in its present 
form by using the Analyzer to generate the phrase query, though if the 
stemmed word were the middle word of the phrase then that fix wouldn't work.

> TokenSources.getTokenStream() does not assign positionIncrement
> ---
>
> Key: LUCENE-2035
> URL: https://issues.apache.org/jira/browse/LUCENE-2035
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/highlighter
>Affects Versions: 2.4, 2.4.1, 2.9
>Reporter: Christopher Morris
>Assignee: Mark Miller
> Fix For: 3.1
>
> Attachments: LUCENE-2035.patch, LUCENE-2035.patch, LUCENE-2305.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> TokenSources.StoredTokenStream does not assign positionIncrement information. 
> This means that all tokens in the stream are considered adjacent. This has 
> implications for the phrase highlighting in QueryScorer when using 
> non-contiguous tokens.
> For example:
> Consider a token stream that creates tokens for both the stemmed and 
> unstemmed version of each word - the fox (jump|jumped)
> When retrieved from the index using TokenSources.getTokenStream(tpv,false), 
> the token stream will be - the fox jump jumped
> Now try a search and highlight for the phrase query "fox jumped". The search 
> will correctly find the document; the highlighter will fail to highlight the 
> phrase because it thinks that there is an additional word between "fox" and 
> "jumped". If we use the original (from the analyzer) token stream then the 
> highlighter works.
> Also, consider the converse - the fox did not jump
> "not" is a stop word and there is an option to increment the position to 
> account for stop words - (the,0) (fox,1) (did,2) (jump,4)
> When retrieved from the index using TokenSources.getTokenStream(tpv,false), 
> the token stream will be - (the,0) (fox,1) (did,2) (jump,3).
> So the phrase query "did jump" will cause the "did" and "jump" terms in the 
> text "did not jump" to be highlighted. If we use the original (from the 
> analyzer) token stream then the highlighter works correctly.
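The arithmetic behind the description above can be shown in a self-contained sketch (no Lucene classes; this only models how absolute token positions are the running sum of position increments, and what goes wrong when a rebuilt stream treats every increment as 1):

```java
import java.util.ArrayList;
import java.util.List;

// Absolute token positions are accumulated from position increments.
// A synonym/stemmed variant at the same position has increment 0; a
// rebuilt stream that drops increment info makes everything adjacent.
public class Positions {
    /** Returns absolute positions for a stream of position increments. */
    static List<Integer> absolutePositions(int[] increments) {
        List<Integer> positions = new ArrayList<>();
        int pos = -1; // first token with increment 1 lands at position 0
        for (int inc : increments) {
            pos += inc;
            positions.add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        // "the fox (jump|jumped)": 'jumped' shares a position with 'jump',
        // so its increment is 0.
        int[] original = {1, 1, 1, 0};
        // A stream rebuilt without increment info treats every token as adjacent.
        int[] rebuilt  = {1, 1, 1, 1};
        System.out.println(absolutePositions(original)); // [0, 1, 2, 2]
        System.out.println(absolutePositions(rebuilt));  // [0, 1, 2, 3]
    }
}
```

In the original stream the phrase "fox jumped" matches positions 1 and 2; in the rebuilt stream "jumped" sits at position 3, one too far, so the highlighter's phrase check fails.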




[jira] Commented: (LUCENE-2170) Thread starvation problems in some tests

2009-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792416#action_12792416
 ] 

Michael McCandless commented on LUCENE-2170:


Ugh!

Looks good Uwe.  Thanks!

> Thread starvation problems in some tests
> 
>
> Key: LUCENE-2170
> URL: https://issues.apache.org/jira/browse/LUCENE-2170
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.1
>
> Attachments: LUCENE-2170.patch




Re: Atlassian Clover Site License for Apache

2009-12-18 Thread Michael McCandless
Thanks Atlassian!

The new clover report (after upgrading to 2.6 for our nightly build) is
fabulous -- eg from yesterday's Lucene trunk build:

  http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1033/clover-report

I think we should call wider attention to this, so other Apache
committers & contributors are aware -- does anyone know the best place
(wiki, mailing list) in Apache to publicize this?

Mike

On Fri, Dec 18, 2009 at 1:33 AM, Nicholas Muldoon
 wrote:
> Hi,
>
> Atlassian are excited to be presenting Apache with a site license for Clover
> 2.6.
>
> This Clover license can be used for any code that is under an org.apache
> package. Further, this license can be used by any developer on their machine
> in conjunction with our Eclipse or IntelliJ plugins for development on an
> org.apache project. Please use the following license:
>
> AAABJw0ODAoPeNpdkMtqwzAQRff6CkHXCX60xTUIWmwXAokd6qSrQpkqk0REloUku8nf16/QkO0M5
> 9y58/BuBE2RUz+inhcHXhw+0aTc0MDzXkiKlhuhnagVS2TdovmKaaFR0bJuDEeSGIR+m4JD1iMzP
> 5j5EUlq5YC7HCpkK3FCuuIc1E6itYQPonneVD9oiv3WorFs5l+ZbAVCsqqDqoF5BQ38iPPaHEjWg
> myGQLYHafHq6jDRInOm+R9JWf/iTp8O2uBenNGyzjAfZUQKjsriZxfdywLShSqHChTH7KyFuUyNf
> G9qNGmXI7i5aBzKFess/y6L7UeSkcIcQAk73vc2BpVZzvzw+TEKQzKxi5QtF+nd8Ca0UVJUwuGOr
> LsfH8Hi/Xf/AKr+lFAwLAIUbZJXXec7rOMv34bm2Jt2aFDczSICFCfzK7TpFe/6TeF2k7ChCciTN
> jKmX02eq
>
> Thank You. Have a brilliant Christmas, and all the best for the new year.
>
> Regards,
>
> Nicholas Muldoon
>
>
>
>
>




RE: Atlassian Clover Site License for Apache

2009-12-18 Thread Uwe Schindler
As a first step we should upload it to the private committers area where the
other clover licenses are, but we should also make it public that it's now
open for all, including non-committers.

Do we have to contact infrastructure or whoever first?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Friday, December 18, 2009 11:09 AM
> To: Nicholas Muldoon
> Cc: java-dev@lucene.apache.org; Nick Pellow; j...@biccard.com
> Subject: Re: Atlassian Clover Site License for Apache
> 
> Thanks Atlassian!
> 
> The new clover report (after upgrading to 2.6 for our nightly build) is
> fabulous -- eg from yesterday's Lucene trunk build:
> 
>   http://hudson.zones.apache.org/hudson/job/Lucene-trunk/1033/clover-
> report
> 
> I think we should call wider attention to this, so other Apache
> committers & contributors are aware -- does anyone know the best place
> (wiki, mailing list) in Apache to publicize this?
> 
> Mike
> 
> On Fri, Dec 18, 2009 at 1:33 AM, Nicholas Muldoon
>  wrote:
> > Hi,
> >
> > Atlassian are excited to be presenting Apache with a site license for
> Clover
> > 2.6.
> >
> > This Clover license can be used for any code that is under an org.apache
> > package. Further, this license can be used by any developer on their machine
> > in conjunction with our Eclipse or IntelliJ plugins for development on an
> > org.apache project. Please use the following license:
> >
> > AAABJw0ODAoPeNpdkMtqwzAQRff6CkHXCX60xTUIWmwXAokd6qSrQpkqk0REloUku8nf16/QkO0M5
> > 9y58/BuBE2RUz+inhcHXhw+0aTc0MDzXkiKlhuhnagVS2TdovmKaaFR0bJuDEeSGIR+m4JD1iMzP
> > 5j5EUlq5YC7HCpkK3FCuuIc1E6itYQPonneVD9oiv3WorFs5l+ZbAVCsqqDqoF5BQ38iPPaHEjWg
> > myGQLYHafHq6jDRInOm+R9JWf/iTp8O2uBenNGyzjAfZUQKjsriZxfdywLShSqHChTH7KyFuUyNf
> > G9qNGmXI7i5aBzKFess/y6L7UeSkcIcQAk73vc2BpVZzvzw+TEKQzKxi5QtF+nd8Ca0UVJUwuGOr
> > LsfH8Hi/Xf/AKr+lFAwLAIUbZJXXec7rOMv34bm2Jt2aFDczSICFCfzK7TpFe/6TeF2k7ChCciTN
> > jKmX02eq
> >
> > Thank You. Have a brilliant Christmas, and all the best for the new
> year.
> >
> > Regards,
> >
> > Nicholas Muldoon
> >
> >
> >
> >
> >
> 



[jira] Updated: (LUCENE-2170) Thread starvation problems in some tests

2009-12-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2170:
--

Attachment: LUCENE-2170-tag.patch

Patch for bw branch.

I will commit soon.

> Thread starvation problems in some tests
> 
>
> Key: LUCENE-2170
> URL: https://issues.apache.org/jira/browse/LUCENE-2170
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.1
>
> Attachments: LUCENE-2170-tag.patch, LUCENE-2170.patch




[jira] Commented: (LUCENE-2170) Thread starvation problems in some tests

2009-12-18 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792422#action_12792422
 ] 

Uwe Schindler commented on LUCENE-2170:
---

If we want to enable builds of the 3.0 or 2.9 branches on Hudson, we must 
backport this.

> Thread starvation problems in some tests
> 
>
> Key: LUCENE-2170
> URL: https://issues.apache.org/jira/browse/LUCENE-2170
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.1
>
> Attachments: LUCENE-2170-tag.patch, LUCENE-2170.patch




[jira] Resolved: (LUCENE-2170) Thread starvation problems in some tests

2009-12-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved LUCENE-2170.
---

Resolution: Fixed

Committed revision: 892216

> Thread starvation problems in some tests
> 
>
> Key: LUCENE-2170
> URL: https://issues.apache.org/jira/browse/LUCENE-2170
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.1
>
> Attachments: LUCENE-2170-tag.patch, LUCENE-2170-tag.patch, 
> LUCENE-2170.patch, LUCENE-2170.patch




[jira] Updated: (LUCENE-2170) Thread starvation problems in some tests

2009-12-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2170:
--

Attachment: LUCENE-2170-tag.patch
LUCENE-2170.patch

Small update in both patches (while loop).

> Thread starvation problems in some tests
> 
>
> Key: LUCENE-2170
> URL: https://issues.apache.org/jira/browse/LUCENE-2170
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 3.1
>
> Attachments: LUCENE-2170-tag.patch, LUCENE-2170-tag.patch, 
> LUCENE-2170.patch, LUCENE-2170.patch




[jira] Updated: (LUCENE-2147) Improve Spatial Utility like classes

2009-12-18 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-2147:
---

Attachment: LUCENE-2147.patch

- Fixed bug in the patch which assigned earthCircumference to earthRadius in 
the DistanceUnits constructor.
- Added javadoc to DistanceUnits constructor for new parameters
- Tidied some of the other javadoc
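The kind of constructor bug described above can be sketched as follows; this is a hypothetical reconstruction (field names, constants, and the conversion method are illustrative assumptions, not the contrib's actual code):

```java
// Hypothetical DistanceUnits enum carrying per-unit Earth constants.
public enum DistanceUnits {
    MILES(3958.8, 24901.0),
    KILOMETERS(6371.0, 40075.0);

    private final double earthRadius;
    private final double earthCircumference;

    DistanceUnits(double earthRadius, double earthCircumference) {
        // The reported bug was the moral equivalent of:
        //   this.earthRadius = earthCircumference;   // swapped!
        this.earthRadius = earthRadius;
        this.earthCircumference = earthCircumference;

    }

    public double earthRadius() { return earthRadius; }

    public double earthCircumference() { return earthCircumference; }

    /** Example of the unit-conversion functionality the issue proposes. */
    public double convert(double distance, DistanceUnits target) {
        return distance * target.earthRadius / this.earthRadius;
    }
}
```

A swapped assignment like this compiles cleanly and silently corrupts every distance calculation, which is why moving such constants to one audited location (as the issue proposes) helps.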

> Improve Spatial Utility like classes
> 
>
> Key: LUCENE-2147
> URL: https://issues.apache.org/jira/browse/LUCENE-2147
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/spatial
>Affects Versions: 3.1
>Reporter: Chris Male
>Assignee: Simon Willnauer
> Attachments: LUCENE-2147.patch, LUCENE-2147.patch, LUCENE-2147.patch
>
>
> - DistanceUnits can be improved by giving functionality to the enum, such as 
> being able to convert between different units, and adding tests.  
> - GeoHashUtils can be improved through some code tidying, documentation, and 
> tests.
> - SpatialConstants allows us to move all constants, such as the radii and 
> circumferences of Earth, to a single consistent location that we can then use 
> throughout the contrib. This also improves the transparency of calculations 
> done in the contrib, as users of the contrib can easily see the values being 
> used. Currently this issue does not migrate classes to use these constants; 
> that will happen in issues related to the appropriate classes.




[jira] Updated: (LUCENE-1769) Fix wrong clover analysis because of backwards-tests, upgrade clover to 2.6.3 or better

2009-12-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1769:
--

Attachment: (was: clover.license)

> Fix wrong clover analysis because of backwards-tests, upgrade clover to 2.6.3 
> or better
> ---
>
> Key: LUCENE-1769
> URL: https://issues.apache.org/jira/browse/LUCENE-1769
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1
>Reporter: Uwe Schindler
> Attachments: clover.license, LUCENE-1769-2.patch, LUCENE-1769.patch, 
> LUCENE-1769.patch, LUCENE-1769.patch, LUCENE-1769.patch, 
> nicks-LUCENE-1769.patch
>
>
> This is a followup for 
> [http://www.lucidimagination.com/search/document/6248d6eafbe10ef4/build_failed_in_hudson_lucene_trunk_902]
> The problem with clover running on hudson is that it does not instrument all 
> tests that are run. The autodetection of clover 1.x is not able to find out 
> which files are the correct tests, and it only instruments the backwards 
> tests. Because of this, the current coverage report covers only the backwards 
> tests running against the current Lucene JAR.
> You can see this if you install clover and start the tests: during test-core 
> no clover data is added to the db; only when the backwards tests begin are 
> new files created in the clover db folder.
> Clover 2.x supports a new ant configuration element that can be used to 
> specify which files are the tests. It works here locally with clover 2.4.3 
> and produces a really nice coverage report; linking with test files also 
> works, and it tells which tests failed and so on.
> I will attach a patch that changes common-build.xml to the new clover 
> version (other initialization resource) and tells clover where to find the 
> tests (using the test folder include/exclude properties).
> One problem with the current patch: it does *not* instrument the backwards 
> branch, so you see only coverage of the core/contrib tests. Getting coverage 
> from the backwards tests as well is not easily possible, because of two 
> things:
> - the tag test dir is not easy to find out and add to the element (there may 
> be only one of them)
> - the test names in the BW branch are identical to the trunk tests, which 
> completely corrupts the linkage between tests and code in the coverage report.
> In principle the best would be to generate a second coverage report for the 
> backwards branch with a separate clover DB. The attached patch does not 
> instrument the bw branch; it only does trunk tests.
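Based on the description above, the Clover 2.x setup in common-build.xml would look roughly like this sketch. The `<testsources>` element name follows Atlassian's Clover 2.x ant documentation; all directories and include patterns here are assumptions for illustration, not the actual Lucene build file:

```xml
<clover-setup initString="${build.dir}/clover/db/coverage.db">
  <fileset dir="src/java"/>
  <!-- Tell Clover 2.x which sources are the tests, so it instruments them
       and can link per-test coverage in the report. -->
  <testsources dir="src/test">
    <include name="**/Test*.java"/>
    <include name="**/*Test.java"/>
  </testsources>
</clover-setup>
```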




[jira] Updated: (LUCENE-1769) Fix wrong clover analysis because of backwards-tests, upgrade clover to 2.6.3 or better

2009-12-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1769:
--

Attachment: clover.license

Updated license file from Atlassian. Thanks Nicholas Muldoon!

> Fix wrong clover analysis because of backwards-tests, upgrade clover to 2.6.3 
> or better
> ---
>
> Key: LUCENE-1769
> URL: https://issues.apache.org/jira/browse/LUCENE-1769
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1
>Reporter: Uwe Schindler
> Attachments: clover.license, LUCENE-1769-2.patch, LUCENE-1769.patch, 
> LUCENE-1769.patch, LUCENE-1769.patch, LUCENE-1769.patch, 
> nicks-LUCENE-1769.patch




[jira] Updated: (LUCENE-1769) Fix wrong clover analysis because of backwards-tests, upgrade clover to 2.6.3 or better

2009-12-18 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1769:
--

Attachment: clover.license

Updated license file from Atlassian. Thanks Nicholas Muldoon! (without ASF 
grant attached)

> Fix wrong clover analysis because of backwards-tests, upgrade clover to 2.6.3 
> or better
> ---
>
> Key: LUCENE-1769
> URL: https://issues.apache.org/jira/browse/LUCENE-1769
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.1
>Reporter: Uwe Schindler
> Attachments: clover.license, clover.license, LUCENE-1769-2.patch, 
> LUCENE-1769.patch, LUCENE-1769.patch, LUCENE-1769.patch, LUCENE-1769.patch, 
> nicks-LUCENE-1769.patch




[jira] Resolved: (LUCENE-1923) Add toString() or getName() method to IndexReader

2009-12-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1923.


   Resolution: Fixed
Fix Version/s: 3.1

Thanks Tim!

> Add toString() or getName() method to IndexReader
> -
>
> Key: LUCENE-1923
> URL: https://issues.apache.org/jira/browse/LUCENE-1923
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Index
>Reporter: Tim Smith
>Assignee: Michael McCandless
> Fix For: 3.1
>
> Attachments: LUCENE-1923.patch, LUCENE-1923.patch
>
>
> It would be very useful for debugging if IndexReader either had a getName() 
> method, or a toString() implementation that would return a string identifying 
> the reader.
> For SegmentReader, this would return the same as getSegmentName().
> For Directory readers, this would return the "generation id"?
> For MultiReader, this could return something like "multi(sub reader name, sub 
> reader name, sub reader name, ...)".
> Right now, I have to check instanceof for SegmentReader, then call 
> getSegmentName(); for all other IndexReader types, I would have to do 
> something like get the IndexCommit and get the generation off it (and this 
> may throw UnsupportedOperationException, at which point I would have to 
> recursively walk sub readers and try again).
> I could work up a patch if others like this idea.
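The naming scheme described above can be sketched with plain stand-in classes; these are illustrative only (SegReader, MultiR, and ReaderToStringDemo are hypothetical names, not Lucene's real IndexReader hierarchy):

```java
import java.util.StringJoiner;

// Hypothetical stand-ins for the reader hierarchy; not Lucene's real classes.
abstract class Reader {
    @Override
    public abstract String toString();
}

class SegReader extends Reader {
    private final String segmentName;
    SegReader(String segmentName) { this.segmentName = segmentName; }

    // A segment reader identifies itself by its segment name.
    @Override
    public String toString() { return segmentName; }
}

class MultiR extends Reader {
    private final Reader[] subs;
    MultiR(Reader... subs) { this.subs = subs; }

    // Compose the sub-reader names, as the issue suggests: multi(name, name, ...)
    @Override
    public String toString() {
        StringJoiner j = new StringJoiner(", ", "multi(", ")");
        for (Reader r : subs) j.add(r.toString());
        return j.toString();
    }
}

class ReaderToStringDemo {
    public static void main(String[] args) {
        Reader r = new MultiR(new SegReader("_0"), new SegReader("_1"));
        System.out.println(r);  // multi(_0, _1)
    }
}
```

This avoids the instanceof checks described above: each reader type reports its own identity.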

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2148) Improve Spatial Point2D and Rectangle Classes

2009-12-18 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-2148:
---

Attachment: LUCENE-2148.patch

Patch differs considerably from the previous one in that it makes no deletions 
or renames in the existing code.  Instead it contains the following changes:

- Deprecates all classes in geometry.shape
- Adds a simpler immutable Point class to the geometry package.  This will be 
used instead of Point2D in the remaining work
- Adds a simpler LatLngRectangle class to the geometry package.  This will be 
used instead of LLRect in the remaining work
- Deprecates geometry.CartesianPoint
- Changes LatLng to a concrete class (instead of abstract), deprecates most of 
its methods, and cleans up the ones that should remain
- Deprecates FloatLatLng and FixedLatLng, since they are replaced by the logic 
in LatLng.
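An immutable point class of the kind described above might look roughly like this; the field and method names are assumptions for illustration, not the actual patch contents:

```java
// Hypothetical sketch of an immutable lat/lng point; not the committed class.
final class Point {
    private final double lat;
    private final double lng;

    Point(double lat, double lng) {
        this.lat = lat;
        this.lng = lng;
    }

    // No setters: immutability makes the class safe to share across threads
    // and to use as a map key.
    double getLat() { return lat; }
    double getLng() { return lng; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return Double.compare(lat, p.lat) == 0
            && Double.compare(lng, p.lng) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(lat) + Double.hashCode(lng);
    }

    @Override
    public String toString() { return "Point(" + lat + ", " + lng + ")"; }
}
```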

> Improve Spatial Point2D and Rectangle Classes
> -
>
> Key: LUCENE-2148
> URL: https://issues.apache.org/jira/browse/LUCENE-2148
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/spatial
>Affects Versions: 3.1
>Reporter: Chris Male
>Assignee: Simon Willnauer
> Attachments: LUCENE-2148.patch, LUCENE-2148.patch
>
>
> The Point2D and Rectangle classes have a lot of duplicate, redundant, and 
> unused functionality.  This issue cleans them both up and simplifies the 
> functionality they provide.
> Subsequent to this, Eclipse and LineSegment, which depend on Point2D, are not 
> used anywhere in the contrib; therefore, rather than trying to update them to 
> use the improved Point2D, they will be removed.




[jira] Updated: (LUCENE-2151) Abstract the distance calculation process in the Spatial contrib

2009-12-18 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-2151:
---

Attachment: LUCENE-2151.patch

Updated patch so that it uses the DistanceUnits.convert instance method.

> Abstract the distance calculation process in the Spatial contrib
> 
>
> Key: LUCENE-2151
> URL: https://issues.apache.org/jira/browse/LUCENE-2151
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/spatial
>Affects Versions: 3.1
>Reporter: Chris Male
>Assignee: Simon Willnauer
> Attachments: LUCENE-2151.patch, LUCENE-2151.patch
>
>
> The spatial contrib shouldn't tie users to one particular way of calculating 
> distances.  Wikipedia lists multiple different formulas for the great-circle 
> distance calculation, and there are alternatives to that as well.  In a 
> situation where many documents have the same points, it would be useful to be 
> able to cache some calculated values as well (currently this is sort of 
> handled in the filtering process itself).  
> This issue addresses this by abstracting away the distance calculator, 
> allowing the user to provide the implementation of choice.  It would then be 
> possible to swap in different distance calculation strategies without 
> altering the distance filtering process itself.
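The pluggable-calculator idea above can be sketched as an interface with swappable implementations; the interface and method names here are assumptions, not the contrib's actual API:

```java
// Hypothetical sketch of the pluggable distance-calculator idea.
interface DistanceCalculator {
    /** Distance in kilometres between two lat/lng points given in degrees. */
    double distance(double lat1, double lng1, double lat2, double lng2);
}

// One possible great-circle implementation, using the haversine formula.
class HaversineCalculator implements DistanceCalculator {
    private static final double EARTH_RADIUS_KM = 6371.0;

    @Override
    public double distance(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

A filter that accepts any DistanceCalculator can then swap great-circle math for a cheaper planar approximation, or a caching wrapper, without touching the filtering code.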




[jira] Reopened: (LUCENE-1923) Add toString() or getName() method to IndexReader

2009-12-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-1923:



A few things still to fix: a RODirReader doesn't toString right; the SRs 
should add a "*" (eg) if they have pending changes; the SRs should also 
reflect an accurate delCount when they have pending deletes.  I'll work on this...

> Add toString() or getName() method to IndexReader
> -
>
> Key: LUCENE-1923
> URL: https://issues.apache.org/jira/browse/LUCENE-1923
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Index
>Reporter: Tim Smith
>Assignee: Michael McCandless
> Fix For: 3.1
>
> Attachments: LUCENE-1923.patch, LUCENE-1923.patch
>
>
> It would be very useful for debugging if IndexReader either had a getName() 
> method, or a toString() implementation that would return a string identifying 
> the reader.
> For SegmentReader, this would return the same as getSegmentName().
> For Directory readers, this would return the "generation id"?
> For MultiReader, this could return something like "multi(sub reader name, sub 
> reader name, sub reader name, ...)".
> Right now, I have to check instanceof for SegmentReader, then call 
> getSegmentName(); for all other IndexReader types, I would have to do 
> something like get the IndexCommit and get the generation off it (and this 
> may throw UnsupportedOperationException, at which point I would have to 
> recursively walk sub readers and try again).
> I could work up a patch if others like this idea.




[jira] Updated: (LUCENE-2152) Abstract Spatial distance filtering process and supported field formats

2009-12-18 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-2152:
---

Attachment: LUCENE-2152.patch

Updated patch so that it uses Point rather than Point2D.

> Abstract Spatial distance filtering process and supported field formats
> ---
>
> Key: LUCENE-2152
> URL: https://issues.apache.org/jira/browse/LUCENE-2152
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/spatial
>Affects Versions: 3.1
>Reporter: Chris Male
> Attachments: LUCENE-2152.patch, LUCENE-2152.patch
>
>
> Currently the second stage of the filtering process in the spatial contrib 
> involves calculating the exact distance for the remaining documents, and 
> filtering out those that fall out of the search radius.  Currently this is 
> done through the 2 impls of DistanceFilter, LatLngDistanceFilter and 
> GeoHashDistanceFilter.  The main difference between these 2 impls is the 
> format of data they support, the former supporting lat/lngs being stored in 2 
> distinct fields, while the latter supports geohashed lat/lngs through the 
> GeoHashUtils.  This difference should be abstracted out so that the distance 
> filtering process is data format agnostic.
> The second issue is that the distance filtering algorithm can be considerably 
> optimized by using multiple threads.  Therefore it makes sense to have an 
> abstraction of DistanceFilter which has different implementations, one being 
> a multi-threaded implementation and the other being a blank implementation 
> that can be used when no distance filtering is to occur.
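The format-agnostic idea described above can be sketched by putting a point-lookup interface between the filter and the index data; every name here (PointProvider, TwoFieldProvider, withinRadius) is illustrative, not the contrib's real API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the distance filter asks a PointProvider for a
// document's lat/lng, so the same filter logic works whether the index
// stores two distinct fields or a geohashed field.
interface PointProvider {
    double[] latLng(int docId);  // returns {lat, lng} in degrees
}

// Backed by two "fields" (simulated here with in-memory maps).
class TwoFieldProvider implements PointProvider {
    private final Map<Integer, Double> lats = new HashMap<>();
    private final Map<Integer, Double> lngs = new HashMap<>();

    void add(int doc, double lat, double lng) {
        lats.put(doc, lat);
        lngs.put(doc, lng);
    }

    @Override
    public double[] latLng(int doc) {
        return new double[] { lats.get(doc), lngs.get(doc) };
    }
}

class DistanceFilterSketch {
    // Keep only docs within radiusKm of (lat, lng), using a crude
    // equirectangular approximation purely for illustration.
    static boolean withinRadius(PointProvider points, int doc,
                                double lat, double lng, double radiusKm) {
        double[] p = points.latLng(doc);
        double kmPerDeg = 111.195;  // approx km per degree of latitude
        double dy = (p[0] - lat) * kmPerDeg;
        double dx = (p[1] - lng) * kmPerDeg * Math.cos(Math.toRadians(lat));
        return Math.sqrt(dx * dx + dy * dy) <= radiusKm;
    }
}
```

A geohash-backed provider would decode the hash inside latLng(), and the filter would not change at all.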




[jira] Commented: (LUCENE-2149) Simplify Spatial LatLng and LLRect classes

2009-12-18 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792475#action_12792475
 ] 

Chris Male commented on LUCENE-2149:


This functionality has been incorporated into LUCENE-2148 because of 
dependencies between the classes.

> Simplify Spatial LatLng and LLRect classes
> --
>
> Key: LUCENE-2149
> URL: https://issues.apache.org/jira/browse/LUCENE-2149
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/spatial
>Affects Versions: 3.1
>Reporter: Chris Male
>Assignee: Simon Willnauer
> Attachments: LUCENE-2149.patch
>
>
> Currently in the contrib there are FloatLatLng and FixedLatLng, which both 
> extend LatLng.  The reason for this separation is not clear, and it is not needed 
> in the current functionality.  The functionality that is used can be 
> collapsed into LatLng, which can be made a concrete class.  Internally LatLng 
> can benefit from the improvements suggested in LUCENE-1934.
> LLRect, which uses LatLng, can also be simplified by removing the unused 
> functionality, and using the new LatLng class.
> All classes can be improved through documentation, some method renaming, and 
> general code tidy up.




[jira] Updated: (LUCENE-2148) Improve Spatial Point2D and Rectangle Classes

2009-12-18 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-2148:
---

Attachment: LUCENE-2148.patch

Added full deprecation annotations

> Improve Spatial Point2D and Rectangle Classes
> -
>
> Key: LUCENE-2148
> URL: https://issues.apache.org/jira/browse/LUCENE-2148
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/spatial
>Affects Versions: 3.1
>Reporter: Chris Male
>Assignee: Simon Willnauer
> Attachments: LUCENE-2148.patch, LUCENE-2148.patch, LUCENE-2148.patch
>
>
> The Point2D and Rectangle classes have a lot of duplicate, redundant, and 
> unused functionality.  This issue cleans them both up and simplifies the 
> functionality they provide.
> Subsequent to this, Eclipse and LineSegment, which depend on Point2D, are not 
> used anywhere in the contrib; therefore, rather than trying to update them to 
> use the improved Point2D, they will be removed.




[jira] Updated: (LUCENE-2152) Abstract Spatial distance filtering process and supported field formats

2009-12-18 Thread Chris Male (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Male updated LUCENE-2152:
---

Attachment: LUCENE-2152.patch

Added full deprecation annotations

> Abstract Spatial distance filtering process and supported field formats
> ---
>
> Key: LUCENE-2152
> URL: https://issues.apache.org/jira/browse/LUCENE-2152
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/spatial
>Affects Versions: 3.1
>Reporter: Chris Male
> Attachments: LUCENE-2152.patch, LUCENE-2152.patch, LUCENE-2152.patch
>
>
> Currently the second stage of the filtering process in the spatial contrib 
> involves calculating the exact distance for the remaining documents, and 
> filtering out those that fall out of the search radius.  Currently this is 
> done through the 2 impls of DistanceFilter, LatLngDistanceFilter and 
> GeoHashDistanceFilter.  The main difference between these 2 impls is the 
> format of data they support, the former supporting lat/lngs being stored in 2 
> distinct fields, while the latter supports geohashed lat/lngs through the 
> GeoHashUtils.  This difference should be abstracted out so that the distance 
> filtering process is data format agnostic.
> The second issue is that the distance filtering algorithm can be considerably 
> optimized by using multiple threads.  Therefore it makes sense to have an 
> abstraction of DistanceFilter which has different implementations, one being 
> a multi-threaded implementation and the other being a blank implementation 
> that can be used when no distance filtering is to occur.




[jira] Updated: (LUCENE-1923) Add toString() or getName() method to IndexReader

2009-12-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1923:
---

Attachment: LUCENE-1923.patch

New patch fixing above issues...

> Add toString() or getName() method to IndexReader
> -
>
> Key: LUCENE-1923
> URL: https://issues.apache.org/jira/browse/LUCENE-1923
> Project: Lucene - Java
>  Issue Type: Wish
>  Components: Index
>Reporter: Tim Smith
>Assignee: Michael McCandless
> Fix For: 3.1
>
> Attachments: LUCENE-1923.patch, LUCENE-1923.patch, LUCENE-1923.patch
>
>
> It would be very useful for debugging if IndexReader either had a getName() 
> method, or a toString() implementation that would return a string identifying 
> the reader.
> For SegmentReader, this would return the same as getSegmentName().
> For Directory readers, this would return the "generation id"?
> For MultiReader, this could return something like "multi(sub reader name, sub 
> reader name, sub reader name, ...)".
> Right now, I have to check instanceof for SegmentReader, then call 
> getSegmentName(); for all other IndexReader types, I would have to do 
> something like get the IndexCommit and get the generation off it (and this 
> may throw UnsupportedOperationException, at which point I would have to 
> recursively walk sub readers and try again).
> I could work up a patch if others like this idea.




[jira] Updated: (LUCENE-2164) Make CMS smarter about thread priorities

2009-12-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2164:
---

Attachment: LUCENE-2164.patch

Attached patch:

  * Adds a new CMS.setMaxMergeCount, which must be greater than
maxThreadCount, allowing CMS to pause big merge threads so small
merge threads can finish

  * Uses a dynamic default for CMS.maxThreadCount, between 1 & 3
depending on the number of cores; defaults maxMergeCount to that
number + 2

The pausing works well in the NRT stress test -- it greatly reduces how
often the NRT reopen is blocked because too many merges are running.
It's most important when there is a very big merge running -- in that
case it's better to pause & unpause that big merge when tiny merges
arrive than to force NRT to wait for the merge to complete.
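The pause policy described above, run the maxThreadCount smallest merges and pause the rest, can be sketched as follows; MergeSlot and PausePolicy are illustrative names, not CMS's real internals:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the pause/unpause policy: up to maxMergeCount
// merges may exist at once, but only the maxThreadCount *smallest* ones
// actually run; bigger ones are paused so tiny merges finish quickly
// and don't block an NRT reopen.
class MergeSlot {
    final long sizeInBytes;
    boolean paused;
    MergeSlot(long sizeInBytes) { this.sizeInBytes = sizeInBytes; }
}

class PausePolicy {
    static void updatePauses(List<MergeSlot> merges, int maxThreadCount) {
        List<MergeSlot> sorted = new ArrayList<>(merges);
        // Smallest merges first: they are cheap and should run to completion.
        sorted.sort(Comparator.comparingLong(m -> m.sizeInBytes));
        for (int i = 0; i < sorted.size(); i++) {
            sorted.get(i).paused = i >= maxThreadCount;
        }
    }
}
```

Re-running updatePauses whenever a merge starts or finishes would unpause the big merge again once the small ones drain.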


> Make CMS smarter about thread priorities
> 
>
> Key: LUCENE-2164
> URL: https://issues.apache.org/jira/browse/LUCENE-2164
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2164.patch, LUCENE-2164.patch
>
>
> Spinoff from LUCENE-2161...
> The hard throttling CMS does (blocking the incoming thread that wants
> to launch a new merge) can be devastating when it strikes during NRT
> reopen.
> It can easily happen if a huge merge is off and running, but then a
> tiny merge is needed to clean up recently created segments due to
> frequent reopens.
> I think a small change to CMS, whereby it assigns a higher thread
> priority to tiny merges than big merges, should allow us to increase
> the max merge thread count again, and greatly reduce the chance that
> NRT's reopen would hit this.




Hudson build is back to normal: Lucene-trunk #1035

2009-12-18 Thread Apache Hudson Server
See 






[jira] Commented: (LUCENE-2034) Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors

2009-12-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792550#action_12792550
 ] 

Robert Muir commented on LUCENE-2034:
-

Simon, thanks for the update, I like it.

I am going on vacation in a few weeks... so I can say, I will commit next year 
if no one objects :)

> Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors
> -
>
> Key: LUCENE-2034
> URL: https://issues.apache.org/jira/browse/LUCENE-2034
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.9
>Reporter: Simon Willnauer
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2034,patch, LUCENE-2034,patch, LUCENE-2034.patch, 
> LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, 
> LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, 
> LUCENE-2034.txt
>
>
> Due to the various TokenStream APIs we had in Lucene, analyzer subclasses 
> need to implement at least one of the methods returning a TokenStream.  When 
> you look at the code, it appears to be almost identical if both are 
> implemented in the same analyzer.  Each analyzer defines the same inner class 
> (SavedStreams), which is unnecessary.
> In contrib, almost every analyzer uses stopwords, and each of them creates its 
> own way of loading them or defines a large number of ctors to load stopwords 
> from a file, set, array, etc.  Those ctors should be deprecated and 
> eventually removed.
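The unification described above amounts to hoisting stopword handling into one base class so concrete analyzers stop re-implementing the same ctors; this is a minimal sketch under assumed names (StopwordAnalyzerBase here is illustrative, not necessarily the committed class):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: one base class owns the stopword set, so each
// concrete analyzer only needs to supply its defaults.
abstract class StopwordAnalyzerBase {
    protected final Set<String> stopwords;

    protected StopwordAnalyzerBase(Set<String> stopwords) {
        // Defensive copy; a null set means "no stopwords".
        this.stopwords = stopwords == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(new HashSet<>(stopwords));
    }

    public Set<String> getStopwords() { return stopwords; }
}

class DemoAnalyzer extends StopwordAnalyzerBase {
    private static final Set<String> DEFAULT_STOP_SET =
        new HashSet<>(Arrays.asList("the", "a", "an"));

    DemoAnalyzer() { super(DEFAULT_STOP_SET); }                 // default stopwords
    DemoAnalyzer(Set<String> stopwords) { super(stopwords); }   // caller-supplied set
}
```

With this shape, the file/array/set loading variants live once in the base class instead of being duplicated across every contrib analyzer.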




[jira] Created: (LUCENE-2171) Over synchronization for read-only index readers in SegmentTermDocs

2009-12-18 Thread Jayson Minard (JIRA)
Over synchronization for read-only index readers in SegmentTermDocs
---

 Key: LUCENE-2171
 URL: https://issues.apache.org/jira/browse/LUCENE-2171
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 3.0, 2.9.1
Reporter: Jayson Minard
Priority: Minor


In the SegmentTermDocs constructor (from 2.9.1):

{code}
protected SegmentTermDocs(SegmentReader parent) {
  this.parent = parent;
  this.freqStream = (IndexInput) parent.core.freqStream.clone();
  synchronized (parent) {
    this.deletedDocs = parent.deletedDocs;
  }
  this.skipInterval = parent.core.getTermsReader().getSkipInterval();
  this.maxSkipLevels = parent.core.getTermsReader().getMaxSkipLevels();
}
{code}

The synchronization on "parent" for accessing deletedDocs is unnecessary for 
read-only indexes.  If that access were moved into SegmentReader, it could be 
protected there by default and overridden in ReadOnlySegmentReader.
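The proposed refactor can be sketched with stand-in classes: the parent reader owns the decision of whether access needs a lock, and a read-only subclass overrides it with an unsynchronized version. The names here are illustrative, not Lucene's actual classes:

```java
// Hypothetical sketch of moving the deletedDocs access into the parent
// reader, so a read-only subclass can drop the lock.
class ParentReader {
    protected Object deletedDocs;  // stand-in for the deleted-docs bitset

    // Writable readers may swap deletedDocs concurrently (e.g. when a
    // delete is flushed), so the default accessor takes the lock.
    protected Object getDeletedDocs() {
        synchronized (this) {
            return deletedDocs;
        }
    }
}

class ReadOnlyParentReader extends ParentReader {
    // deletedDocs never changes after the reader is opened, so the
    // read-only variant skips the lock entirely.
    @Override
    protected Object getDeletedDocs() {
        return deletedDocs;
    }
}
```

SegmentTermDocs would then call parent.getDeletedDocs() and never synchronize on the parent itself.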




[jira] Resolved: (LUCENE-2080) Improve the documentation of Version

2009-12-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2080.
-

Resolution: Fixed

> Improve the documentation of Version
> 
>
> Key: LUCENE-2080
> URL: https://issues.apache.org/jira/browse/LUCENE-2080
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 2.9.2, 3.1, 3.0
>
> Attachments: LUCENE-2080.patch
>
>
> In my opinion, we should elaborate more on the effects of changing the 
> Version parameter.
> Particularly, changing this value, even if you recompile your code, likely 
> involves reindexing your data.
> I do not think this is adequately clear from the current javadocs.




[jira] Commented: (LUCENE-2171) Over synchronization for read-only index readers in SegmentTermDocs

2009-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792561#action_12792561
 ] 

Michael McCandless commented on LUCENE-2171:


Super -- wanna whip up a patch?

> Over synchronization for read-only index readers in SegmentTermDocs
> ---
>
> Key: LUCENE-2171
> URL: https://issues.apache.org/jira/browse/LUCENE-2171
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Jayson Minard
>Priority: Minor
>
> In the SegmentTermDocs constructor (from 2.9.1):
> {code}
> protected SegmentTermDocs(SegmentReader parent) {
>   this.parent = parent;
>   this.freqStream = (IndexInput) parent.core.freqStream.clone();
>   synchronized (parent) {
>     this.deletedDocs = parent.deletedDocs;
>   }
>   this.skipInterval = parent.core.getTermsReader().getSkipInterval();
>   this.maxSkipLevels = parent.core.getTermsReader().getMaxSkipLevels();
> }
> {code}
> The synchronization on "parent" for accessing deletedDocs is unnecessary for 
> read-only indexes.  If that access were moved into SegmentReader, it could be 
> protected there by default and overridden in ReadOnlySegmentReader.




[jira] Updated: (LUCENE-2171) Over synchronization for read-only index readers in SegmentTermDocs

2009-12-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2171:
---

Fix Version/s: 3.1

> Over synchronization for read-only index readers in SegmentTermDocs
> ---
>
> Key: LUCENE-2171
> URL: https://issues.apache.org/jira/browse/LUCENE-2171
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Jayson Minard
>Priority: Minor
> Fix For: 3.1
>
>
> In the SegmentTermDocs constructor (from 2.9.1):
> {code}
> protected SegmentTermDocs(SegmentReader parent) {
>   this.parent = parent;
>   this.freqStream = (IndexInput) parent.core.freqStream.clone();
>   synchronized (parent) {
>     this.deletedDocs = parent.deletedDocs;
>   }
>   this.skipInterval = parent.core.getTermsReader().getSkipInterval();
>   this.maxSkipLevels = parent.core.getTermsReader().getMaxSkipLevels();
> }
> {code}
> The synchronization on "parent" for accessing deletedDocs is unnecessary for 
> read-only indexes.  If that access were moved into SegmentReader, it could be 
> protected there by default and overridden in ReadOnlySegmentReader.




[jira] Updated: (LUCENE-1786) improve performance of contrib/TestCompoundWordTokenFilter

2009-12-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1786:


Lucene Fields: [New, Patch Available]  (was: [New])
Fix Version/s: 3.1

> improve performance of contrib/TestCompoundWordTokenFilter
> --
>
> Key: LUCENE-1786
> URL: https://issues.apache.org/jira/browse/LUCENE-1786
> Project: Lucene - Java
>  Issue Type: Test
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1786.patch
>
>
> contrib/analyzers/compound has some tests that use a hyphenation grammar file.
> The tests are currently for German, and they actually are nice: they show how 
> the combination of the hyphenation rules and dictionary work in tandem.
> The issue is that the German grammar file is not Apache-licensed: 
> http://offo.sourceforge.net/hyphenation/licenses.html
> So the test must download the entire OFFO zip file from SourceForge to 
> execute.
> I happen to think the test is a great example of how this thing works (with a 
> language where it matters), but we could consider using a different grammar 
> file for a language that is Apache-licensed.
> This way it could be included in the source with the test and would be more 
> practical.




[jira] Updated: (LUCENE-1786) improve performance of contrib/TestCompoundWordTokenFilter

2009-12-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1786:


Attachment: LUCENE-1786.patch

Attached patch uses the Apache2-licensed Danish hyphenation dictionary for 
testing compounds instead.


> improve performance of contrib/TestCompoundWordTokenFilter
> --
>
> Key: LUCENE-1786
> URL: https://issues.apache.org/jira/browse/LUCENE-1786
> Project: Lucene - Java
>  Issue Type: Test
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1786.patch
>
>
> contrib/analyzers/compound has some tests that use a hyphenation grammar file.
> The tests are currently for German, and they actually are nice: they show how 
> the combination of the hyphenation rules and dictionary work in tandem.
> The issue is that the German grammar file is not Apache-licensed: 
> http://offo.sourceforge.net/hyphenation/licenses.html
> So the test must download the entire OFFO zip file from SourceForge to 
> execute.
> I happen to think the test is a great example of how this thing works (with a 
> language where it matters), but we could consider using a different grammar 
> file for a language that is Apache-licensed.
> This way it could be included in the source with the test and would be more 
> practical.




[jira] Assigned: (LUCENE-1786) improve performance of contrib/TestCompoundWordTokenFilter

2009-12-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-1786:
---

Assignee: Robert Muir

> improve performance of contrib/TestCompoundWordTokenFilter
> --
>
> Key: LUCENE-1786
> URL: https://issues.apache.org/jira/browse/LUCENE-1786
> Project: Lucene - Java
>  Issue Type: Test
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1786.patch
>
>
> contrib/analyzers/compound has some tests that use a hyphenation grammar file.
> The tests are currently for German, and they actually are nice: they show how 
> the combination of the hyphenation rules and dictionary work in tandem.
> The issue is that the German grammar file is not Apache-licensed: 
> http://offo.sourceforge.net/hyphenation/licenses.html
> So the test must download the entire OFFO zip file from SourceForge to 
> execute.
> I happen to think the test is a great example of how this thing works (with a 
> language where it matters), but we could consider using a different grammar 
> file for a language that is Apache-licensed.
> This way it could be included in the source with the test and would be more 
> practical.




[jira] Commented: (LUCENE-1786) improve performance of contrib/TestCompoundWordTokenFilter

2009-12-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792592#action_12792592
 ] 

Robert Muir commented on LUCENE-1786:
-

if there is no objection, I will commit shortly.

> improve performance of contrib/TestCompoundWordTokenFilter
> --
>
> Key: LUCENE-1786
> URL: https://issues.apache.org/jira/browse/LUCENE-1786
> Project: Lucene - Java
>  Issue Type: Test
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1786.patch
>
>
> contrib/analyzers/compound has some tests that use a hyphenation grammar file.
> The tests are currently for German, and they actually are nice: they show how 
> the combination of the hyphenation rules and dictionary work in tandem.
> The issue is that the German grammar file is not Apache-licensed: 
> http://offo.sourceforge.net/hyphenation/licenses.html
> So the test must download the entire OFFO zip file from SourceForge to 
> execute.
> I happen to think the test is a great example of how this thing works (with a 
> language where it matters), but we could consider using a different grammar 
> file for a language that is Apache-licensed.
> This way it could be included in the source with the test and would be more 
> practical.




[jira] Commented: (LUCENE-1786) improve performance of contrib/TestCompoundWordTokenFilter

2009-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792596#action_12792596
 ] 

Michael McCandless commented on LUCENE-1786:


Patch looks good!  (Except, my Danish is rusty...).

This test is now wicked fast:
{code}
[junit] Testsuite: 
org.apache.lucene.analysis.compound.TestCompoundWordTokenFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.322 sec
{code}

> improve performance of contrib/TestCompoundWordTokenFilter
> --
>
> Key: LUCENE-1786
> URL: https://issues.apache.org/jira/browse/LUCENE-1786
> Project: Lucene - Java
>  Issue Type: Test
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1786.patch
>
>
> contrib/analyzers/compound has some tests that use a hyphenation grammar file.
> The tests are currently for german, and they actually are nice, they show how 
> the combination of the hyphenation rules and dictionary work in tandem.
> The issue is that the german grammar file is not apache licensed: 
> http://offo.sourceforge.net/hyphenation/licenses.html
> So the test must download the entire offo zip file from sourceforge to 
> execute.
> I happen to think the test is a great example of how this thing works (with a 
> language where it matters), but we could consider using a different grammar 
> file, for a language that is apache licensed.
> This way it could be included in the source with the test and would be more 
> practical.




[jira] Resolved: (LUCENE-1786) improve performance of contrib/TestCompoundWordTokenFilter

2009-12-18 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-1786.
-

Resolution: Fixed

Committed revision 892355.

> improve performance of contrib/TestCompoundWordTokenFilter
> --
>
> Key: LUCENE-1786
> URL: https://issues.apache.org/jira/browse/LUCENE-1786
> Project: Lucene - Java
>  Issue Type: Test
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1786.patch
>
>
> contrib/analyzers/compound has some tests that use a hyphenation grammar file.
> The tests are currently for german, and they actually are nice, they show how 
> the combination of the hyphenation rules and dictionary work in tandem.
> The issue is that the german grammar file is not apache licensed: 
> http://offo.sourceforge.net/hyphenation/licenses.html
> So the test must download the entire offo zip file from sourceforge to 
> execute.
> I happen to think the test is a great example of how this thing works (with a 
> language where it matters), but we could consider using a different grammar 
> file, for a language that is apache licensed.
> This way it could be included in the source with the test and would be more 
> practical.




[jira] Commented: (LUCENE-2026) Refactoring of IndexWriter

2009-12-18 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792625#action_12792625
 ] 

Marvin Humphrey commented on LUCENE-2026:
-

> Well, autoCommit just means "periodically call commit". So, if you
> decide to offer a commit() operation, then autoCommit would just wrap
> that? But, I don't think autoCommit should be offered... app should
> decide.

Agreed, autoCommit had benefits under legacy Lucene, but wouldn't be important
now.  If we did add some sort of "automatic commit" feature, it would mean
something else: commit every change instantly.  But that's easy to implement
via a wrapper, so there's no point cluttering the primary index writer
class to support such a feature.
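In sketch form, such a wrapper could simply commit after every change. The names below (Writer, SimpleWriter, AutoCommitWriter) are illustrative only, not real Lucene APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an index writer API (hypothetical, for illustration).
interface Writer {
    void addDocument(String doc);
    void commit();
}

class SimpleWriter implements Writer {
    final List<String> pending = new ArrayList<>();
    final List<String> committed = new ArrayList<>();
    public void addDocument(String doc) { pending.add(doc); }
    public void commit() { committed.addAll(pending); pending.clear(); }
}

// "Commit every change instantly" as a thin wrapper around any Writer,
// keeping the core writer class free of an autoCommit mode.
class AutoCommitWriter implements Writer {
    private final Writer delegate;
    AutoCommitWriter(Writer delegate) { this.delegate = delegate; }
    public void addDocument(String doc) {
        delegate.addDocument(doc);
        delegate.commit();  // commit immediately after each change
    }
    public void commit() { delegate.commit(); }
}
```

The app opts in by wrapping its writer; the core class stays uncluttered.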

> Again: NRT is not a "specialized reader". It's a normal read-only
> DirectoryReader, just like you'd get from IndexReader.open, with the
> only difference being that it consulted IW to find which segments to
> open. Plus, it's pooled, so that if IW already has a given segment
> reader open (say because deletes were applied or merges are running),
> it's reused.

Well, it seems to me that those two features make it special -- particularly
the pooling of SegmentReaders.  You can't take advantage of that outside the
context of IndexWriter:

> Yes, Lucene's approach must be in the same JVM. But we get important
> gains from this - reusing a single reader (the pool), carrying over
> merged deletions directly in RAM (and eventually field cache & norms
> too - LUCENE-1785).

Exactly.  In my view, that's what makes that reader "special": unlike ordinary
Lucene IndexReaders, this one springs into being with its caches already
primed rather than in need of lazy loading.

But to achieve those benefits, you have to mod the index writing process.
Those modifications are not necessary under the Lucy model, because the mere
act of writing the index stores our data in the system IO cache.

> Instead, Lucy (by design) must do all sharing & access all index data
> through the filesystem (a decision, I think, could be dangerous),
> which will necessarily increase your reopen time. 

Dangerous in what sense?

Going through the file system is a tradeoff, sure -- but it's pretty nice to
design your low-latency search app free from any concern about whether
indexing and search need to be coordinated within a single process.
Furthermore, if separate processes are your primary concurrency model, going
through the file system is actually mandatory to achieve best performance on a
multi-core box.  Lucy won't always be used with multi-threaded hosts.

I actually think going through the file system is dangerous in a different
sense: it puts pressure on the file format spec.  The easy way to achieve IPC
between writers and readers will be to dump stuff into one of the JSON files
to support the killer-feature-du-jour -- such as what I'm proposing with this
"fsync" key in the snapshot file.  But then we wind up with a bunch of crap
cluttering up our index metadata files.  I'm determined that Lucy will have a
more coherent file format than Lucene, but with this IPC requirement we're
setting our community up to push us in the wrong direction.  If we're not
careful, we could end up with a file format that's an unmaintainable jumble.

But you're talking performance, not complexity costs, right?

> Maybe in practice that cost is small though... the OS write cache should
> keep everything fresh... but you still must serialize.

Anecdotally, at Eventful one of our indexes is 5 GB with 16 million records
and 900 MB worth of sort cache data; opening a fresh searcher and loading all
sort caches takes circa 21 ms.

There's room to improve that further -- we haven't yet implemented
IndexReader.reopen() -- but that was fast enough to achieve what we wanted to
achieve.

> Refactoring of IndexWriter
> --
>
> Key: LUCENE-2026
> URL: https://issues.apache.org/jira/browse/LUCENE-2026
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
>
> I've been thinking for a while about refactoring the IndexWriter into
> two main components.
> One could be called a SegmentWriter and as the
> name says its job would be to write one particular index segment. The
> default one just as today will provide methods to add documents and
> flushes when its buffer is full.
> Other SegmentWriter implementations would do things like e.g. appending or
> copying external segments [what addIndexes*() currently does].
> The second component's job would be to manage writing the segments
> file and merging/deleting segments. It would know about
> DeletionPolicy, MergePolicy and MergeScheduler. Ideally it would
> provide hooks that allow user

[jira] Commented: (LUCENE-2026) Refactoring of IndexWriter

2009-12-18 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792629#action_12792629
 ] 

Jason Rutherglen commented on LUCENE-2026:
--

{quote}Anecdotally, at Eventful one of our indexes is 5 GB with 16 million 
records
and 900 MB worth of sort cache data; opening a fresh searcher and loading all
sort caches takes circa 21 ms.{quote}

Marvin, very cool!  Are you using the mmap module you mentioned at ApacheCon?

> Refactoring of IndexWriter
> --
>
> Key: LUCENE-2026
> URL: https://issues.apache.org/jira/browse/LUCENE-2026
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
>
> I've been thinking for a while about refactoring the IndexWriter into
> two main components.
> One could be called a SegmentWriter and as the
> name says its job would be to write one particular index segment. The
> default one just as today will provide methods to add documents and
> flushes when its buffer is full.
> Other SegmentWriter implementations would do things like e.g. appending or
> copying external segments [what addIndexes*() currently does].
> The second component's job would be to manage writing the segments
> file and merging/deleting segments. It would know about
> DeletionPolicy, MergePolicy and MergeScheduler. Ideally it would
> provide hooks that allow users to manage external data structures and
> keep them in sync with Lucene's data during segment merges.
> API wise there are things we have to figure out, such as where the
> updateDocument() method would fit in, because its deletion part
> affects all segments, whereas the new document is only being added to
> the new segment.
> Of course these should be lower level APIs for things like parallel
> indexing and related use cases. That's why we should still provide
> easy to use APIs like today for people who don't need to care about
> per-segment ops during indexing. So the current IndexWriter could
> probably keep most of its APIs and delegate to the new classes.




[jira] Commented: (LUCENE-2171) Over synchronization for read-only index readers in SegmentTermDocs

2009-12-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792637#action_12792637
 ] 

Earwin Burrfoot commented on LUCENE-2171:
-

(without looking deep) I have a feeling that for RW Reader _synchronized_ is 
also unnecessary - _volatile_ will suffice.
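A minimal sketch of the two options under discussion (hypothetical class names, not the actual Lucene sources or patch): move the deletedDocs access behind a method so the default reader keeps its lock (or relies on volatile for visibility), while a read-only subclass skips synchronization entirely.

```java
import java.util.BitSet;

// Hypothetical sketch only -- not the real SegmentReader.
class SegmentReaderSketch {
    // volatile gives safe unsynchronized reads of the current reference.
    protected volatile BitSet deletedDocs;

    // Default: synchronized, for readers where deletions may be applied concurrently.
    synchronized BitSet getDeletedDocs() { return deletedDocs; }
}

class ReadOnlySegmentReaderSketch extends SegmentReaderSketch {
    // Read-only index: deletedDocs never changes after open, so no lock is needed.
    @Override
    BitSet getDeletedDocs() { return deletedDocs; }
}
```

SegmentTermDocs would then call getDeletedDocs() instead of synchronizing on the parent itself, and the read-only path pays no locking cost.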

> Over synchronization for read-only index readers in SegmentTermDocs
> ---
>
> Key: LUCENE-2171
> URL: https://issues.apache.org/jira/browse/LUCENE-2171
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9.1, 3.0
>Reporter: Jayson Minard
>Priority: Minor
> Fix For: 3.1
>
>
> In SegmentTermDocs constructor (from 2.9.1)
> {code}
> 46  protected SegmentTermDocs(SegmentReader parent) {
> 47this.parent = parent;
> 48this.freqStream = (IndexInput) parent.core.freqStream.clone();
> 49synchronized (parent) {
> 50  this.deletedDocs = parent.deletedDocs;
> 51}
> 52this.skipInterval = parent.core.getTermsReader().getSkipInterval();
> 53this.maxSkipLevels = 
> parent.core.getTermsReader().getMaxSkipLevels();
> 54  }
> {code}
> The synchronization on "parent" for accessing deletedDocs is unnecessary on 
> readonly indexes.  If that access was moved into the SegmentReader then it 
> could be protected there by default and overridden in ReadonlySegmentReader.




[jira] Commented: (LUCENE-2026) Refactoring of IndexWriter

2009-12-18 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792638#action_12792638
 ] 

Marvin Humphrey commented on LUCENE-2026:
-

Yes, this is using the sort cache model worked out this spring on lucy-dev.
The memory mapping happens within FSFileHandle (LUCY-83). SortWriter 
and SortReader haven't made it into the Lucy repository yet.

> Refactoring of IndexWriter
> --
>
> Key: LUCENE-2026
> URL: https://issues.apache.org/jira/browse/LUCENE-2026
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Michael Busch
>Assignee: Michael Busch
>Priority: Minor
> Fix For: 3.1
>
>
> I've been thinking for a while about refactoring the IndexWriter into
> two main components.
> One could be called a SegmentWriter and as the
> name says its job would be to write one particular index segment. The
> default one just as today will provide methods to add documents and
> flushes when its buffer is full.
> Other SegmentWriter implementations would do things like e.g. appending or
> copying external segments [what addIndexes*() currently does].
> The second component's job would be to manage writing the segments
> file and merging/deleting segments. It would know about
> DeletionPolicy, MergePolicy and MergeScheduler. Ideally it would
> provide hooks that allow users to manage external data structures and
> keep them in sync with Lucene's data during segment merges.
> API wise there are things we have to figure out, such as where the
> updateDocument() method would fit in, because its deletion part
> affects all segments, whereas the new document is only being added to
> the new segment.
> Of course these should be lower level APIs for things like parallel
> indexing and related use cases. That's why we should still provide
> easy to use APIs like today for people who don't need to care about
> per-segment ops during indexing. So the current IndexWriter could
> probably keep most of its APIs and delegate to the new classes.




[jira] Commented: (LUCENE-2026) Refactoring of IndexWriter

2009-12-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792713#action_12792713
 ] 

Michael McCandless commented on LUCENE-2026:


{quote}
bq. Again: NRT is not a "specialized reader". It's a normal read-only 
DirectoryReader, just like you'd get from IndexReader.open, with the only 
difference being that it consulted IW to find which segments to open. Plus, 
it's pooled, so that if IW already has a given segment reader open (say because 
deletes were applied or merges are running), it's reused.

Well, it seems to me that those two features make it special - particularly
the pooling of SegmentReaders. You can't take advantage of that outside the
context of IndexWriter:
{quote}

OK so maybe a little special ;) But, really that pooling should be
factored out of IW.  It's not writer specific.

{quote}
bq. Yes, Lucene's approach must be in the same JVM. But we get important gains 
from this - reusing a single reader (the pool), carrying over merged deletions 
directly in RAM (and eventually field cache & norms too - LUCENE-1785).

Exactly. In my view, that's what makes that reader "special": unlike ordinary
Lucene IndexReaders, this one springs into being with its caches already
primed rather than in need of lazy loading.

But to achieve those benefits, you have to mod the index writing process.
{quote}

Mod the index writing, and the reader reopen, to use the shared pool.
The pool in itself isn't writer specific.

Really the pool is just like what you tap into when you call reopen --
that method looks at the current "pool" of already opened segments,
sharing what it can.
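That sharing could be sketched roughly like this (hypothetical names, not the actual Lucene pool): a map from segment name to the open reader, where a lookup reuses an already-open reader and opens one only when needed.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a per-segment reader (illustrative only).
class PooledReader {
    final String segment;
    PooledReader(String segment) { this.segment = segment; }
}

// Hypothetical sketch of a segment-reader pool factored out of the writer:
// reopen() asks the pool for each segment in the current commit and shares
// any reader that is already open.
class SegmentReaderPool {
    private final Map<String, PooledReader> open = new HashMap<>();

    // Returns the already-open reader for this segment, opening one only if absent.
    synchronized PooledReader get(String segment) {
        return open.computeIfAbsent(segment, PooledReader::new);
    }
}
```

Since the pool is keyed purely on segments, neither a writer nor a reopening reader needs to own it; both just tap into it.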

bq. Those modifications are not necessary under the Lucy model, because the 
mere act of writing the index stores our data in the system IO cache.

But, that's where Lucy presumably takes a perf hit.  Lucene can share
these in RAM, not using the filesystem as the intermediary (eg we do
that today with deletions; norms/field cache/eventual CSF can do the
same.)  Lucy must go through the filesystem to share.

{quote}
bq. Instead, Lucy (by design) must do all sharing & access all index data 
through the filesystem (a decision, I think, could be dangerous), which will 
necessarily increase your reopen time.

Dangerous in what sense?

Going through the file system is a tradeoff, sure - but it's pretty nice to
design your low-latency search app free from any concern about whether
indexing and search need to be coordinated within a single process.
Furthermore, if separate processes are your primary concurrency model, going
through the file system is actually mandatory to achieve best performance on a
multi-core box. Lucy won't always be used with multi-threaded hosts.

I actually think going through the file system is dangerous in a different
sense: it puts pressure on the file format spec. The easy way to achieve IPC
between writers and readers will be to dump stuff into one of the JSON files
to support the killer-feature-du-jour - such as what I'm proposing with this
"fsync" key in the snapshot file. But then we wind up with a bunch of crap
cluttering up our index metadata files. I'm determined that Lucy will have a
more coherent file format than Lucene, but with this IPC requirement we're
setting our community up to push us in the wrong direction. If we're not
careful, we could end up with a file format that's an unmaintainable jumble.

But you're talking performance, not complexity costs, right?
{quote}

Mostly I was thinking performance, ie, trusting the OS to make good
decisions about what should be RAM resident, when it has limited
information...

But, also risky is that all important data structures must be
"file-flat", though in practice that doesn't seem like an issue so
far?  The RAM resident things Lucene has -- norms, deleted docs, terms
index, field cache -- seem to "cast" just fine to file-flat.  If we
switched to an FST for the terms index I guess that could get
tricky...

Wouldn't shared memory be possible for process-only concurrent models?
Also, what popular systems/environments have this requirement (only
process level concurrency) today?

It's wonderful that Lucy can startup really fast, but, for most apps
that's not nearly as important as searching/indexing performance,
right?  I mean, you start only once, and then you handle many, many
searches / index many documents, with that process, usually?

{quote}
bq. Maybe in practice that cost is small though... the OS write cache should 
keep everything fresh... but you still must serialize.

Anecdotally, at Eventful one of our indexes is 5 GB with 16 million records
and 900 MB worth of sort cache data; opening a fresh searcher and loading all
sort caches takes circa 21 ms.
{quote}

That's fabulously fast!

But you really need to also test search/indexing throughput, reopen time
(I think) once that's online for Lucy...

{quote}
There's room to improv

[jira] Commented: (LUCENE-2167) StandardTokenizer Javadoc does not correctly describe tokenization around punctuation characters

2009-12-18 Thread Shyamal Prasad (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792726#action_12792726
 ] 

Shyamal Prasad commented on LUCENE-2167:


Hi Robert, I presume that when you say we should "instead improve standard 
analyzer" you mean the code should work more like the original Javadoc states 
it should? Or are you suggesting moving to JFlex 1.5?

The problem I observed was that the current JFlex rules don't implement what 
the Javadoc says is the behavior of the tokenizer. I'd be happy to spend some 
time on this if I could get some direction on where I should focus.

> StandardTokenizer Javadoc does not correctly describe tokenization around 
> punctuation characters
> 
>
> Key: LUCENE-2167
> URL: https://issues.apache.org/jira/browse/LUCENE-2167
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.9, 2.9.1, 3.0
>Reporter: Shyamal Prasad
>Priority: Minor
> Attachments: LUCENE-2167.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The Javadoc for StandardTokenizer states:
> {quote}
> Splits words at punctuation characters, removing punctuation. 
> However, a dot that's not followed by whitespace is considered part of a 
> token.
> Splits words at hyphens, unless there's a number in the token, in which case 
> the whole 
> token is interpreted as a product number and is not split.
> {quote}
> This is not accurate. The actual JFlex implementation treats hyphens 
> interchangeably with
> punctuation. So, for example "video,mp4,test" results in a *single* token and 
> not three tokens
> as the documentation would suggest.
> Additionally, the documentation suggests that "video-mp4-test-again" would 
> become a single
> token, but in reality it results in two tokens: "video-mp4-test" and "again".
> IMHO the parser implementation is fine as is since it is hard to keep 
> everyone happy, but it is probably
> worth cleaning up the documentation string. 
> The patch included here updates the documentation string and adds a few test 
> cases to confirm the cases described above.




[jira] Commented: (LUCENE-2167) StandardTokenizer Javadoc does not correctly describe tokenization around punctuation characters

2009-12-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792737#action_12792737
 ] 

Robert Muir commented on LUCENE-2167:
-

bq. Hi Robert, I presume that when you say we should "instead improve standard 
analyzer" you mean the code should work more like the original Javadoc states 
it should?

Shyamal I guess what I am saying is I would prefer the javadoc of 
StandardTokenizer to be a little vague as to exactly what it does.
I would actually prefer it have less details than it currently has: in my 
opinion it starts getting into nitty-gritty details of what could be considered 
Version-specific.

bq. I'd be happy to spend some time on this if I could get some direction on 
where I should focus.

If you have fixes to the grammar, I would prefer this over 'documenting buggy 
behavior'. LUCENE-2074 gives us the capability to fix bugs without breaking 
backwards compatibility.

> StandardTokenizer Javadoc does not correctly describe tokenization around 
> punctuation characters
> 
>
> Key: LUCENE-2167
> URL: https://issues.apache.org/jira/browse/LUCENE-2167
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 2.4.1, 2.9, 2.9.1, 3.0
>Reporter: Shyamal Prasad
>Priority: Minor
> Attachments: LUCENE-2167.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The Javadoc for StandardTokenizer states:
> {quote}
> Splits words at punctuation characters, removing punctuation. 
> However, a dot that's not followed by whitespace is considered part of a 
> token.
> Splits words at hyphens, unless there's a number in the token, in which case 
> the whole 
> token is interpreted as a product number and is not split.
> {quote}
> This is not accurate. The actual JFlex implementation treats hyphens 
> interchangeably with
> punctuation. So, for example "video,mp4,test" results in a *single* token and 
> not three tokens
> as the documentation would suggest.
> Additionally, the documentation suggests that "video-mp4-test-again" would 
> become a single
> token, but in reality it results in two tokens: "video-mp4-test" and "again".
> IMHO the parser implementation is fine as is since it is hard to keep 
> everyone happy, but it is probably
> worth cleaning up the documentation string. 
> The patch included here updates the documentation string and adds a few test 
> cases to confirm the cases described above.
