[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-06 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821-SloppyDecays.patch

Patch adds NonExactPhraseScorer (temporary name) as discussed above. It is a 
work in progress and does not yet do any sloppy matching or scoring.

 SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
 ---

 Key: LUCENE-3821
 URL: https://issues.apache.org/jira/browse/LUCENE-3821
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.5, 4.0
Reporter: Naomi Dushay
Assignee: Doron Cohen
 Attachments: LUCENE-3821-SloppyDecays.patch, LUCENE-3821.patch, 
 LUCENE-3821.patch, LUCENE-3821.patch, LUCENE-3821.patch, 
 LUCENE-3821_test.patch, schema.xml, solrconfig-test.xml


 The general bug is a case where a phrase with no slop is found, but if you
 add slop it's not.
 I committed a test today (TestSloppyPhraseQuery2) that actually triggers this
 case; Jenkins just hasn't had enough time to chew on it.
 ant test -Dtestcase=TestSloppyPhraseQuery2 -Dtests.iter=100 is enough to make
 it fail on trunk or 3.x.
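
The invariant behind this report (an exact phrase match must also match with any slop >= 0) can be sketched with a brute-force model. This is not Lucene code: the class name, the method names, and the definition of a sloppy match as "spread of (position - index in phrase) is at most slop" are simplifying assumptions made for illustration only.

```java
import java.util.Random;

/** Brute-force model of phrase matching; a simplification, not Lucene's scorers. */
final class PhraseInvariantSketch {

  /** True if the phrase terms occur at consecutive positions in doc. */
  static boolean exactMatch(String[] doc, String[] phrase) {
    for (int start = 0; start + phrase.length <= doc.length; start++) {
      boolean all = true;
      for (int i = 0; i < phrase.length && all; i++) {
        all = doc[start + i].equals(phrase[i]);
      }
      if (all) {
        return true;
      }
    }
    return false;
  }

  /**
   * True if each phrase term can be assigned a distinct doc position such that
   * the spread of (position - index in phrase) is at most slop.
   */
  static boolean sloppyMatch(String[] doc, String[] phrase, int slop) {
    return search(doc, phrase, slop, 0, new boolean[doc.length],
        Integer.MAX_VALUE, Integer.MIN_VALUE);
  }

  private static boolean search(String[] doc, String[] phrase, int slop, int i,
      boolean[] used, int min, int max) {
    if (i == phrase.length) {
      return true; // every term placed within the allowed spread
    }
    for (int p = 0; p < doc.length; p++) {
      if (used[p] || !doc[p].equals(phrase[i])) {
        continue;
      }
      int shifted = p - i;
      int newMin = Math.min(min, shifted);
      int newMax = Math.max(max, shifted);
      if (newMax - newMin > slop) {
        continue; // the spread only grows, so this branch cannot recover
      }
      used[p] = true;
      boolean found = search(doc, phrase, slop, i + 1, used, newMin, newMax);
      used[p] = false;
      if (found) {
        return true;
      }
    }
    return false;
  }

  /** Checks "exact match implies sloppy match" on random small inputs. */
  static boolean invariantHolds(long seed, int iterations) {
    Random random = new Random(seed);
    for (int iter = 0; iter < iterations; iter++) {
      String[] doc = randomTerms(random, 1 + random.nextInt(8));
      String[] phrase = randomTerms(random, 1 + random.nextInt(3));
      int slop = random.nextInt(4);
      if (exactMatch(doc, phrase) && !sloppyMatch(doc, phrase, slop)) {
        return false;
      }
    }
    return true;
  }

  private static String[] randomTerms(Random random, int length) {
    String[] alphabet = {"a", "b", "c"};
    String[] terms = new String[length];
    for (int i = 0; i < length; i++) {
      terms[i] = alphabet[random.nextInt(alphabet.length)];
    }
    return terms;
  }
}
```

A real scorer that violates this invariant for some seed is exactly what the random test in this issue is designed to catch.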

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-05 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821.patch

Attached updated patch. 

Repeating PPs in a MultiPhraseQuery are handled as well.

This called for more cases in the sloppy phrase scorer and more code and, 
although I think the code is cleaner now, I don't know to what extent it is 
easier to maintain. 

It definitely fixes wrong behavior that exists in current 3x and trunk (patch 
is for 3x).

However, although the random test passes for me even with -Dtests.iter=2000, it 
is possible to break the scorer - that is, create a document and a query 
which should match each other but would not. 

The patch adds just such a case as an @Ignored test case:  
TestMultiPhraseQuery.testMultiSloppyWithRepeats(). 

I don't see how to solve this specific case in the context of current sloppy 
phrase scorer. 

So there are 3 options:
# leave things as they are
# commit this patch and for now document the failing scenario (also keep it in 
the ignored test case). 
# devise a different algorithm for this.

I would love it to be the 3rd if I just knew how to do it. Otherwise I like the 
2nd, but we need to keep in mind that the random test might occasionally 
create this scenario, so there will be noise in the test builds.

Preferences?

 SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
 ---

 Key: LUCENE-3821
 URL: https://issues.apache.org/jira/browse/LUCENE-3821
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.5, 4.0
Reporter: Naomi Dushay
Assignee: Doron Cohen
 Attachments: LUCENE-3821.patch, LUCENE-3821.patch, LUCENE-3821.patch, 
 LUCENE-3821.patch, LUCENE-3821_test.patch, schema.xml, solrconfig-test.xml






[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-04 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821.patch

Updated patch with a fixed MFQ.toString(), so that the problematic doc and 
queries are printed in case of failure.

 SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
 ---

 Key: LUCENE-3821
 URL: https://issues.apache.org/jira/browse/LUCENE-3821
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.5, 4.0
Reporter: Naomi Dushay
Assignee: Doron Cohen
 Attachments: LUCENE-3821.patch, LUCENE-3821.patch, LUCENE-3821.patch, 
 LUCENE-3821_test.patch, schema.xml, solrconfig-test.xml






[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-03 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821.patch

Patch with fix for this problem. I would expect SloppyPhrase scoring 
performance to degrade, though I did not measure it.

The single test that still fails (and I think the bug is in ExactPhraseScorer) 
is testRandomIncreasingSloppiness, and can be recreated like this:
{noformat}
ant test -Dtestcase=TestSloppyPhraseQuery2 -Dtestmethod=testRandomIncreasingSloppiness -Dtests.seed=47267613db69f714:-617bb800c4a3c645:-456a673444fdc184 -Dargs=-Dfile.encoding=UTF-8
{noformat}

 SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
 ---

 Key: LUCENE-3821
 URL: https://issues.apache.org/jira/browse/LUCENE-3821
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.5, 4.0
Reporter: Naomi Dushay
Assignee: Doron Cohen
 Attachments: LUCENE-3821.patch, LUCENE-3821_test.patch, schema.xml, 
 solrconfig-test.xml






[jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-03 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3821:


Attachment: LUCENE-3821.patch

bq. Hmm patch has this: ... import backport.api...

Oops, here's a fixed patch. I also added some comments and removed the @Ignore 
from the test.

bq. I'm going to be ecstatic if that crazy test finds bugs both in exact and 
sloppy phrase scorers :)

It is a great test! Interestingly, one thing it exposed is the dependency of the 
SloppyPhraseScorer on the order of PPs in PhraseScorer when phraseFreq() is 
invoked. The way things work in the superclass, that order depends on the 
content of previously processed documents. This fix removes that wrong 
dependency, of course. The point is that deliberately devising a test that 
exposes such a bug seems almost impossible: first you need to think of such 
a case, and even if you did, writing a test that creates this specific scenario 
is error-prone in itself. Praise to random testing, and this random test in 
particular.

 SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
 ---

 Key: LUCENE-3821
 URL: https://issues.apache.org/jira/browse/LUCENE-3821
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.5, 4.0
Reporter: Naomi Dushay
Assignee: Doron Cohen
 Attachments: LUCENE-3821.patch, LUCENE-3821.patch, 
 LUCENE-3821_test.patch, schema.xml, solrconfig-test.xml






[jira] [Updated] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()

2012-02-05 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3746:


Attachment: LUCENE-3746.patch

Updated patch using ManagementFactory.getMemoryMXBean().getHeapMemoryUsage(). 

Javadocs are not explicit about this call being atomic, but from the wording it 
seems almost certain that each call returns a new MemoryUsage instance. 
In this patch this is asserted (with a Java assert), and the assert passes (-ea) 
in two different JVMs - IBM and Oracle - so this might be correct. I searched 
for more explicit information on this, with no success. 
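
For reference, reading the heap numbers through the memory bean can be sketched as follows. The class and method names are hypothetical; only ManagementFactory.getMemoryMXBean(), getHeapMemoryUsage() and the MemoryUsage getters are standard JDK calls. The sketch says nothing about atomicity, which is the open question here; it only shows that used and max are taken from a single MemoryUsage object.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: heap headroom computed from one MemoryUsage object. */
final class HeapSnapshotSketch {

  /** Bytes the heap could still grow to hold, or -1 if the maximum is undefined. */
  static long headroom() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    long max = heap.getMax();   // -1 when the maximum is undefined
    long used = heap.getUsed(); // taken from the same MemoryUsage object as max
    return max < 0 ? -1L : Math.max(0L, max - used);
  }
}
```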

Annoyingly though, in IBM JDK, running the tests like this produces the nice 
warning:

{noformat}
WARNING: test class left thread running: Thread[MemoryPoolMXBean notification 
dispatcher,6,main]
RESOURCE LEAK: test class left 1 thread(s) running
{noformat}

This makes me reluctant to use the memory bean - I did not find a way to 
prevent that thread leak.

So perhaps a better approach would be to be conservative about the sequence of 
calls when using Runtime. Something like this:

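{code}
// 'needed' is a placeholder for the required buffer size in bytes
Runtime rt = Runtime.getRuntime();
long free = rt.freeMemory();
if (free >= needed)
  return decideBy(free);
long max = rt.maxMemory();
long total = rt.totalMemory();  // read last, to stay conservative
return decideBy(max - total);
{code}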

This is conservative in that 'total' is computed last, and in that total-free 
is not added to the computed available bytes.

In both approaches, even if atomicity is guaranteed, it is possible that more 
heap is allocated in another thread between the time the size is computed 
and the time the bytes are actually allocated, so I am not sure how safe this 
check can be made.
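
The Runtime-based sequence above could look like this as a self-contained method. This is a minimal sketch, not the code in the patch: the class name, the method name, and the choice to return the estimate directly (instead of passing it to decideBy) are assumptions.

```java
/** Hypothetical helper: conservative estimate of heap available for a buffer. */
final class HeapEstimateSketch {

  /**
   * Returns freeMemory() if it already covers {@code needed}; otherwise falls
   * back to (maxMemory - totalMemory), reading totalMemory() last.
   */
  static long availableFor(long needed) {
    Runtime rt = Runtime.getRuntime();
    long free = rt.freeMemory();
    if (free >= needed) {
      return free;
    }
    long max = rt.maxMemory();     // Long.MAX_VALUE when there is no inherent limit
    long total = rt.totalMemory(); // read last: concurrent heap growth only lowers the estimate
    return Math.max(0L, max - total);
  }
}
```

As noted above, the result is only a hint: another thread may allocate between this call and the actual buffer allocation.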

 suggest.fst.Sort.BufferSize should not automatically fail just because of 
 freeMemory()
 --

 Key: LUCENE-3746
 URL: https://issues.apache.org/jira/browse/LUCENE-3746
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Reporter: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3746.patch, LUCENE-3746.patch


 Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM 
 buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]




[jira] [Updated] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()

2012-02-05 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3746:


Attachment: LUCENE-3746.patch

Updated patch - without MemoryMXBean - computing 'max, total, free' (in that 
order) and deciding by 'free' or falling back to 'max-free'. This is more 
conservative than MemoryMXBean, but since the latter is not foolproof either, 
I prefer the simpler approach. 

 suggest.fst.Sort.BufferSize should not automatically fail just because of 
 freeMemory()
 --

 Key: LUCENE-3746
 URL: https://issues.apache.org/jira/browse/LUCENE-3746
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Reporter: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch






[jira] [Updated] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()

2012-02-02 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3746:


Attachment: LUCENE-3746.patch

Simple fix: also consult maxMemory if freeMemory does not suffice.

 suggest.fst.Sort.BufferSize should not automatically fail just because of 
 freeMemory()
 --

 Key: LUCENE-3746
 URL: https://issues.apache.org/jira/browse/LUCENE-3746
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Reporter: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3746.patch






[jira] [Updated] (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)

2012-01-30 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-1812:


Attachment: pruning.patch

Updated patch: package.html and all pruning classes moved to another package, 
except for PruningReader. Now ant javadocs-all passes as well. There are 3 
TODOs:
# implement CarmelTermPruningDeltaTopPolicy
# dead code question in CarmelUniformTermPruningPolicy
# missing details in package.html

The first one can wait but the other two I would like to handle before 
committing.

 Static index pruning by in-document term frequency (Carmel pruning)
 ---

 Key: LUCENE-1812
 URL: https://issues.apache.org/jira/browse/LUCENE-1812
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/other
Reporter: Andrzej Bialecki 
Assignee: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: pruning.patch, pruning.patch, pruning.patch, 
 pruning.patch, pruning.patch, pruning.patch


 This module provides tools to produce a subset of input indexes by removing 
 postings data for those terms where their in-document frequency is below a 
 specified threshold. The net effect of this processing is a much smaller 
 index that for common types of queries returns nearly identical top-N results 
 as compared with the original index, but with increased performance. 
 Optionally, stored values and term vectors can also be removed. This 
 functionality is largely independent, so it can be used without term pruning 
 (when term freq. threshold is set to 1).
 As the threshold value increases, the total size of the index decreases, 
 search performance increases, and recall decreases (i.e. search quality 
 deteriorates). NOTE: especially phrase recall deteriorates significantly at 
 higher threshold values. 
 Primary purpose of this class is to produce small first-tier indexes that fit 
 completely in RAM, and store these indexes using 
 IndexWriter.addIndexes(IndexReader[]). Usually the performance of this class 
 will not be sufficient to use the resulting index view for on-the-fly pruning 
 and searching. 
 NOTE: If the input index is optimized (i.e. doesn't contain deletions) then 
 the index produced via IndexWriter.addIndexes(IndexReader[]) will preserve 
 internal document id-s so that they are in sync with the original index. This 
 means that all other auxiliary information not necessary for first-tier 
 processing, such as some stored fields, can also be removed, to be quickly 
 retrieved on-demand from the original index using the same internal document 
 id. 
 Threshold values can be specified globally (for terms in all fields) using 
 defaultThreshold parameter, and can be overridden using per-field or per-term 
 values supplied in a thresholds map. Keys in this map are either field names, 
 or terms in field:text format. The precedence of these values is the 
 following: first a per-term threshold is used if present, then per-field 
 threshold if present, and finally the default threshold.
 A command-line tool (PruningTool) is provided for convenience. At this moment 
 it doesn't support all functionality available through API.
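
The threshold precedence described above (per-term, then per-field, then default) can be sketched as a simple lookup. The class and method names below are hypothetical and are not the module's API; only the key formats (a field name, or field:text) and the precedence order come from the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup illustrating the per-term / per-field / default precedence. */
final class ThresholdLookupSketch {
  private final int defaultThreshold;
  private final Map<String, Integer> thresholds;

  ThresholdLookupSketch(int defaultThreshold, Map<String, Integer> thresholds) {
    this.defaultThreshold = defaultThreshold;
    this.thresholds = new HashMap<>(thresholds); // defensive copy
  }

  /** Resolves the threshold for one term, most specific key first. */
  int thresholdFor(String field, String text) {
    Integer perTerm = thresholds.get(field + ":" + text);
    if (perTerm != null) {
      return perTerm;
    }
    Integer perField = thresholds.get(field);
    if (perField != null) {
      return perField;
    }
    return defaultThreshold;
  }
}
```

For example, with a default of 1 and a map containing body=3 and body:lucene=5, the term body:lucene resolves to 5, any other term in body resolves to 3, and terms in other fields resolve to 1.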




[jira] [Updated] (LUCENE-3718) SamplingWrapperTest failure with certain test seed

2012-01-24 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3718:


Attachment: LUCENE-3718.patch

Attached a simple fix to Lucene40PostingsReader: linearScan() should also set 
doc when returning refill().

 SamplingWrapperTest failure with certain test seed
 --

 Key: LUCENE-3718
 URL: https://issues.apache.org/jira/browse/LUCENE-3718
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/facet
Reporter: Doron Cohen
Assignee: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3718.patch


 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12231/
 1 tests failed.
 REGRESSION: org.apache.lucene.facet.search.SamplingWrapperTest.testCountUsingSamping
 Error Message:
 Results are not the same!
 Stack Trace:
 org.apache.lucene.facet.FacetTestBase$NotSameResultError: Results are not the same!
     at org.apache.lucene.facet.FacetTestBase.assertSameResults(FacetTestBase.java:333)
     at org.apache.lucene.facet.search.sampling.BaseSampleTestTopK.assertSampling(BaseSampleTestTopK.java:104)
     at org.apache.lucene.facet.search.sampling.BaseSampleTestTopK.testCountUsingSamping(BaseSampleTestTopK.java:82)
     at org.apache.lucene.util.LuceneTestCase$3$1.evaluate(LuceneTestCase.java:529)
     at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
     at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
 NOTE: reproduce with: ant test -Dtestcase=SamplingWrapperTest -Dtestmethod=testCountUsingSamping -Dtests.seed=4a5994491f79fc80:-18509d134c89c159:-34f6ecbb32e930f7 -Dtests.multiplier=3 -Dargs=-Dfile.encoding=UTF-8
 NOTE: test params are: codec=Lucene40: {$facets=PostingsFormat(name=MockRandom), $full_path$=PostingsFormat(name=MockSep), content=Pulsing40(freqCutoff=19 minBlockSize=65 maxBlockSize=209), $payloads$=PostingsFormat(name=Lucene40WithOrds)}, sim=RandomSimilarityProvider(queryNorm=true,coord=true): {$facets=LM Jelinek-Mercer(0.70), content=DFR I(n)B3(800.0)}, locale=bg, timezone=Asia/Manila




[jira] [Updated] (LUCENE-3718) SamplingWrapperTest failure with certain test seed

2012-01-24 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3718:


Attachment: LUCENE-3718.patch

Updated patch with the same fix in AllDocsSegmentDocsEnum.linearScan() as well 
(the previous patch fixed only LiveDocsSegmentDocsEnum.linearScan()).

I also verified that this facets test does not fail in 3x with the same seed.

 SamplingWrapperTest failure with certain test seed
 --

 Key: LUCENE-3718
 URL: https://issues.apache.org/jira/browse/LUCENE-3718
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/facet
Reporter: Doron Cohen
Assignee: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3718.patch, LUCENE-3718.patch






[jira] [Updated] (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)

2012-01-23 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-1812:


Attachment: pruning.patch

Updated patch for current 3x.

 Static index pruning by in-document term frequency (Carmel pruning)
 ---

 Key: LUCENE-1812
 URL: https://issues.apache.org/jira/browse/LUCENE-1812
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/other
Reporter: Andrzej Bialecki 
Assignee: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: pruning.patch, pruning.patch, pruning.patch, 
 pruning.patch, pruning.patch






[jira] [Updated] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

2011-11-27 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3596:


Attachment: LUCENE-3596.patch

Patch taking approach (1) above, and moving createIWC() to the constructor. 

In addition, fixed some javadoc comments and added an assert to the 
constructor which, only when assertions are enabled, verifies that the merge 
policy of the IWC in effect is not a TieredMergePolicy. Imperfect as this is, it 
at least exposed the problem in the current test (fixed to use newLogMP()).

I think this is ready to commit.

 DirectoryTaxonomyWriter extensions should be able to set internal index 
 writer config attributes such as info stream
 

 Key: LUCENE-3596
 URL: https://issues.apache.org/jira/browse/LUCENE-3596
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/facet
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3596.patch, LUCENE-3596.patch


 Current protected openIndexWriter(Directory directory, OpenMode openMode) 
 does not provide access to the IWC it creates.
 So extensions must reimplement this method completely in order to set e.g. 
 the info stream for the internal index writer.
 This came up in [user question: Taxonomy indexer debug 
 |http://lucene.472066.n3.nabble.com/Taxonomy-indexer-debug-td3533341.html]




[jira] [Updated] (LUCENE-3596) DirectoryTaxonomyWriter extensions should be able to set internal index writer config attributes such as info stream

2011-11-26 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3596:


Attachment: LUCENE-3596.patch

patch adds the method createIndexWriterConfig(OpenMode openMode) and javadocs 
for in-order segments merging.
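
The kind of extension point this adds can be illustrated with a generic factory-method sketch. None of the classes below are Lucene's: ConfigSketch, WriterSketch and DebugWriterSketch are hypothetical stand-ins, and the String-typed open mode and info stream are placeholders. The sketch also assumes the hook is invoked during construction, in which case overrides must not depend on subclass fields.

```java
/** Hypothetical stand-in for a writer configuration; not Lucene's IndexWriterConfig. */
final class ConfigSketch {
  final String openMode;
  String infoStream = "none";

  ConfigSketch(String openMode) {
    this.openMode = openMode;
  }
}

/** Hypothetical writer that exposes config creation as an overridable hook. */
class WriterSketch {
  final ConfigSketch config;

  WriterSketch(String openMode) {
    // The hook runs inside the constructor, before subclass fields are initialized.
    this.config = createConfig(openMode);
  }

  /** Extensions override this to adjust the config instead of reimplementing the open logic. */
  protected ConfigSketch createConfig(String openMode) {
    return new ConfigSketch(openMode);
  }
}

/** Extension that only changes the info stream and reuses everything else. */
class DebugWriterSketch extends WriterSketch {
  DebugWriterSketch(String openMode) {
    super(openMode);
  }

  @Override
  protected ConfigSketch createConfig(String openMode) {
    ConfigSketch config = super.createConfig(openMode);
    config.infoStream = "stderr";
    return config;
  }
}
```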

 DirectoryTaxonomyWriter extensions should be able to set internal index 
 writer config attributes such as info stream
 

 Key: LUCENE-3596
 URL: https://issues.apache.org/jira/browse/LUCENE-3596
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/facet
Reporter: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3596.patch






[jira] [Updated] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

2011-11-16 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3573:


Attachment: LUCENE-3573.patch

Final patch.

Also updated the user-guide about refresh() behavior.

Removed the changes entry - for facet this would go only into 3x.

Planning to commit this soon.

 TaxonomyReader.refresh() is broken, replace its logic with reopen(), 
 following IR.reopen pattern
 

 Key: LUCENE-3573
 URL: https://issues.apache.org/jira/browse/LUCENE-3573
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/facet
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3573.patch, LUCENE-3573.patch, LUCENE-3573.patch


 When the taxonomy index is recreated, TR's assumption that categories are only 
 added no longer holds.
 As a result, calling TR.refresh() is incorrect at best, and usually throws 
 an AIOOBE (ArrayIndexOutOfBoundsException).




[jira] [Updated] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

2011-11-15 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3573:


Attachment: LUCENE-3573.patch

Patch, in principle ready to commit, though I plan to go through it once more.

In this patch:
* new tests moved to TestDirectoryTaxonomyReader
* an exception added: InconsistentTaxonomyException
* when the reader cannot refresh because the taxonomy was recreated since the 
last open/refresh, that exception is thrown and the application should 
open a fresh taxonomy reader.
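
A rough sketch of how an application might handle this is below. Everything in it is an assumption made for illustration: SketchTaxonomy, SketchTaxonomyReader and the "epoch" counter used to detect a recreated taxonomy are hypothetical, and the exception class is a local stand-in that only borrows the name from the patch.

```java
// Local stand-in for the exception named in the patch.
class InconsistentTaxonomyException extends Exception {
    InconsistentTaxonomyException(String msg) {
        super(msg);
    }
}

// Hypothetical taxonomy store. "epoch" is an assumed marker bumped on every recreate.
class SketchTaxonomy {
    long epoch = 0;
    int size = 0;

    void addCategory() { size++; }

    void recreate() { epoch++; size = 0; }
}

class SketchTaxonomyReader {
    private final SketchTaxonomy taxonomy;
    private final long epochAtOpen;
    private int knownSize;

    SketchTaxonomyReader(SketchTaxonomy taxonomy) {
        this.taxonomy = taxonomy;
        this.epochAtOpen = taxonomy.epoch;
        this.knownSize = taxonomy.size;
    }

    int size() { return knownSize; }

    // Refresh is only valid while categories were appended, never after a recreate.
    void refresh() throws InconsistentTaxonomyException {
        if (taxonomy.epoch != epochAtOpen) {
            throw new InconsistentTaxonomyException("taxonomy was recreated since open");
        }
        knownSize = taxonomy.size;
    }
}

class RefreshDemo {
    // Application-side handling: refresh if possible, otherwise open a fresh reader.
    static SketchTaxonomyReader refreshOrReopen(SketchTaxonomyReader reader, SketchTaxonomy taxonomy) {
        try {
            reader.refresh();
            return reader;
        } catch (InconsistentTaxonomyException e) {
            return new SketchTaxonomyReader(taxonomy);
        }
    }
}
```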

Bumped into 3 TODOs while working on this:
* FilterIndexReader does not implement getCommitUserData(). Once this is fixed, 
the TODO in TestIndexClose can be resolved. I'll open an issue later.
* TR.refresh() should return a boolean indicating whether anything changed (issue).
* DTW.rollback() seems wrong to me: it rolls back the internal IW, which also 
closes it, but then it refreshes its internal TR.

 TaxonomyReader.refresh() is broken, replace its logic with reopen(), 
 following IR.reopen pattern
 

 Key: LUCENE-3573
 URL: https://issues.apache.org/jira/browse/LUCENE-3573
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/facet
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3573.patch, LUCENE-3573.patch


 When the taxonomy index is recreated, TR's assumption that categories are only 
 added no longer holds.
 As a result, calling TR.refresh() is incorrect at best, and usually throws 
 an AIOOBE (ArrayIndexOutOfBoundsException).




[jira] [Updated] (LUCENE-3573) TaxonomyReader.refresh() is broken, replace its logic with reopen(), following IR.reopen pattern

2011-11-14 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3573:


Attachment: LUCENE-3573.patch

Attached patch for trunk adds two tests:
* one of them is opening a new TR and passes
* the other is refreshing the TR and fails.

 TaxonomyReader.refresh() is broken, replace its logic with reopen(), 
 following IR.reopen pattern
 

 Key: LUCENE-3573
 URL: https://issues.apache.org/jira/browse/LUCENE-3573
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/facet
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3573.patch


 When the taxonomy index is recreated, TR's assumption that categories are only 
 added no longer holds.
 As a result, calling TR.refresh() is incorrect at best, and usually throws 
 an AIOOBE (ArrayIndexOutOfBoundsException).




[jira] [Updated] (LUCENE-3506) tests for verifying that assertions are enabled do nothing since they ignore AssertionError

2011-10-25 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3506:


Attachment: LUCENE-3506.patch

Attached fix for this:

- assertionsEnabled() method added to LTC. 

- tests that were no-ops were fixed to actually verify that the assertion fails.

- after the fix, in trunk, analyzer's final'ness assertion tests failed because 
being final (class or method) is no longer needed in trunk. So these tests were 
removed in TestAssertions.
-- note: should not remove these tests when merging to 3x.

- TestSegmentMerger also failed with this fix - because it used the stale IW's 
SegmentInfos to create a compound segment. Fixed by reading a fresh SIS.

- only one test (TestAssertions.testbasics()) fails if assertions are not 
enabled. The other tests do not fail (though they do execute). I think that 
this was intended in the code, though since it did not work it is hard to 
tell...
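
For illustration, the two pieces above can be sketched like this. The class and the helper tripsAssertion() are hypothetical names for this sketch, and the assertionsEnabled() body is the usual Java idiom, not necessarily what the patch puts in LTC.

```java
class AssertionChecks {
    // Usual idiom: the assignment inside the assert only runs when the JVM
    // has assertions enabled (-ea) for this class.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // intentional side effect
        return enabled;
    }

    // Returns true only if the body actually tripped an assertion.
    // Catching Exception here would miss it: AssertionError extends Error, not Exception.
    static boolean tripsAssertion(Runnable body) {
        try {
            body.run();
            return false;
        } catch (AssertionError expected) {
            return true;
        }
    }
}
```

A test can then require tripsAssertion(...) to be true only when assertionsEnabled() is true, instead of silently passing either way.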

This is ready to commit.

 tests for verifying that assertions are enabled do nothing since they ignore 
 AssertionError
 ---

 Key: LUCENE-3506
 URL: https://issues.apache.org/jira/browse/LUCENE-3506
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/test
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3506.patch


 Follow-up from LUCENE-3501




[jira] [Updated] (LUCENE-3506) tests for verifying that assertions are enabled do nothing since they ignore AssertionError

2011-10-25 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3506:


Attachment: LUCENE-3506.patch

Updated patch as suggested, thanks guys for reviewing and your helpful comments.

 tests for verifying that assertions are enabled do nothing since they ignore 
 AssertionError
 ---

 Key: LUCENE-3506
 URL: https://issues.apache.org/jira/browse/LUCENE-3506
 Project: Lucene - Java
  Issue Type: Bug
  Components: general/test
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3506.patch, LUCENE-3506.patch


 Follow-up from LUCENE-3501




[jira] [Updated] (LUCENE-3501) random sampler is not random (and so facet SamplingWrapperTest occasionally fails)

2011-10-09 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3501:


Attachment: LUCENE-3501.patch

Before applying this patch, run:
{noformat}
svn mv modules/facet/src/java/org/apache/lucene/facet/util/RandomSample.java 
modules/facet/src/java/org/apache/lucene/facet/search/sampling/RepeatableSampler.java
{noformat}

I looked at this, and also discussed with Gilad, and it seems that:

* The test is broken: it claims to do N trials in case of failure but it does 
not, because its try/catch does not catch AssertionError, and so only one trial 
is attempted. (A few trials make sense because, with sampling, there is always a 
possibility that the selected sample set of docs would not contain the 
correct best facets, even with a high oversampling ratio. Oversampling means 
that for the selected set of docs more best facets are collected.)

* Even after fixing the test to actually try more than once, it still fails, 
because there is no randomness in RandomSample...  surprising but true.
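
A minimal sketch of a retry loop that does see assertion failures is below. RetryDemo and runWithRetries() are hypothetical names for this illustration, not the code in the test.

```java
class RetryDemo {
    // Runs the trial up to maxTrials times and returns the attempt number that passed.
    // If every attempt fails, the last AssertionError is rethrown.
    static int runWithRetries(int maxTrials, Runnable trial) {
        AssertionError last = null;
        for (int attempt = 1; attempt <= maxTrials; attempt++) {
            try {
                trial.run();
                return attempt;
            } catch (AssertionError e) { // catch (Exception e) would never see this
                last = e;
            }
        }
        throw last != null ? last : new AssertionError("maxTrials must be positive");
    }
}
```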

In this patch:
* Sampler made an abstract class.
* RandomSample renamed to RepeatableSampler which extends Sampler.
* RandomSampler was added - it too extends Sampler - this is a simple *random* 
implementation, which is now the default (used by default in 
SamplingWrapperAccumulator).
* The test randomly selects between the two sampler implementations. If you 
want to see the behavior that created the bug, remove that latter randomness by 
setting the variable *useRandomSampler* of 
*BaseSampleTestTopK.testCountUsingSamping()* to false.
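
The shape of the hierarchy can be sketched as follows. All names and signatures here are hypothetical (prefixed with Sketch), and the sampling strategies are placeholders chosen for brevity: evenly spaced picks for the repeatable one and a partial Fisher-Yates shuffle for the random one. They are not the algorithms in the patch.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical shape of the hierarchy; illustrative only.
abstract class SketchSampler {
    // Returns sampleSize doc ids chosen from docIds (or all of them if there are fewer).
    abstract int[] sample(int[] docIds, int sampleSize);
}

// Deterministic: the same input always yields the same sample.
class SketchRepeatableSampler extends SketchSampler {
    @Override
    int[] sample(int[] docIds, int sampleSize) {
        int n = Math.min(sampleSize, docIds.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = docIds[(int) ((long) i * docIds.length / n)]; // evenly spaced picks
        }
        return out;
    }
}

// Random: a partial Fisher-Yates shuffle driven by java.util.Random.
class SketchRandomSampler extends SketchSampler {
    private final Random random;

    SketchRandomSampler(Random random) {
        this.random = random;
    }

    @Override
    int[] sample(int[] docIds, int sampleSize) {
        int n = Math.min(sampleSize, docIds.length);
        int[] pool = docIds.clone(); // leave the caller's array untouched
        for (int i = 0; i < n; i++) {
            int j = i + random.nextInt(pool.length - i);
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return Arrays.copyOf(pool, n);
    }
}
```

Passing the Random in from outside keeps a failing run reproducible from its seed.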

I think this is ready to commit. 
I wasn't sure, though: where should the Changes entry go?

 random sampler is not random (and so facet SamplingWrapperTest occasionally 
 fails)
 --

 Key: LUCENE-3501
 URL: https://issues.apache.org/jira/browse/LUCENE-3501
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/facet
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor
 Attachments: LUCENE-3501.patch


 RandomSample is not random at all:
 It does not even import java.util.Random, and its behavior is deterministic.
 In addition, the test testCountUsingSamping() never retries as it was 
 supposed to (for taking care of the hoped-for randomness).




[jira] [Updated] (LUCENE-3262) Facet benchmarking

2011-10-07 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3262:


Attachment: LUCENE-3262.patch

Updated patch according to Shai's comments and with AddFacetedDoc task.


 Facet benchmarking
 --

 Key: LUCENE-3262
 URL: https://issues.apache.org/jira/browse/LUCENE-3262
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/benchmark, modules/facet
Reporter: Shai Erera
Assignee: Doron Cohen
 Attachments: CorpusGenerator.java, LUCENE-3262.patch, 
 LUCENE-3262.patch, LUCENE-3262.patch, TestPerformanceHack.java


 A spin-off from LUCENE-3079. We should define a few benchmarks for faceting 
 scenarios, so we can evaluate the new faceting module as well as any 
 improvement we'd like to consider in the future (such as cutting over to 
 docvalues, implementing FST-based caches etc.).
 Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here 
 as a starting point.
 We've also done some preliminary work on extending Benchmark for faceting, so 
 I'll attach it here as well.
 We should perhaps create a Wiki page where we clearly describe the benchmark 
 scenarios, then include results of 'default settings' and 'optimized 
 settings', or something like that.




[jira] [Updated] (LUCENE-3262) Facet benchmarking

2011-10-06 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3262:


Attachment: LUCENE-3262.patch

Updated patch with a test, more javadocs, and a comment as Shai suggested.

I think this is ready to commit.

More tests are needed, and also Search with facets is missing, but that can go 
in a separate issue.


 Facet benchmarking
 --

 Key: LUCENE-3262
 URL: https://issues.apache.org/jira/browse/LUCENE-3262
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/benchmark, modules/facet
Reporter: Shai Erera
Assignee: Doron Cohen
 Attachments: CorpusGenerator.java, LUCENE-3262.patch, 
 LUCENE-3262.patch, TestPerformanceHack.java


 A spin-off from LUCENE-3079. We should define a few benchmarks for faceting 
 scenarios, so we can evaluate the new faceting module as well as any 
 improvement we'd like to consider in the future (such as cutting over to 
 docvalues, implementing FST-based caches etc.).
 Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here 
 as a starting point.
 We've also done some preliminary work on extending Benchmark for faceting, so 
 I'll attach it here as well.
 We should perhaps create a Wiki page where we clearly describe the benchmark 
 scenarios, then include results of 'default settings' and 'optimized 
 settings', or something like that.




[jira] [Updated] (LUCENE-3262) Facet benchmarking

2011-10-05 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3262:


Attachment: LUCENE-3262.patch

Patch (3x) with a working facet indexing benchmark.
It follows the outline above, except that: 
- there is no FacetDocMaker - only FacetSource
- there is no AddDocWithFacet - instead, AddDoc takes an additional config 
param: with.facet

'ant run-task -Dtask.alg=conf/facets.alg' will run an algorithm that indexes 
facets.

Not ready to commit yet - needs some testing and docs. Also, it only covers 
indexing for now, though perhaps search with facets can go in a separate issue.

 Facet benchmarking
 --

 Key: LUCENE-3262
 URL: https://issues.apache.org/jira/browse/LUCENE-3262
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/benchmark, modules/facet
Reporter: Shai Erera
Assignee: Doron Cohen
 Attachments: CorpusGenerator.java, LUCENE-3262.patch, 
 TestPerformanceHack.java


 A spin-off from LUCENE-3079. We should define a few benchmarks for faceting 
 scenarios, so we can evaluate the new faceting module as well as any 
 improvement we'd like to consider in the future (such as cutting over to 
 docvalues, implementing FST-based caches etc.).
 Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here 
 as a starting point.
 We've also done some preliminary work on extending Benchmark for faceting, so 
 I'll attach it here as well.
 We should perhaps create a Wiki page where we clearly describe the benchmark 
 scenarios, then include results of 'default settings' and 'optimized 
 settings', or something like that.




[jira] [Updated] (LUCENE-3484) TaxonomyWriter parents array creation is not thread safe, can cause NPE

2011-10-04 Thread Doron Cohen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3484:


Attachment: LUCENE-3484.patch

Patch with a test that fails in the same way as the reported error.

None of the changes here should be committed, just showing the error.

 TaxonomyWriter parents array creation is not thread safe, can cause NPE
 ---

 Key: LUCENE-3484
 URL: https://issues.apache.org/jira/browse/LUCENE-3484
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/facet
Reporter: Doron Cohen
Assignee: Doron Cohen
 Attachments: LUCENE-3484.patch


 Following user list thread [TaxWriter leakage? | 
 http://markmail.org/thread/jkkhemfzpnbdzoft] it appears that if two or more 
 threads ask for the parent array for the first time, a context switch 
 after the first thread has created the empty parents array but before it has 
 initialized it causes the other thread to use an uninitialized array, 
 resulting in an NPE. The fix is simple: synchronize the method getParentArray().
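
 A minimal sketch of the proposed fix is below. SketchParentArrayHolder is a hypothetical stand-in, the array contents are made up for the example, and a plain int[] is used for brevity. Only the idea of synchronizing the lazy initialization is taken from the description above.

```java
class SketchParentArrayHolder {
    private final int size;
    private int[] parents; // lazily created; guarded by "this"

    SketchParentArrayHolder(int size) {
        this.size = size;
    }

    // synchronized: no thread can observe the array between allocation and initialization.
    synchronized int[] getParentArray() {
        if (parents == null) {
            int[] fresh = new int[size];
            for (int i = 0; i < size; i++) {
                fresh[i] = i - 1; // hypothetical parent ordinals: a simple chain
            }
            parents = fresh; // publish only after the array is fully filled
        }
        return parents;
    }
}
```

 Filling a local array and assigning the field last is an extra precaution in this sketch. The synchronized keyword alone already prevents a second caller from entering before initialization completes.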
