Re: Build failed in Hudson: Solr-3.x #111
On Thu, Sep 23, 2010 at 1:32 AM, Apache Hudson Server hud...@hudson.apache.org wrote:

    [junit] Testsuite: org.apache.solr.handler.dataimport.TestEvaluatorBag
    [junit] Testcase: testGetDateFormatEvaluator(org.apache.solr.handler.dataimport.TestEvaluatorBag): Caused an ERROR
    [junit] expected:<2010-09-21 [10]:31> but was:<2010-09-21 [09]:31>
    [junit]     at org.apache.solr.handler.dataimport.TestEvaluatorBag.testGetDateFormatEvaluator(TestEvaluatorBag.java:131)
    [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:704)
    [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:677)
    [junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.039 sec
    [junit] ------------- Standard Output ---------------
    [junit] NOTE: random locale of testcase 'testGetDateFormatEvaluator' was: fr_BE
    [junit] NOTE: random timezone of testcase 'testGetDateFormatEvaluator' was: PLT
    [junit] ---------------------------------------------
    [junit] TEST org.apache.solr.handler.dataimport.TestEvaluatorBag FAILED

Some kind of race condition? I seem to hit this error randomly sometimes, but I can never reproduce it, even with the same locale/timezone.

--
Robert Muir
rcm...@gmail.com
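One way an off-by-one-hour mismatch like "expected [10]:31 but was [09]:31" can arise is a formatter silently using the JVM's (here randomized) default timezone. A generic sketch of pinning the zone and locale explicitly - hypothetical class name, not the actual TestEvaluatorBag code:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class FixedZoneFormat {
    // Format a date in an explicitly chosen zone and locale, so the
    // expected string does not depend on the JVM's randomized defaults.
    public static String formatUtc(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        System.out.println(formatUtc(new Date(0))); // 1970-01-01 00:00
    }
}
```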
[jira] Resolved: (SOLR-2125) Spatial filter is not accurate
[ https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved SOLR-2125.
-----------------------------------
    Fix Version/s: 3.1, 4.0
       Resolution: Fixed

Committed to trunk and 3.x.

Spatial filter is not accurate
------------------------------
                 Key: SOLR-2125
                 URL: https://issues.apache.org/jira/browse/SOLR-2125
             Project: Solr
          Issue Type: Bug
          Components: Build
    Affects Versions: 1.5
            Reporter: Bill Bell
            Assignee: Grant Ingersoll
             Fix For: 3.1, 4.0
      Attachments: Distance.diff, SOLR-2125.patch, solrspatial.xlsx

The calculations of distance appear to be off.

Note the documentation: "The radius of the sphere to be used when calculating distances on a sphere (i.e. haversine). Default is the Earth's mean radius in kilometers (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) which is set to 3,958.761458084784856. Most applications will not need to set this." But the radius of the earth is 6371.009 km (≈3958.761 mi), so the constant documented as kilometers is actually the value in miles.

Distance filtering also appears to be off. Example data: 45.17614,-93.87341 to 44.9369054,-91.3929348 is approx 137 miles per Google, i.e. about 220 kilometers.

http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348)) asc

Nothing shows. d=285 shows results. This is off by a lot.

Bill

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
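For reference, a plain haversine check of the two points in the report (the standard formula with the 6371.009 km mean radius; an independent sketch, not Solr's implementation) gives roughly 197 km great-circle distance, so a filter radius of d=280 matching nothing does look badly off:

```java
public class HaversineCheck {
    static final double EARTH_RADIUS_KM = 6371.009; // earth's mean radius

    // Standard haversine great-circle distance, in kilometers.
    public static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dp = Math.toRadians(lat2 - lat1), dl = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dp / 2) * Math.sin(dp / 2)
                 + Math.cos(p1) * Math.cos(p2) * Math.sin(dl / 2) * Math.sin(dl / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    public static void main(String[] args) {
        // The two points from the report come out around 197 km apart,
        // well inside the d=280 radius that returned no results.
        System.out.printf("%.1f km%n",
            distanceKm(45.17614, -93.87341, 44.9369054, -91.3929348));
    }
}
```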
[jira] Created: (SOLR-2131) Solr Boost returns unexpected results
Solr Boost returns unexpected results
-------------------------------------
                 Key: SOLR-2131
                 URL: https://issues.apache.org/jira/browse/SOLR-2131
             Project: Solr
          Issue Type: Wish
            Reporter: Jayant Patil

Hi,

We are using Solr for our searches, and we are facing issues while applying boosts on particular fields. E.g. we have a field Category, which contains values like Electronics, Computers, Home Appliances, Mobile Phones, etc. To boost the categories Electronics and Mobile Phones, we are using the following query:

(category:Electronics^2 OR category:Mobile Phones^1 OR category:[* TO *]^0)

The results are unexpected: the category Mobile Phones gets a higher score than Electronics, even though we specify a boost factor of 2 for Electronics and 1 for Mobile Phones. On debugging we found that DocFreq is influencing the scores and hence the overall boost: the number of docs for Mobile Phones is much lower than for Electronics, and Solr gives Mobile Phones a higher score for this reason.

Please suggest a solution.
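The effect described above can be reproduced with Lucene's classic idf formula, idf = 1 + ln(numDocs / (docFreq + 1)), which enters the score roughly squared: a rare term's idf easily outweighs a 2x query boost on a common term. The corpus numbers below are made up purely for illustration:

```java
public class IdfBoostDemo {
    // Lucene's classic idf: 1 + ln(numDocs / (docFreq + 1)).
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 100000;                        // hypothetical corpus size
        double idfElectronics = idf(20000, numDocs); // common category term
        double idfMobile = idf(500, numDocs);        // rare category term
        // idf appears in both query weight and term weight, so squared here;
        // the rare term with boost 1 still beats the common term with boost 2.
        double electronics = 2.0 * idfElectronics * idfElectronics;
        double mobile = 1.0 * idfMobile * idfMobile;
        System.out.printf("electronics=%.1f mobile=%.1f%n", electronics, mobile);
    }
}
```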
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914044#action_12914044 ]

Yonik Seeley commented on LUCENE-2649:
--------------------------------------

Now we're talking!

Q: why aren't the CachePopulator methods just directly on EntryConfig - was it easier to share implementations that way or something?

Also:
- It doesn't seem like we need the two methods fillValidBits and fillByteValues - shouldn't it just be one method that looks at the config and fills in the appropriate entries based on cacheValidBits() and cacheValues()?
- We should allow an implementation to create subclasses of ByteValues, etc... what about this method:

public abstract CachedArray fillEntry( CachedArray vals, IndexReader reader, String field, EntryConfig creator )

That way, an existing entry can be filled in (i.e. vals != null) or a new entry can be created. Oh, wait, I see further down a ByteValues createValue() - if that's meant to be a method on CachePopulator, I guess it's all good - my main concern was being able to create subclasses of ByteValues and friends.

Anyway, all that's off the top of my head - I'm sure you've thought about it more at this point.

FieldCache should include a BitSet for matching docs
----------------------------------------------------
                 Key: LUCENE-2649
                 URL: https://issues.apache.org/jira/browse/LUCENE-2649
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Ryan McKinley
             Fix For: 4.0
      Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch

The FieldCache returns an array representing the values for each doc. However there is no way to know if the doc actually has a value. This should be changed to return an object representing the values *and* a BitSet for all valid docs.
Re: Build failed in Hudson: Solr-3.x #111
Wild guess: it's coming from a test that seems to deal with dates -- maybe it's code that uses a DateFormatter (parsing or formatting) in a non-thread-safe way? In which case it may be likely to show up in parallel tests but not when running tests individually.

: Date: Thu, 23 Sep 2010 08:37:36 -0400
: From: Robert Muir rcm...@gmail.com
: Reply-To: dev@lucene.apache.org
: To: dev@lucene.apache.org
: Subject: Re: Build failed in Hudson: Solr-3.x #111
:
: On Thu, Sep 23, 2010 at 1:32 AM, Apache Hudson Server
: hud...@hudson.apache.org wrote:
:
: [junit] Testsuite: org.apache.solr.handler.dataimport.TestEvaluatorBag
: [junit] Testcase:
: testGetDateFormatEvaluator(org.apache.solr.handler.dataimport.TestEvaluatorBag):
: Caused an ERROR
: [junit] expected:<2010-09-21 [10]:31> but was:<2010-09-21 [09]:31>
: [junit] at
: org.apache.solr.handler.dataimport.TestEvaluatorBag.testGetDateFormatEvaluator(TestEvaluatorBag.java:131)
: [junit] at
: org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:704)
: [junit] at
: org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:677)
: [junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.039 sec
: [junit] ------------- Standard Output ---------------
: [junit] NOTE: random locale of testcase 'testGetDateFormatEvaluator' was: fr_BE
: [junit] NOTE: random timezone of testcase 'testGetDateFormatEvaluator' was: PLT
: [junit] ---------------------------------------------
: [junit] TEST org.apache.solr.handler.dataimport.TestEvaluatorBag FAILED
:
: Some kind of race condition? I seem to hit this error randomly sometimes,
: but I can never reproduce it, even with same locale/timezone.
:
: --
: Robert Muir
: rcm...@gmail.com

-Hoss

--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
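If the culprit is a shared SimpleDateFormat, the usual workaround is one instance per thread. A generic sketch of that pattern (hypothetical class, not the actual DataImportHandler code; java.time's thread-safe formatters are the modern fix):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class SafeDateFormat {
    // SimpleDateFormat is not thread-safe: sharing one instance across
    // threads can silently corrupt its internal parse/format state.
    // A per-thread copy avoids both the corruption and synchronization.
    private static final ThreadLocal<SimpleDateFormat> FMT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT));

    public static String format(Date d) {
        return FMT.get().format(d);
    }

    public static void main(String[] args) throws Exception {
        // Safe to call concurrently from many threads.
        Thread t = new Thread(() -> System.out.println(format(new Date())));
        t.start();
        t.join();
    }
}
```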
Re: Build failed in Hudson: Solr-3.x #111
On Thu, Sep 23, 2010 at 11:21 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

> Wild guess: it's coming from a test that seems to deal with dates -- maybe it's code that uses a DateFormatter (parsing or formatting) in a non-ThreadSafe way?

Possibly, good idea: I will apply some pressure to the test (-Dtests.iter) and see what happens.

> in which case it may be likely to show up in parallel tests but not when running tests individually

Just for reference: our parallel tests are not parallel with threads but completely separate JVMs, so tests don't need to be thread-safe. Example: given TestClassA, TestClassB, TestClassC, TestClassD with 2 threads, we just spawn 2 JVMs (jvm1 and jvm2):

jvm1 executes TestClassA, then TestClassB
jvm2 executes TestClassC, then TestClassD

The other thing parallel tests do is give jvm1 and jvm2 unique base temp directories, so that if they are working with the filesystem they won't step over each other. So parallel tests are rather safe, but we do have to be aware of statics (since jvm1 will run TestClassA, then TestClassB, sequentially in the same JVM).

--
Robert Muir
rcm...@gmail.com
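The JVM assignment described above is just contiguous chunking of the test-class list. A toy sketch of that scheme (illustrative only, not the actual Ant/JUnit harness code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TestPartitioner {
    // Assign test classes to JVMs in contiguous chunks:
    // [A, B, C, D] with 2 JVMs -> jvm1 runs [A, B], jvm2 runs [C, D].
    public static List<List<String>> partition(List<String> classes, int jvms) {
        int chunk = (classes.size() + jvms - 1) / jvms; // ceiling division
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < jvms; i++) {
            int from = Math.min(i * chunk, classes.size());
            int to = Math.min(from + chunk, classes.size());
            out.add(new ArrayList<>(classes.subList(from, to)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tests = Arrays.asList("TestClassA", "TestClassB", "TestClassC", "TestClassD");
        System.out.println(partition(tests, 2));
        // [[TestClassA, TestClassB], [TestClassC, TestClassD]]
    }
}
```

Because each JVM runs its chunk sequentially, statics leak between the classes within a chunk, which is exactly the caveat noted above.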
Re: Build failed in Hudson: Solr-3.x #111
On Thu, Sep 23, 2010 at 11:21 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

> in which case it may be likely to show up in parallel tests but not when running tests individually

Another note for reference: the Solr contribs don't use parallel testing yet at the moment (they will, if I can finish the SOLR-2002 patch), but this isn't a big deal since there aren't many of them, and most don't have many tests.

--
Robert Muir
rcm...@gmail.com
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914082#action_12914082 ]

Ryan McKinley commented on LUCENE-2649:
---------------------------------------

bq. Q: why aren't the CachePopulator methods just directly on EntryConfig - was it easier to share implementations that way or something?

Two reasons (but I can be talked out of it):

1. This approach separates what you are asking for (bits/values/etc) from how they are actually generated (the populator). Something makes me uncomfortable about the caller asking for Values needing to also know how they are generated. Seems easy to mess up. With this approach the 'populator' is attached to the field cache and defines how stuff is read, vs the 'EntryConfig' that defines what the user is asking for (particularly since they may change what they are asking for in subsequent calls).

2. The 'populator' is attached to the FieldCache so it has consistent behavior across subsequent calls to getXxxxValues(). Note that with this approach, if you ask the field cache for just the 'values' then later want the 'bits', it uses the same populator and adds the results to the existing CachedArray value.

bq. It doesn't seem like we need two methods fillValidBits , fillByteValues

The 'fillValidBits' just fills up the valid bits w/o actually parsing (or caching) the values. This is useful when:

1. you only want the ValidBits, but not the values (Mike seems to want this)
2. you first ask for just the values, then later want the bits.
Thinking some more, I think the populator should look like this:

{code:java}
public abstract class CachePopulator {
  public abstract ByteValues   createByteValues(   IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract ShortValues  createShortValues(  IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract IntValues    createIntValues(    IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract FloatValues  createFloatValues(  IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract DoubleValues createDoubleValues( IndexReader reader, String field, EntryConfig config ) throws IOException;

  public abstract void fillByteValues(   ByteValues vals,   IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract void fillShortValues(  ShortValues vals,  IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract void fillIntValues(    IntValues vals,    IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract void fillFloatValues(  FloatValues vals,  IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract void fillDoubleValues( DoubleValues vals, IndexReader reader, String field, EntryConfig config ) throws IOException;

  // This will only fill in the ValidBits w/o parsing any actual values
  public abstract void fillValidBits( CachedArray vals, IndexReader reader, String field, EntryConfig config ) throws IOException;
}
{code}

The default 'create' implementation could look something like this:

{code:java}
@Override
public ShortValues createShortValues( IndexReader reader, String field, EntryConfig config ) throws IOException {
  if( config == null ) {
    config = new SimpleEntryConfig();
  }
  ShortValues vals = new ShortValues();
  if( config.cacheValues() ) {
    this.fillShortValues(vals, reader, field, config);
  }
  else if( config.cacheValidBits() ) {
    this.fillValidBits(vals, reader, field, config);
  }
  else {
    throw new RuntimeException( "the config must cache values and/or bits" );
  }
  return vals;
}
{code}

And the Cache 'createValue' would look something like this:

{code:java}
static final class ByteCache extends Cache {
  ByteCache(FieldCache wrapper) {
    super(wrapper);
  }

  @Override
  protected final ByteValues createValue(IndexReader reader, Entry entry, CachePopulator populator) throws IOException {
    String field = entry.field;
    EntryConfig config = (EntryConfig)entry.custom;
    if (config == null) {
      return wrapper.getByteValues(reader, field, new SimpleEntryConfig() );
    }
    return populator.createByteValues(reader, field, config);
  }
}
{code}

Thoughts? This would open up lots more of the field cache... so if we go this route, let's make sure it addresses the other issues people have with FieldCache. IIUC, the other big request is to load the values from an external source -- that should be possible with this approach.
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914089#action_12914089 ]

Yonik Seeley commented on LUCENE-2649:
--------------------------------------

Oh... I mis-matched the parens when I was looking at your proposal (hence the confusion).

I think getCachePopulator() should be under EntryConfig - that way people can provide their own (and extend ByteValues to include more info). Otherwise, we'll forever be locked into a lowest common denominator of only adding info that everyone can agree on.
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914095#action_12914095 ]

Ryan McKinley commented on LUCENE-2649:
---------------------------------------

{quote}
I think getCachePopulator() should be under EntryConfig - that way people can provide their own (and extend ByteValues to include more info)
{quote}

So you think it is better for *each* call to define how the cache works, rather than having that as an attribute of the FieldCache (that could be extended)? The one thing that concerns me is that that forces all users of the FieldCache to be in sync. In this proposal, you could set the CachePopulator on the FieldCache.

{quote}
Otherwise, we'll forever be locked into a lowest common denominator of only adding info that everyone can agree on.
{quote}

This is why I just added the 'createXxxxValues' functions on CachePopulator -- a subclass could add other values.

It looks like the basic difference between what we are thinking is that the Populator is attached to the FieldCache rather than to each call to the FieldCache. From my point of view, this would make it easier for a system with a schema (like Solr) to have consistent results across all calls, rather than making each request to the FieldCache need to know about the schema - parsers - populator... but I can always be convinced ;)
[jira] Assigned: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe reassigned LUCENE-2657:
-----------------------------------
    Assignee: Steven Rowe

Replace Maven POM templates with full POMs, and change documentation accordingly
--------------------------------------------------------------------------------
                 Key: LUCENE-2657
                 URL: https://issues.apache.org/jira/browse/LUCENE-2657
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Build
            Reporter: Steven Rowe
            Assignee: Steven Rowe
             Fix For: 3.1, 4.0

The current Maven POM templates only contain dependency information, the bare bones necessary for uploading artifacts to the Maven repository. Full Maven POMs will include the information necessary to run a multi-module Maven build, in addition to serving the same purpose as the current POM templates.
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914100#action_12914100 ]

Yonik Seeley commented on LUCENE-2649:
--------------------------------------

bq. In this proposal, you could set the CachePopulator on the FieldCache.

Hmmm, OK, as long as it's possible.

bq. From my point of view, this would make it easier for a system with a schema (like Solr) to have consistent results across all calls, rather than making each request to the FieldCache need to know about the schema - parsers - populator

I think this may make it a lot harder from Solr's point of view.
- It's essentially a static... so it had better not ever be configurable from the schema or solrconfig, or it will break multi-core.
- If we ever *did* want to treat fields differently (load some values from a DB, etc), we'd want to look that up in the schema - but we don't have a reference to the schema in the populator, and we wouldn't want to store one there (again, we have multiple schemas). So... we could essentially create a custom EntryConfig object and then our custom CachePopulator could delegate to the entry config (and we've essentially re-invented a way to specify the populator on a per-field basis).

Are EntryConfig objects stored as keys anywhere? We need to be very careful about memory leaks.
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914109#action_12914109 ]

Yonik Seeley commented on LUCENE-2649:
--------------------------------------

It's all doable though, I guess - even if EntryConfig objects are used as cache keys, we could store a weak reference to the Solr core. So I say, proceed with what you think will make it easy for Lucene users - and don't focus on what will be easy for Solr.
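The weak-reference idea can be sketched with the JDK's WeakHashMap, which drops an entry once its key is otherwise unreachable; the usual caveat is that a value must not strongly reference its key, or the entry never clears. A generic illustration (not the FieldCache code):

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeyCache {
    // Keys (e.g. per-core config objects) are held weakly: once a key is
    // otherwise unreachable, its entry becomes collectible, so the cache
    // itself does not pin the key (and whatever it references) forever.
    // Caveat: a value holding a strong reference to its key defeats this.
    private final Map<Object, Object> cache = new WeakHashMap<>();

    public synchronized void put(Object key, Object value) {
        cache.put(key, value);
    }

    public synchronized Object get(Object key) {
        return cache.get(key);
    }
}
```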
[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914112#action_12914112 ]

Steven Rowe commented on LUCENE-2657:
-------------------------------------

{quote}
bq. Full Maven POMs will include the information necessary to run a multi-module Maven build

That sort of sounds like a parallel build process (i.e. you would be able to build Lucene/Solr itself with Maven). Is it? We've avoided that type of thing in the past.
{quote}

Yes, it constitutes a parallel build process, but the Ant build process would remain the official build. For example, the artifacts produced by the Maven build will not be exactly the same as those produced by the Ant build process, and so cannot be used as release artifacts.

It has been noted elsewhere that a Maven build has never been produced for Lucene. I hope in this issue to provide that, so that the discussion about whether or not to include it in the source tree has a concrete reference point.
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914115#action_12914115 ]

Ryan McKinley commented on LUCENE-2649:
---------------------------------------

Preface: I don't really know how the FieldCache is used, so my assumptions could be way off...

In Solr, is there one FieldCache for all cores, or does each core get its own FieldCache? I figured each core would create a single CachePopulator (with a reference to the schema) and attach it to the FieldCache. If that is not possible, then ya, it will be better to put that in the request.

bq. Are EntryConfig objects stored as keys anywhere? We need to be very careful about memory leaks.

Yes, the EntryConfig is part of the 'Entry' and gets stored as a key.
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914123#action_12914123 ]

Yonik Seeley commented on LUCENE-2649:
--------------------------------------

bq. In Solr, is there one FieldCache for all cores, or does each core get its own FieldCache?

There is a single FieldCache for all cores (same as in Lucene).
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914136#action_12914136 ]

Yonik Seeley commented on LUCENE-2649:
--------------------------------------

Passing it in would also allow a way to get rid of the StopFillCacheException hack for NumericField in the future.
[jira] Created: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)
wrong exception from NativeFSLockFactory (LIA2 test case)
---------------------------------------------------------
                 Key: LUCENE-2663
                 URL: https://issues.apache.org/jira/browse/LUCENE-2663
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
            Reporter: Robert Muir
             Fix For: 3.1, 4.0

As part of integrating the Lucene in Action 2 test cases (LUCENE-2661), I found that one of the test cases fails. The test is pretty simple, and passes on 3.0. The exception you get instead (LockReleaseFailedException) is pretty confusing and I think we should fix it.
[jira] Updated: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)
[ https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2663:
--------------------------------
    Attachment: LUCENE-2663_test.patch
[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)
[ https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914139#action_12914139 ]

Robert Muir commented on LUCENE-2663:
-------------------------------------

For trunk, just change the test to use MockAnalyzer instead of simple... I made the patch from 3.x.
[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)
[ https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914141#action_12914141 ] Uwe Schindler commented on LUCENE-2663: --- Could this be a randomization issue? If the two index writers use different LockFactory implementations, it can easily fail! At least within one test run the LockFactory should be identical; otherwise the locking can produce wrong failures/messages, since SimpleFSLock and NativeFSLock can only interact in a very limited way. There are only some basic safety checks, so locking will fail if you mix the implementations.
[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)
[ https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914144#action_12914144 ] Robert Muir commented on LUCENE-2663: - The test isn't random; it uses FSDirectory.open.
[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)
[ https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914146#action_12914146 ] Uwe Schindler commented on LUCENE-2663: --- Sorry, you are right. The exception thrown is clearly wrong :-) Maybe it's related to Shai's changes in NativeFSLockFactory (which is the default)?
[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)
[ https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914160#action_12914160 ] Michael McCandless commented on LUCENE-2663: I think it is the recent changes to NativeFSLockFactory... IW first tries to clear the lock, if create=true, and that attempt is causing the exception. Really this is a holdover from SimpleFSLockFactory, which can leave orphaned locks... so maybe somehow we shouldn't do this for other lock factories?
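[Editor's note] The confusion discussed above stems from how native file locks behave inside a single JVM. As a standalone sketch (plain java.nio, not Lucene code): once one FileChannel in a JVM holds a lock on a file, a second overlapping tryLock from the same JVM fails immediately with OverlappingFileLockException rather than returning null, which is exactly the kind of low-level behavior a lock factory has to translate into a sensible Lucene exception.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NativeLockDemo {
    // Returns "locked" on success, "busy" if the lock is already held, "failed" on I/O error.
    public static String tryNativeLock(FileChannel channel) {
        try {
            FileLock lock = channel.tryLock();
            return lock != null ? "locked" : "busy";
        } catch (OverlappingFileLockException e) {
            // This JVM already holds an overlapping lock on the file,
            // possibly via a different FileChannel.
            return "busy";
        } catch (IOException e) {
            return "failed";
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("write", ".lock");
        try (FileChannel a = FileChannel.open(lockFile, StandardOpenOption.WRITE);
             FileChannel b = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            System.out.println(tryNativeLock(a)); // locked
            System.out.println(tryNativeLock(b)); // busy: overlapping lock in same JVM
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }
}
```

Whether the second failure surfaces as a lock-obtain or a lock-release error depends entirely on how the factory wraps this, which is the confusion the issue is about.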
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914163#action_12914163 ] Ryan McKinley commented on LUCENE-2649: --- Ok, as I look more, I think it may be worth some even bigger changes! Is there any advantage to having a different map for each type? The double (and triple) cache can get a bit crazy and lead to so much duplication. What about moving to a FieldCache that is centered around the very basic API:

{code:java}
public <T> T get(IndexReader reader, String field, EntryCreator<T> creator)
{code}

The entry creator would be something like:

{code:java}
public abstract static class EntryCreator<T> implements Serializable {
  public abstract T create( IndexReader reader, String field );
  public abstract void validate( T entry, IndexReader reader, String field );

  /**
   * NOTE: the hashCode is used as part of the cache key, so make sure it
   * only changes if you want different entries for the same field
   */
  @Override
  public int hashCode() {
    return EntryCreator.class.hashCode();
  }
}
{code}

We could add all the utility functions that cast stuff to ByteValues etc. We would also make sure that the Map does not use the EntryCreator as a key, but uses it to generate a key. A sample EntryCreator would look like this:

{code:java}
class BytesEntryCreator extends FieldCache.EntryCreator<ByteValues> {
  @Override
  public ByteValues create(IndexReader reader, String field) {
    // all the normal walking stuff using whatever parameters we have specified
  }

  @Override
  public void validate(ByteValues entry, IndexReader reader, String field) {
    // all the normal walking stuff using whatever parameters we have specified
  }
}
{code}

Thoughts on this approach?
Crazy how a seemingly simple issue just explodes :( FieldCache should include a BitSet for matching docs Key: LUCENE-2649 URL: https://issues.apache.org/jira/browse/LUCENE-2649 Project: Lucene - Java Issue Type: Improvement Reporter: Ryan McKinley Fix For: 4.0 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch The FieldCache returns an array representing the values for each doc. However there is no way to know if the doc actually has a value. This should be changed to return an object representing the values *and* a BitSet for all valid docs.
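[Editor's note] To make the proposal above concrete, here is a standalone sketch (hypothetical names; a plain String stands in for IndexReader, so it compiles without Lucene) of the single get() API with a key derived from the EntryCreator's hashCode and the field, rather than one map per value type:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed single-map FieldCache; a String stands in for IndexReader.
public class FieldCacheSketch {
    public abstract static class EntryCreator<T> {
        public abstract T create(String reader, String field);

        // The hashCode is part of the cache key: the same creator class
        // maps to the same cached entry for a given reader/field.
        @Override
        public int hashCode() {
            return getClass().hashCode();
        }
    }

    private final Map<Integer, Object> cache = new HashMap<>();

    // The one generic entry point replacing getBytes()/getInts()/getFloats()/...
    @SuppressWarnings("unchecked")
    public <T> T get(String reader, String field, EntryCreator<T> creator) {
        Integer key = creator.hashCode() ^ field.hashCode() ^ reader.hashCode();
        Object entry = cache.get(key);
        if (entry == null) {
            entry = creator.create(reader, field);
            cache.put(key, entry);
        }
        return (T) entry;
    }
}
```

The point of the design is visible here: a second get() with the same field and creator returns the already-cached object without calling create() again, regardless of the entry's type.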
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914169#action_12914169 ] Yonik Seeley commented on LUCENE-2649: -- Hmmm, that would also seem to transform the FieldCache into a more generic index reader cache - not a bad idea!
[jira] Updated: (SOLR-1568) Implement Spatial Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-1568: --- Attachment: SOLR-1568.patch Here's a patch that changes sfilt to use getParam(), which first gets from local params, but then also checks global request params if missing. I also added sfield, since getting fl from global params as the spatial field is obviously wrong. Since a single Solr request may have multiple spatial related things (distance function, sort by distance, filter) it allows easier sharing. Example: sfield=store&pt=10,20&d=200&fq={!sfilt}&sort=sdist() asc (I made up sdist, it doesn't exist yet... but you get the idea). Implement Spatial Filter Key: SOLR-1568 URL: https://issues.apache.org/jira/browse/SOLR-1568 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1, 4.0 Attachments: CartesianTierQParserPlugin.java, SOLR-1568.Mattmann.031010.patch.txt, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch Given an index with spatial information (either as a geohash, SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be able to pass in a filter query that takes in the field name, lat, lon and distance and produces an appropriate Filter (i.e. one that is aware of the underlying field type) for use by Solr. The interface _could_ look like: {code} fq={!sfilt dist=20}location:49.32,-79.0 {code} or it could be: {code} fq={!sfilt lat=49.32 lon=-79.0 f=location dist=20} {code} or: {code} fq={!sfilt p=49.32,-79.0 f=location dist=20} {code} or: {code} fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20} {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914178#action_12914178 ] Uwe Schindler commented on LUCENE-2649: --- bq. Anyone know what the deal is with IndexReader: It is to share the cache for clones of IndexReaders, or when SegmentReaders are reopened with different deleted docs. In this case the underlying reader is the same, so it should use its cache (e.g. when deleted docs are added, you don't need to invalidate the cache). For more info, ask Mike McCandless!
[jira] Created: (LUCENE-2664) Add SimpleText codec
Add SimpleText codec Key: LUCENE-2664 URL: https://issues.apache.org/jira/browse/LUCENE-2664 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Inspired by Sahin Buyrukbilen's question here: http://www.lucidimagination.com/search/document/b68846e383824653/how_to_export_lucene_index_to_a_simple_text_file#b68846e383824653 I made a simple read/write codec that stores all postings data into a single text file (_X.pst), looking like this:

{noformat}
field contents
  term file
    doc 0
      pos 5
  term is
    doc 0
      pos 1
  term second
    doc 0
      pos 3
  term test
    doc 0
      pos 4
  term the
    doc 0
      pos 2
  term this
    doc 0
      pos 0
END
{noformat}

The codec is fully functional -- all Lucene/Solr tests pass with -Dtests.codec=SimpleText -- but its performance is obviously poor. However, it should be useful for debugging, transparency, understanding just what Lucene stores in its index, etc. And it's a quick way to gain some understanding of how a codec works...
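[Editor's note] To show how little structure the format carries, here is a toy emitter (not the actual codec; the exact indentation is illustrative) that produces a SimpleText-style dump from an in-memory postings map:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy emitter of a SimpleText-style postings dump; not the real codec.
public class SimpleTextSketch {
    // postings: term -> list of {docId, position} pairs; terms are emitted
    // in sorted order, like the real .pst file.
    public static String dump(String field, Map<String, List<int[]>> postings) {
        StringBuilder out = new StringBuilder("field " + field + "\n");
        for (Map.Entry<String, List<int[]>> term : new TreeMap<>(postings).entrySet()) {
            out.append("  term ").append(term.getKey()).append("\n");
            for (int[] posting : term.getValue()) {
                out.append("    doc ").append(posting[0]).append("\n");
                out.append("      pos ").append(posting[1]).append("\n");
            }
        }
        return out.append("END\n").toString();
    }
}
```

Reading the format back is equally mechanical, which is what makes it useful for inspecting what actually ends up in an index.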
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914185#action_12914185 ] Uwe Schindler commented on LUCENE-2649: --- Yonik: I was expecting this answer... The reason is that my current contact (it's also yours) has exactly that problem also with norms (but also FC): they want to lazily load values for sorting/norms (see the very old issue LUCENE-505). At least we should have a TopFieldDocCollector that, alternatively to native arrays, can also use a ValueSource-like approach with getter methods - so you could sort against a CSF. Even if it is 20% slower, in some cases that's the only way to get a suitable search experience. Speed is not always the most important thing; sometimes space requirements or warmup times matter, too. I would have no problem with providing both and choosing the implementation that is most speed-effective. So if no native arrays are provided by the FieldCache, use getter methods.
[jira] Commented: (LUCENE-2664) Add SimpleText codec
[ https://issues.apache.org/jira/browse/LUCENE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914186#action_12914186 ] Yonik Seeley commented on LUCENE-2664: -- heh - cool!
[jira] Updated: (LUCENE-2664) Add SimpleText codec
[ https://issues.apache.org/jira/browse/LUCENE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2664: --- Attachment: LUCENE-2664.patch
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914188#action_12914188 ] Michael McCandless commented on LUCENE-2649: bq. I see no relief for this issue on the horizon. We need to specialize the code... either manually or automatically...
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914191#action_12914191 ] Yonik Seeley commented on LUCENE-2649: -- bq. If we like the more general cache, that probably needs its own issue Correct - with a more general FieldCache, one could implement alternatives that MMap files, etc. But those alternative implementations certainly should not be included in this issue.
[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914194#action_12914194 ] Uwe Schindler commented on LUCENE-2649: --- I just wanted to mention that, so the design of the new FC is more flexible in that case. I am just pissed off because of these arrays and no flexibility :-(
[jira] Issue Comment Edited: (LUCENE-2649) FieldCache should include a BitSet for matching docs
[ https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914194#action_12914194 ] Uwe Schindler edited comment on LUCENE-2649 at 9/23/10 3:49 PM: I just wanted to mention that, so the design of the new FC is more flexible in that case. I am just pissed off because of these arrays and no flexibility :-( The FC impl should be in line with the CSF approach from LUCENE-2186. was (Author: thetaphi): I just wanted to mention that, so the design of the new FC is more flexible in that case. I am just pissed off because of these arrays and no flexibility :-(
[jira] Created: (LUCENE-2665) Rework FieldCache to be more flexible/general
Rework FieldCache to be more flexible/general - Key: LUCENE-2665 URL: https://issues.apache.org/jira/browse/LUCENE-2665 Project: Lucene - Java Issue Type: Improvement Reporter: Ryan McKinley The existing FieldCache implementation is very rigid and does not allow much flexibility. In trying to implement simple features, it points to much larger structural problems. This patch aims to take a fresh approach to how we work with the FieldCache.
[jira] Updated: (LUCENE-2665) Rework FieldCache to be more flexible/general
[ https://issues.apache.org/jira/browse/LUCENE-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated LUCENE-2665: -- Attachment: LUCENE-2665-FieldCacheOverhaul.patch This is a quick sketch of a more general FieldCache -- it only implements the ByteValues case, and is implemented in a different package. The core API looks like this:

{code:java}
public <T> T get(IndexReader reader, String field, EntryCreator<T> creator) throws IOException
{code}

and the EntryCreator looks like this:

{code:java}
public abstract class EntryCreator<T> implements Serializable {
  public abstract T create( IndexReader reader, String field ) throws IOException;
  public abstract T validate( T entry, IndexReader reader, String field ) throws IOException;

  /**
   * Indicate if a cached value should be checked before usage.
   * This is useful if an application wants to support subsequent calls
   * to the same cached object that may alter the cached object. If
   * an application wants to avoid this (synchronized) check, it should
   * return 'false'
   *
   * @return 'true' if the Cache should call 'validate' before returning a cached object
   */
  public boolean shouldValidate() {
    return true;
  }

  /**
   * @return A key to identify valid cache entries for subsequent requests
   */
  public Integer getCacheKey( IndexReader reader, String field ) {
    return new Integer( EntryCreator.class.hashCode() ^ field.hashCode() );
  }
}
{code}

For a real cleanup, I think it makes sense to move the Parser stuff to somewhere that deals with numerics -- I don't get why that is tied to the FieldCache. Just a sketch to get us thinking.
[jira] Commented: (LUCENE-2665) Rework FieldCache to be more flexible/general
[ https://issues.apache.org/jira/browse/LUCENE-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914270#action_12914270 ] Ryan McKinley commented on LUCENE-2665: --- While we are thinking about it... perhaps the best option is to add a Cache to the IndexReader itself. This would be nice since it would drop the WeakHashMap<IndexReader> that is used all over. I just like hard references better! The one (ok, maybe there are more) hitch I can think of is that some Cache instances don't care about deleted docs (FieldCache) and others do. Perhaps the EntryCreator knows and the Cache could behave differently... or am I getting ahead of myself!
[jira] Resolved: (LUCENE-2444) move contrib/analyzers to modules/analysis
[ https://issues.apache.org/jira/browse/LUCENE-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2444. - Assignee: Robert Muir Resolution: Fixed move contrib/analyzers to modules/analysis -- Key: LUCENE-2444 URL: https://issues.apache.org/jira/browse/LUCENE-2444 Project: Lucene - Java Issue Type: Task Components: Build Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: LUCENE-2444.patch, LUCENE-2444.patch, LUCENE-2444_boilerplate.patch This is a patch to move contrib/analyzers under modules/analysis. We can then continue consolidating (LUCENE-2413)... in truth this will sorta be an ongoing thing anyway, as we try to distance indexing from analysis, etc
[jira] Updated: (LUCENE-2651) Add support for MMapDirectory's unmap in Apache Harmony
[ https://issues.apache.org/jira/browse/LUCENE-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2651: Component/s: Store Add support for MMapDirectory's unmap in Apache Harmony --- Key: LUCENE-2651 URL: https://issues.apache.org/jira/browse/LUCENE-2651 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.1, 4.0 Attachments: LUCENE-2651.patch, LUCENE-2651.patch, LUCENE-2651.patch, LUCENE-2651.patch The setUseUnmap does not work on Apache Harmony; this patch adds support for it. It also fixes a small problem: unmapping a clone may cause a SIGSEGV.
[jira] Updated: (LUCENE-2471) Supporting bulk copies in Directory
[ https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2471: Component/s: Store Supporting bulk copies in Directory --- Key: LUCENE-2471 URL: https://issues.apache.org/jira/browse/LUCENE-2471 Project: Lucene - Java Issue Type: Improvement Components: Store Reporter: Earwin Burrfoot Fix For: 3.1, 4.0 A method can be added to IndexOutput that accepts IndexInput, and writes bytes using it as a source. This should be used for bulk-merge cases (offhand - norms, docstores?). Some Directories can then override the default impl and skip intermediate buffers (NIO, MMap, RAM?).
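[Editor's note] A standalone sketch of the proposed default behavior (java.io streams stand in for IndexInput/IndexOutput, so it compiles without Lucene): copy a fixed number of bytes through an intermediate buffer, which a Directory supporting direct transfer (NIO, MMap, RAM) could then override to skip the copy entirely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch of a default bulk copy: read through a fixed intermediate buffer.
// A Directory implementation could override this to hand bytes straight through.
public class BulkCopySketch {
    public static long copyBytes(InputStream in, OutputStream out, long numBytes) throws IOException {
        byte[] buffer = new byte[16 * 1024];
        long copied = 0;
        while (copied < numBytes) {
            // Never read past the requested byte count.
            int chunk = (int) Math.min(buffer.length, numBytes - copied);
            int read = in.read(buffer, 0, chunk);
            if (read == -1) {
                throw new IOException("unexpected end of input after " + copied + " bytes");
            }
            out.write(buffer, 0, read);
            copied += read;
        }
        return copied;
    }
}
```

The per-chunk accounting is the part worth getting right in any real implementation: bulk merges copy exact byte ranges, so both short reads and overshoot are errors.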
[jira] Updated: (LUCENE-2528) CFSFileDirectory: Allow a Compound Index file to be deployed as a complete index without segment files
[ https://issues.apache.org/jira/browse/LUCENE-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2528: Component/s: Store CFSFileDirectory: Allow a Compound Index file to be deployed as a complete index without segment files -- Key: LUCENE-2528 URL: https://issues.apache.org/jira/browse/LUCENE-2528 Project: Lucene - Java Issue Type: New Feature Components: Store Reporter: Lance Norskog Priority: Minor Attachments: LUCENE-2528.patch This patch presents a compound index file as a Lucene Directory class. This allows you to deploy one file to a query server instead of deploying a directory with the compound file and two segment files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2586) move intblock/sep codecs into test
[ https://issues.apache.org/jira/browse/LUCENE-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2586: Component/s: Index move intblock/sep codecs into test -- Key: LUCENE-2586 URL: https://issues.apache.org/jira/browse/LUCENE-2586 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2586.patch The intblock and sep codecs in core exist to make it easy for people to try different low-level algos for encoding ints. Sep breaks docs, freqs, pos, skip data, payloads into 5 separate files (vs 2 files that standard codec uses). Intblock further enables the docs, freqs, pos files to encode fixed-sized blocks of ints at a time. So an app can easily subclass these codecs, using their own int encoder. But these codecs are now concrete, and they use a dummy low-level block int encoder (e.g. encoding 128 ints as separate vints). I'd like to change these to be abstract, and move these dummy codecs into test. The tests would still test these dummy codecs, by rotating them in randomly for all tests. I'd also like to rename IntBlock -> FixedIntBlock, because I'm trying to get a VariableIntBlock working well (for int encoders like Simple9, Simple16, whose block size varies depending on the particular values). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2571) Indexing performance tests with realtime branch
[ https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2571: Component/s: Index Indexing performance tests with realtime branch --- Key: LUCENE-2571 URL: https://issues.apache.org/jira/browse/LUCENE-2571 Project: Lucene - Java Issue Type: Task Components: Index Reporter: Michael Busch Priority: Minor Fix For: Realtime Branch We should run indexing performance tests with the DWPT changes and compare to trunk. We need to test both single-threaded and multi-threaded performance. NOTE: flush by RAM isn't implemented just yet, so either we wait with the tests or flush by doc count. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2529) always apply position increment gap between values
[ https://issues.apache.org/jira/browse/LUCENE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2529: Component/s: Index always apply position increment gap between values -- Key: LUCENE-2529 URL: https://issues.apache.org/jira/browse/LUCENE-2529 Project: Lucene - Java Issue Type: Improvement Components: Index Environment: (I don't know which version to say this affects since it's some quasi trunk release and the new versioning scheme confuses me.) Reporter: David Smiley Fix For: 3.1, 4.0 Attachments: LUCENE-2529_always_apply_position_increment_gap_between_values.patch Original Estimate: 1h Remaining Estimate: 1h I'm doing some fancy stuff with span queries that is very sensitive to term positions. I discovered that the position increment gap on indexing is only applied between values when there are existing terms indexed for the document. I suspect this logic wasn't deliberate, it's just how it's always been for no particular reason. I think it should always apply the gap between values. Reference DocInverterPerField.java line 82: if (fieldState.length > 0) fieldState.position += docState.analyzer.getPositionIncrementGap(fieldInfo.name); This is checking fieldState.length. I think the condition should simply be: if (i > 0). I don't think this change will affect anyone at all but it will certainly help me. Presently, I can either change this line in Lucene, or I can put in a hack so that the first value for the document is some dummy value, which is wasteful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
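The difference between the two conditions in LUCENE-2529 can be seen with a toy model of the inverter's gap logic (a hypothetical stand-in for DocInverterPerField, not Lucene code): with the current `length > 0` check, the gap is silently skipped while no terms have been indexed yet, so a first value that produces no tokens collapses positions; with the proposed `i > 0`, the gap is always applied between values.

```java
// Toy model of the position-increment-gap logic for a multi-valued field.
// positions(...) returns the start position of each value's first token.
// proposedFix=false mimics the current condition (length > 0);
// proposedFix=true mimics the proposed condition (i > 0).
public class PositionGapSketch {
    public static int[] positions(int gap, boolean proposedFix, int[] tokenCounts) {
        int[] starts = new int[tokenCounts.length];
        int position = 0;
        int length = 0; // number of terms indexed so far (fieldState.length)
        for (int i = 0; i < tokenCounts.length; i++) {
            boolean applyGap = proposedFix ? (i > 0) : (length > 0);
            if (applyGap) position += gap;
            starts[i] = position;
            position += tokenCounts[i];
            length += tokenCounts[i];
        }
        return starts;
    }
}
```

With gap=100 and a first value that yields no tokens (e.g. all stopwords), the current rule gives the second value start position 0; the proposed rule gives 100, keeping the values separated for span queries.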
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2573: Component/s: Index Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach: - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM) - Flush all DWPTs at a high water mark (e.g. at 110%) - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%. Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
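The linear steps between the two water marks described in LUCENE-2573 can be sketched as a small helper (the class and method names are hypothetical, not from the patch): with n DWPTs, flush triggers are spread evenly from the low to the high water mark.

```java
// Sketch of the tiered thresholds: with numThreadStates DWPTs, the i-th
// DWPT flushes at lowWaterMark + i * (high - low) / (n - 1), so e.g.
// n=5 with marks 90/110 yields 90, 95, 100, 105, 110 (percent of RAM).
public class TieredFlushSketch {
    public static double[] thresholds(double lowWaterMark, double highWaterMark,
                                      int numThreadStates) {
        double[] levels = new double[numThreadStates];
        if (numThreadStates == 1) { // degenerate case: single DWPT flushes at low mark
            levels[0] = lowWaterMark;
            return levels;
        }
        double step = (highWaterMark - lowWaterMark) / (numThreadStates - 1);
        for (int i = 0; i < numThreadStates; i++) {
            levels[i] = lowWaterMark + i * step;
        }
        return levels;
    }
}
```

The same helper works whether the marks are percentages of setRAMBufferSizeMB() (90/110) or absolute totals (120 MB / 140 MB), which is exactly the configuration question the issue raises.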
[jira] Updated: (LUCENE-2607) IndexWriter.isLocked() fails on a read-only directory
[ https://issues.apache.org/jira/browse/LUCENE-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2607: Component/s: Index IndexWriter.isLocked() fails on a read-only directory - Key: LUCENE-2607 URL: https://issues.apache.org/jira/browse/LUCENE-2607 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.2 Reporter: Trejkaz This appears to be a regression of some sort because the issue was only discovered by us some time after upgrading to the 2.9 series, and was not present when we were using 2.3 (big gap between those two, though.) We had some code like: {code} if (IndexWriter.isLocked(directory)) { IndexWriter.unlock(directory); } {code} And now we get an exception when this code runs on a read-only location: {noformat} java.lang.RuntimeException: Failed to acquire random test lock; please verify filesystem for lock directory 'X:\Data\Index' supports locking at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:99) at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:137) at org.apache.lucene.store.Directory.makeLock(Directory.java:131) at org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:5672) at {noformat} I think it makes more logical sense to return *false* - if locking is not possible then it cannot be locked, therefore isLocked should always return false. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
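The behavior Trejkaz argues for in LUCENE-2607 amounts to catching the lock-probe failure instead of propagating it. A minimal sketch, with a hypothetical LockProbe interface standing in for Directory.makeLock().obtain() (this is not Lucene's actual code):

```java
// Sketch of isLocked() that treats "locking impossible" as "not locked":
// if probing the lock throws (e.g. read-only filesystem), the index can
// never be locked there, so report false rather than fail.
public class IsLockedSketch {
    /** Stand-in for obtaining and immediately releasing the write lock. */
    interface LockProbe {
        /** Returns true if the lock was obtained (nobody else holds it). */
        boolean obtainAndRelease() throws Exception;
    }

    public static boolean isLocked(LockProbe probe) {
        try {
            // If we could obtain the lock, no other writer holds it.
            return !probe.obtainAndRelease();
        } catch (Exception e) {
            // Probe failed (read-only directory, unsupported filesystem):
            // locking is impossible here, so the index cannot be locked.
            return false;
        }
    }
}
```

Under this sketch the reporter's `if (isLocked) unlock` idiom becomes a no-op on a read-only location instead of throwing RuntimeException.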
[jira] Updated: (LUCENE-2527) FieldCache.getTermsIndex should cache fasterButMoreRAM=true|false to the same cache key
[ https://issues.apache.org/jira/browse/LUCENE-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2527: Component/s: Search FieldCache.getTermsIndex should cache fasterButMoreRAM=true|false to the same cache key --- Key: LUCENE-2527 URL: https://issues.apache.org/jira/browse/LUCENE-2527 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 4.0 Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 When we cutover FieldCache to use shared byte[] blocks, we added the boolean fasterButMoreRAM option, so you could tradeoff time/space. It defaults to true. The thinking is that an expert user, who wants to use false, could pre-populate FieldCache by loading the field with false, and then later when sorting on that field it'd use that same entry. But there's a bug -- when sorting, it then loads a 2nd entry with true. This is because the Entry.custom in FieldCache participates in equals/hashCode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
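The double-entry bug in LUCENE-2527 can be illustrated with a toy cache key (a simplified stand-in for FieldCache's Entry, not the real class): when the fasterButMoreRAM flag participates in equals/hashCode, loading with false and then sorting (which asks with true) produces two entries for the same field; dropping it from the key makes both lookups share one entry.

```java
// Toy illustration: Entry.custom (the fasterButMoreRAM flag) taking part
// in equals/hashCode splits the cache into two entries for one field.
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CacheKeySketch {
    static final class Entry {
        final String field;
        final boolean custom;            // the fasterButMoreRAM flag
        final boolean keyIncludesCustom; // buggy key includes the flag
        Entry(String field, boolean custom, boolean keyIncludesCustom) {
            this.field = field;
            this.custom = custom;
            this.keyIncludesCustom = keyIncludesCustom;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Entry)) return false;
            Entry e = (Entry) o;
            return field.equals(e.field)
                && (!keyIncludesCustom || custom == e.custom);
        }
        @Override public int hashCode() {
            return keyIncludesCustom ? Objects.hash(field, custom) : field.hashCode();
        }
    }

    /** Pre-populate with fasterButMoreRAM=false, then sort (asks with true). */
    public static int entriesAfterTwoLoads(boolean keyIncludesCustom) {
        Map<Entry, String> cache = new HashMap<>();
        cache.putIfAbsent(new Entry("field", false, keyIncludesCustom), "loaded-false");
        cache.putIfAbsent(new Entry("field", true, keyIncludesCustom), "loaded-true");
        return cache.size(); // 2 with the buggy key, 1 with the fixed key
    }
}
```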
[jira] Updated: (LUCENE-2605) queryparser parses on whitespace
[ https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2605: Component/s: QueryParser queryparser parses on whitespace Key: LUCENE-2605 URL: https://issues.apache.org/jira/browse/LUCENE-2605 Project: Lucene - Java Issue Type: Bug Components: QueryParser Reporter: Robert Muir Fix For: 3.1, 4.0 The queryparser parses input on whitespace, and sends each whitespace separated term to its own independent token stream. This breaks the following at query-time, because they can't see across whitespace boundaries: * n-gram analysis * shingles * synonyms (especially multi-word for whitespace-separated languages) * languages where a 'word' can contain whitespace (e.g. vietnamese) Its also rather unexpected, as users think their charfilters/tokenizers/tokenfilters will do the same thing at index and querytime, but in many cases they can't. Instead, preferably the queryparser would parse around only real 'operators'. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2597) Query scorers should not use MultiFields
[ https://issues.apache.org/jira/browse/LUCENE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2597: Component/s: Search Query scorers should not use MultiFields Key: LUCENE-2597 URL: https://issues.apache.org/jira/browse/LUCENE-2597 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2597.patch Lucene does all searching/filtering per-segment, today, but there are a number of tests that directly invoke Scorer.scorer or Filter.getDocIdSet on a composite reader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2504: Component/s: Search sorting performance regression -- Key: LUCENE-2504 URL: https://issues.apache.org/jira/browse/LUCENE-2504 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip, LUCENE-2504_SortMissingLast.patch sorting can be much slower on trunk than branch_3x -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2665) Rework FieldCache to be more flexible/general
[ https://issues.apache.org/jira/browse/LUCENE-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2665: Component/s: Search Rework FieldCache to be more flexible/general - Key: LUCENE-2665 URL: https://issues.apache.org/jira/browse/LUCENE-2665 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Ryan McKinley Attachments: LUCENE-2665-FieldCacheOverhaul.patch The existing FieldCache implementation is very rigid and does not allow much flexibility. In trying to implement simple features, it points to much larger structural problems. This patch aims to take a fresh approach to how we work with the FieldCache. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2566) + - operators allow any amount of whitespace
[ https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2566: Component/s: QueryParser + - operators allow any amount of whitespace Key: LUCENE-2566 URL: https://issues.apache.org/jira/browse/LUCENE-2566 Project: Lucene - Java Issue Type: Bug Components: QueryParser Reporter: Yonik Seeley Priority: Minor As an example, (foo - bar) is treated like (foo -bar). It seems like for +- to be treated as unary operators, they should be immediately followed by the operand. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
bulk change 'don't email'
didn't there used to be an option in jira not to email for bulk changes? it seems to be gone... i was trying to clean up jira some. sorry for the noise -- Robert Muir rcm...@gmail.com
[jira] Resolved: (LUCENE-712) Build with GCJ fail
[ https://issues.apache.org/jira/browse/LUCENE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-712. Resolution: Not A Problem I compiled lucene with gcj, it builds fine. However, many tests fail. gcj's classpath appears to be a dead project, and personally i won't go anywhere near their source code. I don't recommend using lucene with gcj. Build with GCJ fail --- Key: LUCENE-712 URL: https://issues.apache.org/jira/browse/LUCENE-712 Project: Lucene - Java Issue Type: Bug Components: Build Affects Versions: 2.0.0 Reporter: Nicolas Lalevée Priority: Minor Attachments: patch just needs a little fix in the jar name and some issue with an anonymous constructor -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-472) Some fixes to let gcj build lucene using ant gcj target
[ https://issues.apache.org/jira/browse/LUCENE-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-472. Resolution: Not A Problem I compiled lucene with gcj, it builds fine. However, many tests fail. gcj's classpath appears to be a dead project, and personally i won't go anywhere near their source code. I don't recommend using lucene with gcj. Some fixes to let gcj build lucene using ant gcj target --- Key: LUCENE-472 URL: https://issues.apache.org/jira/browse/LUCENE-472 Project: Lucene - Java Issue Type: Bug Components: Build Affects Versions: CVS Nightly - Specify date in submission Reporter: Michele Bini Priority: Minor Attachments: gcj-build.diff I'm attaching a patch that fixes two problems with the gcj build. First, some imports in lucene.search.FieldCacheImpl.java that gcj requires but jdk doesn't were missing. Second, the Makefile uses the wrong name for the lucene-core .jar file. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-471) gcj ant target doesn't work on windows
[ https://issues.apache.org/jira/browse/LUCENE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-471. Resolution: Fixed I compiled lucene with gcj, it builds fine. However, many tests fail. gcj's classpath appears to be a dead project, and personally i won't go anywhere near their source code. I don't recommend using lucene with gcj. gcj ant target doesn't work on windows -- Key: LUCENE-471 URL: https://issues.apache.org/jira/browse/LUCENE-471 Project: Lucene - Java Issue Type: Bug Components: Build Affects Versions: CVS Nightly - Specify date in submission Environment: Windows with MinGW (http://www.mingw.org/) and unixutils (http://unxutils.sourceforge.net/) Reporter: Michele Bini Priority: Minor Attachments: win-makefile.diff, win-mmap.diff In order to fix it I made two changes, both really simple. First I added to org/apache/lucene/store/GCJIndexInput.cc some code to use windows memory-mapped I/O instead than unix mmap(). Then I had to rearrange the link order in the Makefile in order to avoid unresolved symbol errors. Also to build repeatedly I had to instruct make to ignore the return code for the mkdir command as on windows it fails if the directory already exists. I'm attaching two patches corresponding to the changes; please note that with the patches applied, the gcj target still works on linux. Both patches apply cleanly to the current svn head. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914296#action_12914296 ] Jason Rutherglen commented on LUCENE-2573: -- I was hoping something clever would come to me about how to unit test this, nothing has. We can slow down writes to the file(s) via a Thread.sleep; however, this will only emulate a real file system in RAM. What then? I thought about testing the percentage; however, is it going to be exact? We could test a percentage range of each of the segments flushed? I guess I just need to run all of the unit tests, however some of those will fail because deletes aren't working properly yet. Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach: - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM) - Flush all DWPTs at a high water mark (e.g. at 110%) - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%. Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Resolved: (LUCENE-2070) document LengthFilter wrt Unicode 4.0
[ https://issues.apache.org/jira/browse/LUCENE-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-2070. - Assignee: Robert Muir Fix Version/s: 3.1 Resolution: Fixed Committed revision 1000675, 1000678 (3x) document LengthFilter wrt Unicode 4.0 - Key: LUCENE-2070 URL: https://issues.apache.org/jira/browse/LUCENE-2070 Project: Lucene - Java Issue Type: Improvement Components: contrib/analyzers Reporter: Robert Muir Assignee: Robert Muir Priority: Trivial Fix For: 3.1, 4.0 Attachments: LUCENE-2070.patch LengthFilter calculates its min/max length from TermAttribute.termLength(). This is not characters, but instead UTF-16 code units. In my opinion this should not be changed, merely documented. If we changed it, it would have an adverse performance impact because we would have to actually calculate Character.codePointCount() on the text. If you feel strongly otherwise, fixing it to count codepoints would be a trivial patch, but I'd rather not hurt performance. I admit I don't fully understand all the use cases for this filter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
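The code-unit vs. code-point distinction behind LUCENE-2070 is easy to see with plain JDK calls (no Lucene needed): a supplementary character such as U+1D11E (musical G clef) is one code point but two UTF-16 code units, so a length computed via String.length() (as termLength() effectively is) sees it as length 2.

```java
// Demonstrates why "length" in LengthFilter is UTF-16 code units, not
// characters: String.length() counts code units, while codePointCount()
// counts actual Unicode code points.
public class CodeUnitSketch {
    public static int codeUnits(String term) {
        return term.length();
    }
    public static int codePoints(String term) {
        return term.codePointCount(0, term.length());
    }
}
```

This is also why fixing the filter to count code points would cost a Character.codePointCount() pass per term, the performance concern the issue cites.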
[jira] Commented: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections
[ https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914301#action_12914301 ] Robert Muir commented on LUCENE-1540: - Tim, if you have modified benchmark to work with various formats of older TREC collections, that would be really nice. Improvements to contrib.benchmark for TREC collections -- Key: LUCENE-1540 URL: https://issues.apache.org/jira/browse/LUCENE-1540 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Affects Versions: 2.4 Reporter: Tim Armstrong Priority: Minor The benchmarking utilities for TREC test collections (http://trec.nist.gov) are quite limited and do not support some of the variations in format of older TREC collections. I have been doing some benchmarking work with Lucene and have had to modify the package to support: * Older TREC document formats, which the current parser fails on due to missing document headers. * Variations in query format - newlines after title tag causing the query parser to get confused. * Ability to detect and read in uncompressed text collections * Storage of document numbers by default without storing full text. I can submit a patch if there is interest, although I will probably want to write unit tests for the new functionality first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2246) While indexing Turkish web pages, Parse Aborted: Lexical error.... occurs
[ https://issues.apache.org/jira/browse/LUCENE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2246: Component/s: Examples (was: Index) While indexing Turkish web pages, Parse Aborted: Lexical error occurs --- Key: LUCENE-2246 URL: https://issues.apache.org/jira/browse/LUCENE-2246 Project: Lucene - Java Issue Type: Bug Components: Examples Affects Versions: 3.0 Reporter: Selim Nadi When I try to index a Turkish page, if there is a Turkish-specific character in an HTML tag, the HTML parser gives a Parse Aborted: Lexical error ... on line ... error. For example, with <IMG SRC=../images/head.jpg WIDTH=570 HEIGHT=47 BORDER=0 ALT=ş> the exception flags the ş character (character code 351) as an error; the same happens for the ı character in a title attribute, e.g. <a title=(ııı)>. Turkish characters in the page content do not create any problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-851) Pruning
[ https://issues.apache.org/jira/browse/LUCENE-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914304#action_12914304 ] Robert Muir commented on LUCENE-851: Marvin: is this functionality addressed with LUCENE-2482 ? Pruning --- Key: LUCENE-851 URL: https://issues.apache.org/jira/browse/LUCENE-851 Project: Lucene - Java Issue Type: New Feature Components: Index, Search Reporter: Marvin Humphrey Priority: Minor Greets, A thread on java-dev a couple of months ago drew my attention to a technique used by Nutch for cutting down the number of hits that have to be processed: if you have an algorithm for ordering documents by importance, and you sort them so that the lowest document numbers have the highest rank, then most of your high-scoring hits are going to occur early on in the hit-collection process. Say you're looking for the top 100 matches -- the odds are pretty good that after you've found 1000 hits, you've gotten most of the good stuff. It may not be necessary to score the other e.g. 5,000,000 hits. To pull this off in Nutch, they run the index through a post process whereby documents are re-ordered by page score using the IndexSorter class. Unfortunately, post-processing does not live happily with incremental indexing. However, if we ensure that document numbers are ordered according to our criteria within each segment, that's almost as good. Say we're looking for 100 hits, as before; what we do is collect a maximum of 1000 hits per segment. If we are dealing with an index made up of 25 segments, that's 25,000 hits max we'll have to process fully -- the rest we can skip over. That's not as quick as only processing 1000 hits then stopping in a fully optimized index, but it's a lot better than churning through all 5,000,000 hits. A lot of those hits from the smallest segments will be garbage; we'll get most of our good hits from a few large segments most of the time. 
But that's fine -- the cost to process any one segment is small. Writing a low-level scoring loop which implements pruning per segment is straightforward. KinoSearch's version (in C) is below. To control the amount of pruning, we need a high-level Searcher.setPruneFactor API, which sets a multiplier; the number of hits-per-segment which must be processed is determined by multiplying the number of hits you need by pruneFactor. Here's code from KS for deriving hits-per-seg: # process prune_factor if supplied my $seg_starts; my $hits_per_seg = 2**31 - 1; if ( defined $self->{prune_factor} and defined $args{num_wanted} ) { my $prune_count = $self->{prune_factor} * $args{num_wanted}; if ( $prune_count < $hits_per_seg ) { # don't exceed I32_MAX $hits_per_seg = $prune_count; $seg_starts = $reader->get_seg_starts; } } What I have not yet written is the index-time mechanism for sorting documents. In Nutch, they use the norms from a known indexed, non-tokenized field (site). However, in Lucene and KS, we can't count on any existing fields. Document boost isn't stored directly, either. The obvious answer is to start storing it, which would suffice for Nutch-like uses. However, it may make sense to avoid coupling document ordering to boost in order to influence pruning without affecting scores. The sort ordering information needs a permanent home in the index, since it will be needed whenever segment merging occurs. The fixed-width per-document storage in Lucene's .fdx file seems like a good place. If we use one float per document, we can simply put it before or after the 64-bit file pointer and seek into the file after multiplying the doc num by 12 rather than 8. During indexing, we'd keep the ordering info in an array; after all documents for a segment have been added, we create an array of sorted document numbers. When flushing the postings, their document numbers get remapped using the sorted array. 
Then we rewrite the .fdx file (and also the .tvx file), moving the file pointers (and ordering info) to remapped locations. The fact that the .fdt file is now out of order isn't a problem -- optimizing sequential access to that file isn't important. This issue is closely tied to LUCENE-843, Improve how IndexWriter uses RAM to buffer added documents, and LUCENE-847, Factor merge policy out of IndexWriter. Michael McCandless, Steven Parks, Ning Li, anybody else... comments? Suggestions? Marvin Humphrey Rectangular Research http://www.rectangular.com/ void Scorer_collect(Scorer *self, HitCollector *hc, u32_t start, u32_t end, u32_t hits_per_seg, VArray *seg_starts) {
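The hits-per-seg derivation Marvin quotes from KinoSearch can be rendered in Java for clarity (a hypothetical helper, not from any patch): the per-segment cap defaults to "no pruning" (2**31 - 1) and drops to pruneFactor * numWanted when that product is smaller.

```java
// Java rendering of the KS snippet: null arguments model Perl's
// "defined" checks; the cap never exceeds I32_MAX (Integer.MAX_VALUE).
public class PruneSketch {
    public static long hitsPerSegment(Long pruneFactor, Integer numWanted) {
        long hitsPerSeg = Integer.MAX_VALUE; // 2**31 - 1, i.e. don't prune
        if (pruneFactor != null && numWanted != null) {
            long pruneCount = pruneFactor * numWanted;
            if (pruneCount < hitsPerSeg) { // don't exceed I32_MAX
                hitsPerSeg = pruneCount;
            }
        }
        return hitsPerSeg;
    }
}
```

So in the example above (100 hits wanted, pruneFactor 10), each of 25 segments processes at most 1000 hits: 25,000 fully-scored hits instead of 5,000,000.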
[jira] Closed: (LUCENE-851) Pruning
[ https://issues.apache.org/jira/browse/LUCENE-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marvin Humphrey closed LUCENE-851. -- Resolution: Duplicate Yes, LUCENE-2482 introduces the index sorter from Nutch that I referred to in this issue. The termination mechanism is slightly different (TimeLimitedCollector vs. X hits per segment), but it's the sorter that really matters. I'm closing as Duplicate. Thanks for digging this up! Pruning --- Key: LUCENE-851 URL: https://issues.apache.org/jira/browse/LUCENE-851 Project: Lucene - Java Issue Type: New Feature Components: Index, Search Reporter: Marvin Humphrey Priority: Minor Greets, A thread on java-dev a couple of months ago drew my attention to a technique used by Nutch for cutting down the number of hits that have to be processed: if you have an algorithm for ordering documents by importance, and you sort them so that the lowest document numbers have the highest rank, then most of your high-scoring hits are going to occur early on in the hit-collection process. Say you're looking for the top 100 matches -- the odds are pretty good that after you've found 1000 hits, you've gotten most of the good stuff. It may not be necessary to score the other e.g. 5,000,000 hits. To pull this off in Nutch, they run the index through a post process whereby documents are re-ordered by page score using the IndexSorter class. Unfortunately, post-processing does not live happily with incremental indexing. However, if we ensure that document numbers are ordered according to our criteria within each segment, that's almost as good. Say we're looking for 100 hits, as before; what we do is collect a maximum of 1000 hits per segment. If we are dealing with an index made up of 25 segments, that's 25,000 hits max we'll have to process fully -- the rest we can skip over. 
That's not as quick as only processing 1000 hits then stopping in a fully optimized index, but it's a lot better than churning through all 5,000,000 hits. A lot of those hits from the smallest segments will be garbage; we'll get most of our good hits from a few large segments most of the time. But that's fine -- the cost to process any one segment is small. Writing a low-level scoring loop which implements pruning per segment is straightforward. KinoSearch's version (in C) is below. To control the amount of pruning, we need a high-level Searcher.setPruneFactor API, which sets a multiplier; the number of hits-per-segment which must be processed is determined by multiplying the number of hits you need by pruneFactor. Here's code from KS for deriving hits-per-seg:

    # process prune_factor if supplied
    my $seg_starts;
    my $hits_per_seg = 2**31 - 1;
    if ( defined $self->{prune_factor} and defined $args{num_wanted} ) {
        my $prune_count = $self->{prune_factor} * $args{num_wanted};
        if ( $prune_count < $hits_per_seg ) {    # don't exceed I32_MAX
            $hits_per_seg = $prune_count;
            $seg_starts   = $reader->get_seg_starts;
        }
    }

What I have not yet written is the index-time mechanism for sorting documents. In Nutch, they use the norms from a known indexed, non-tokenized field (site). However, in Lucene and KS, we can't count on any existing fields. Document boost isn't stored directly, either. The obvious answer is to start storing it, which would suffice for Nutch-like uses. However, it may make sense to avoid coupling document ordering to boost in order to influence pruning without affecting scores. The sort ordering information needs a permanent home in the index, since it will be needed whenever segment merging occurs. The fixed-width per-document storage in Lucene's .fdx file seems like a good place. If we use one float per document, we can simply put it before or after the 64-bit file pointer and seek into the file after multiplying the doc num by 12 rather than 8.
During indexing, we'd keep the ordering info in an array; after all documents for a segment have been added, we create an array of sorted document numbers. When flushing the postings, their document numbers get remapped using the sorted array. Then we rewrite the .fdx file (and also the .tvx file), moving the file pointers (and ordering info) to remapped locations. The fact that the .fdt file is now out of order isn't a problem -- optimizing sequential access to that file isn't important. This issue is closely tied to LUCENE-843, Improve how IndexWriter uses RAM to buffer added documents, and LUCENE-847, Factor merge policy out of IndexWriter. Michael McCandless, Steven Parks, Ning Li, anybody else... comments? Suggestions? Marvin Humphrey Rectangular Research http://www.rectangular.com/
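The pruning scheme described in this thread can be sketched in a few lines. This is a rough Python sketch, not KinoSearch's actual C implementation: `hits_per_seg` mirrors the Perl derivation quoted above, and `collect_pruned` assumes each segment's documents are already ordered so the most important docs have the lowest ids.

```python
import heapq

I32_MAX = 2**31 - 1

def hits_per_seg(prune_factor, num_wanted):
    """Mirror the KS derivation: cap per-segment hits at
    prune_factor * num_wanted, never exceeding I32_MAX."""
    if prune_factor is None or num_wanted is None:
        return I32_MAX
    return min(int(prune_factor * num_wanted), I32_MAX)

def collect_pruned(segments, score, num_wanted, prune_factor=None):
    """segments: lists of doc ids, each pre-sorted so 'important' docs
    come first.  Only the first hits_per_seg docs of every segment are
    scored; the rest are skipped over, as in the proposal."""
    cap = hits_per_seg(prune_factor, num_wanted)
    top = []  # min-heap of (score, doc) holding the current top hits
    for seg in segments:
        for doc in seg[:cap]:
            heapq.heappush(top, (score(doc), doc))
            if len(top) > num_wanted:
                heapq.heappop(top)  # evict the weakest hit
    return sorted(top, reverse=True)
```

With 25 segments and a cap of 1000, this bounds the fully-processed hits at 25,000 regardless of how many of the 5,000,000 candidates match.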
[jira] Updated: (LUCENE-1840) QueryUtils should check that equals properly handles null
[ https://issues.apache.org/jira/browse/LUCENE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1840: Attachment: LUCENE-1840.patch QueryUtils should check that equals properly handles null - Key: LUCENE-1840 URL: https://issues.apache.org/jira/browse/LUCENE-1840 Project: Lucene - Java Issue Type: Improvement Components: Build Reporter: Mark Miller Priority: Trivial Attachments: LUCENE-1840.patch It's part of the equals contract, but many classes currently violate it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
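For context: the equals contract requires that comparing against null return false rather than throw. A rough Python analogue of the kind of check being proposed for QueryUtils (the class and function names here are illustrative, not Lucene's actual API):

```python
class TermQuery:
    """Toy stand-in for a Lucene Query; names are illustrative only."""
    def __init__(self, field, text):
        self.field = field
        self.text = text

    def __eq__(self, other):
        # Returning NotImplemented for foreign types (instead of
        # raising, or assuming the attributes exist) is what makes
        # `query == None` safely evaluate to False.
        if not isinstance(other, TermQuery):
            return NotImplemented
        return (self.field, self.text) == (other.field, other.text)

    def __hash__(self):
        return hash((self.field, self.text))

def check_equals_handles_null(query):
    """Analogue of the proposed QueryUtils check: a comparison with
    None must quietly return False, never raise."""
    assert query != None          # noqa: E711 -- the None comparison is the point
    assert not (query == None)    # noqa: E711
```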
[jira] Commented: (SOLR-1568) Implement Spatial Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914319#action_12914319 ] Yonik Seeley commented on SOLR-1568: Hmmm, I'm not understanding the purpose of the checks for if we cross the equator, and addEquatorialBoundary(). It looks like it creates two ranges... [min TO 0] and [0 TO max], where [min TO max] should always work. Is there something special about the equator I'm missing? Implement Spatial Filter Key: SOLR-1568 URL: https://issues.apache.org/jira/browse/SOLR-1568 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 3.1, 4.0 Attachments: CartesianTierQParserPlugin.java, SOLR-1568.Mattmann.031010.patch.txt, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch Given an index with spatial information (either as a geohash, SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be able to pass in a filter query that takes in the field name, lat, lon and distance and produces an appropriate Filter (i.e. one that is aware of the underlying field type) for use by Solr. The interface _could_ look like: {code} fq={!sfilt dist=20}location:49.32,-79.0 {code} or it could be: {code} fq={!sfilt lat=49.32 lat=-79.0 f=location dist=20} {code} or: {code} fq={!sfilt p=49.32,-79.0 f=location dist=20} {code} or: {code} fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20} {code}
[jira] Commented: (SOLR-1568) Implement Spatial Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914321#action_12914321 ] Yonik Seeley commented on SOLR-1568: I'm further confused by the following: {code} else if (ll[LONG] < 0.0 && ur[LONG] > 0.0) { // prime meridian (0 degrees) {code} I don't see what's special about crossing the 0 deg longitude line here. It seems like the only special case for longitude is if ll > ur, in which case you went over the +-180 line and need two ranges to cover that: [ur TO -180] OR [ll TO 180]?
[jira] Commented: (SOLR-1568) Implement Spatial Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914324#action_12914324 ] Yonik Seeley commented on SOLR-1568: FYI, I'm in the middle of refactoring this code, so please give feedback as comments, not patches, for now...
[jira] Commented: (SOLR-1568) Implement Spatial Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914326#action_12914326 ] Chris Male commented on SOLR-1568: -- Yonik, I recommend you glance over [http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates] (which I think Bill mentioned earlier). In addition to proposing quite a nice bounding box algorithm, he also discusses how to handle the poles and the 180 degree line. I recommend we follow the behaviour that he suggests.
Fwd: distributed search on duplicate shards
Just wanted to poke this since it got buried under a dozen or so Jira updates. I also sent it to the deprecated list, though I think it should have forwarded. -mike

-- Forwarded message --
From: mike anderson saidthero...@gmail.com
Date: Thu, Sep 23, 2010 at 7:06 PM
Subject: distributed search on duplicate shards
To: solr-...@lucene.apache.org

Hi all, My company is currently running a distributed Solr cluster with about 15 shards. We occasionally find that one shard will be relatively slow and thus hold up the entire response. To remedy this we thought it might be useful to have a system such that:
1. We can duplicate each shard, and thus have sets of shards, each with the same index
2. We can pass in these sets of shards along with the query (for instance, if ! is the delimiter, shards=solr1a!solr1b,solr2a!solr2b)
3. The request goes out to /all/ shards (unlike load balancing in Solr Cloud)
4. The first shard from a set (solr1a, solr1b) to successfully return is honored, and the other requests (solr1b, if solr1a responds first, for instance) are removed/ignored
5. The response is completed and returned as soon as one shard from each set responds

I've written a patch to accomplish this, but have a few questions:
1. What are the known disadvantages to such a strategy? (we've thought of a few, like sets being out of sync, but they don't bother us too much)
2. What would this type of feature be called? This way I can open a Jira ticket for it
3. Is there a preferred way to do this? My current patch (which I can post soon) works in the HTTPClient portion of SearchHandler. I keep a hash map of the shard sets and cancel the FutureShardResponse's in the corresponding set when each response comes back.

Thanks in advance, Mike

P.S. I'd like to write a test for this feature but it wasn't clear from the distributed test how to do so. Could somebody point me in the right direction (an existing test, perhaps) for how to accomplish this?
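The first-response-wins behaviour in steps 4-5 can be sketched with a thread pool. This is a hypothetical sketch, not the actual SearchHandler patch; `send_request` and the shard names stand in for the real HTTP plumbing:

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def query_shard_sets(shard_sets, send_request):
    """shard_sets: e.g. [["solr1a", "solr1b"], ["solr2a", "solr2b"]],
    where every shard in a set holds the same index.  send_request(shard)
    performs the call and returns that shard's partial response.
    Fires at every replica and keeps the first response per set."""
    results = {}
    with ThreadPoolExecutor() as pool:
        future_to_set = {}
        for set_id, replicas in enumerate(shard_sets):
            for shard in replicas:
                future_to_set[pool.submit(send_request, shard)] = set_id
        pending = set(future_to_set)
        # Return as soon as one shard from each set has responded.
        while pending and len(results) < len(shard_sets):
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                set_id = future_to_set[fut]
                if set_id not in results and fut.exception() is None:
                    results[set_id] = fut.result()  # first winner per set
        for fut in pending:
            fut.cancel()  # losers are removed/ignored, as in step 4
    return [results[i] for i in sorted(results)]
```

One known trade-off (beyond replicas drifting out of sync, as noted above) is that every query costs double the backend work, since all replicas are always queried.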
Re: bulk change 'don't email'
On Thu, Sep 23, 2010 at 8:17 PM, Robert Muir rcm...@gmail.com wrote: didn't there used to be an option in jira not to email for bulk changes? it seems to be gone... i was trying to clean up jira some. sorry for the noise Nonsense! I think we all appreciate your recent efforts!
RE: bulk change 'don't email'
As far as I know, if there is one issue in the bulk that has status “closed”, your possibilities are limited. Maybe that happened here. The last time I released Lucene (June), it was all possible without sending emails! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de/ eMail: u...@thetaphi.de From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, September 23, 2010 5:17 PM To: dev@lucene.apache.org Subject: bulk change 'don't email' didn't there used to be an option in jira not to email for bulk changes? it seems to be gone... i was trying to clean up jira some. sorry for the noise -- Robert Muir rcm...@gmail.com
Hudson build is back to normal : Solr-3.x #112
See https://hudson.apache.org/hudson/job/Solr-3.x/112/changes
[jira] Updated: (SOLR-1568) Implement Spatial Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-1568: --- Attachment: SOLR-1568.patch OK, so I didn't touch the current math, but I heavily re-factored (and fixed, I hope) the logic. It breaks down like this: we always have exactly one latitude constraint, and between 0 and 2 longitude constraints (0 when doing a polar cap, 1 normally, 2 when we cross the +-180 long line). The part of the code that calculates the ranges and builds the query is now much shorter (ignore the other work in progress, SpatialDistanceQuery, for now).
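The constraint breakdown above (always one latitude range; 0 longitude ranges for a polar cap, 1 normally, 2 when the box crosses the +-180 line) can be sketched as follows. A rough sketch in the spirit of Jan Matuschek's write-up, not Solr's code; the radius constant and function names are assumptions:

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.009  # mean earth radius, as quoted in the thread

def bounding_box(lat, lon, dist_km, radius_km=EARTH_MEAN_RADIUS_KM):
    """Return (lat_range, lon_ranges) bounding all points within
    dist_km of (lat, lon); degrees in, degrees out."""
    r = dist_km / radius_km  # angular radius in radians
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    lat_min, lat_max = lat_r - r, lat_r + r

    if lat_min <= -math.pi / 2 or lat_max >= math.pi / 2:
        # Cap touches a pole: every longitude qualifies -> 0 lon ranges.
        lat_min = max(lat_min, -math.pi / 2)
        lat_max = min(lat_max, math.pi / 2)
        lon_ranges = []
    else:
        # Longitude half-width grows away from the equator.
        dlon = math.asin(math.sin(r) / math.cos(lat_r))
        lon_min, lon_max = lon_r - dlon, lon_r + dlon
        if lon_min < -math.pi:
            # Crossed -180: two ranges needed.
            lon_ranges = [(math.degrees(lon_min + 2 * math.pi), 180.0),
                          (-180.0, math.degrees(lon_max))]
        elif lon_max > math.pi:
            # Crossed +180: two ranges needed.
            lon_ranges = [(math.degrees(lon_min), 180.0),
                          (-180.0, math.degrees(lon_max - 2 * math.pi))]
        else:
            lon_ranges = [(math.degrees(lon_min), math.degrees(lon_max))]
    return (math.degrees(lat_min), math.degrees(lat_max)), lon_ranges
```

Note that neither the equator nor the prime meridian appears as a special case here, consistent with the earlier comments: only the poles and the +-180 line need extra handling.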
[jira] Commented: (SOLR-1568) Implement Spatial Filter
[ https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914340#action_12914340 ] Yonik Seeley commented on SOLR-1568: bq. I recommend we follow the behaviour that he suggests. Looks like a really nice page. I don't have time to figure out and redo any math right now, but I hope in the future that we can have random tests to verify this stuff.