Re: Whither ORP?
Hello, I am new to ORP and would like to contribute to the project. I do not have a lot of experience in this field of IR, crowdsourcing, or AI, but if someone could take the lead and set a path forward, I would be willing to contribute my skill set to ORP. How can I help? I have a lot of experience doing software development and system administration. Cheers, --Dan

On Mon, Sep 13, 2010 at 1:36 PM, Omar Alonso oralo...@yahoo.com wrote:
I think ORP is a great candidate for crowdsourcing/human computation. In the last year or so there's been quite a bit of research and applications on this. See the page for the SIGIR workshop on using crowdsourcing for IR evaluation: http://www.ischool.utexas.edu/~cse2010/ Omar

--- On Mon, 9/13/10, Itamar Syn-Hershko ita...@code972.com wrote:
From: Itamar Syn-Hershko ita...@code972.com
Subject: Re: Whither ORP?
To: openrelevance-dev@lucene.apache.org
Date: Monday, September 13, 2010, 9:33 AM
With the proper two-way open-source development process (taking and then giving), I think it can become an important part of open-IR technologies, just like what Lucene did for the search engine world. What ORP has to offer is of great interest to HebMorph, an open-source project of mine trying to decide on the best way to index and search Hebrew texts. To this end I decided to put some of the development effort of the HebMorph project into making tools for the ORP. I have announced this before, but unfortunately I had to attend to more pressing tasks before I could complete this (and there was no response from the community anyway...). Just in case you're interested in seeing what I came up with so far: http://github.com/synhershko/Orev. IMHO, the ORP should stand by itself, and relate to Lucene/Solr only as its base framework for these initial stages. Perhaps also try to attract more people who could find an interest in what it has to offer, so it can really start growing. Itamar.
On 12/9/2010 1:29 PM, Grant Ingersoll wrote:
On Sep 11, 2010, at 8:51 PM, Robert Muir wrote:
i propose we take what we have and import into lucene-java's benchmark contrib. it already has integration with wikipedia and reuters for perf purposes, and the quality package is actually there anyways. later, maybe more people have time and contrib/benchmark evolves naturally... e.g. to modules/benchmark with solr support as a first big step.

Yeah, that seems reasonable. I have been thinking lately that it might be useful to pull our DocMaker stuff out separately from benchmark so that people have easy ways of generating content from things like Wikipedia, etc. Still, at the end of the day, I like what ORP _could_ bring to the table, and to some extent I think that is lost by folding it into Lucene benchmark.

On Sep 11, 2010 7:33 PM, Grant Ingersoll gsing...@apache.org wrote:
Seems ORP isn't really catching on with people. I know personally I don't have the time I had hoped to have to get it going. At the same time, I really think it could be a good project. We've got some tools put together, but we still haven't done much about the bigger goal of a self-contained evaluation. Any thoughts on how we should proceed with ORP? -Grant
NRT SOLR-1617 on branch_3x
Hi,

Starting a new thread for this question, as it's a little separate from the 3.x cache warming thread: http://www.lucidimagination.com/search/document/fdc19610bdf6e179/tuning_solr_caches_with_high_commit_rates_nrt#adf9ee007ce18ba6

The new fcs/per-segment faceting that's in the 4.x trunk looks like a hugely useful feature. Would it be possible to get/port this for branch_3x? I'm happy to help any way I can in porting this cool new feature.

Many Thanks,
Peter

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
merge LuceneTestCase and LuceneTestCaseJ4
-
Key: LUCENE-2642
URL: https://issues.apache.org/jira/browse/LUCENE-2642
Project: Lucene - Java
Issue Type: Test
Components: Tests
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2642.patch

We added JUnit4 support, but as a separate test class. So unfortunately, we have two separate base classes to maintain: LuceneTestCase and LuceneTestCaseJ4. This creates a mess and is difficult to manage. Instead, I propose a single base test class that works both junit3 and junit4 style. I modified our LuceneTestCaseJ4 in the following way:
* the methods to run are not limited to the ones annotated with @Test, but also any void no-arg methods that start with test, like junit3. this means you don't have to sprinkle @Test everywhere.
* of course, @Ignore works as expected everywhere.
* LuceneTestCaseJ4 extends TestCase so you don't have to import static Assert.* to get all the asserts.

For most tests, no changes are required, but a few very minor things had to be changed:
* setUp() and tearDown() must be public, not protected.
* useless ctors must be removed, such as TestFoo(String name) { super(name); }
* LocalizedTestCase is gone; instead of {code} public class TestQueryParser extends LocalizedTestCase { {code} it is now {code} @RunWith(LuceneTestCase.LocalizedTestCaseRunner.class) public class TestQueryParser extends LuceneTestCase { {code}
* Same with MultiCodecTestCase: @RunWith(LuceneTestCase.MultiCodecTestCaseRunner.class)

I only did the core tests in the patch as a start, and I just made an empty LuceneTestCase extends LuceneTestCaseJ4. I'd like to do contrib and solr and rename this LuceneTestCaseJ4 to only a single class: LuceneTestCase.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
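The "any void no-arg method named test*" discovery rule described in the issue can be sketched with plain reflection. This is a hypothetical illustration, not the actual LuceneTestCaseJ4 code; the class and method names below are invented for the demo.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Sketch of junit3-style test discovery: collect public void no-arg
// methods whose names start with "test", without requiring @Test.
public class TestMethodScanner {
    public static List<String> findTestMethods(Class<?> klass) {
        List<String> names = new ArrayList<>();
        for (Method m : klass.getMethods()) {
            if (m.getName().startsWith("test")
                    && m.getParameterCount() == 0
                    && m.getReturnType() == void.class) {
                names.add(m.getName());
            }
        }
        return names;
    }

    // hypothetical target class, mimicking a junit3-style test
    public static class TestFoo {
        public void testBar() {}
        public void helper() {}  // not picked up: name does not start with "test"
    }

    public static void main(String[] args) {
        System.out.println(findTestMethods(TestFoo.class)); // prints [testBar]
    }
}
```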
[jira] Commented: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908733#action_12908733 ] Uwe Schindler commented on LUCENE-2642:
---
Why not simply extend the Assert abstract class? This would remove use of deprecated old JUnit3 Framework completely?
[jira] Commented: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908735#action_12908735 ] Robert Muir commented on LUCENE-2642:
-
bq. Why not simply extend the Assert abstract class? This would remove use of deprecated old JUnit3 Framework completely?

I would like to do this under a different issue. We cannot do this, until all assertEquals(float, float) are changed to use epsilons, for example. By extending Assert, we can catch all the places we don't use epsilons and fix them, which is a great improvement, but out of scope of this issue.
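Robert's point about epsilons can be seen with a tiny example: float arithmetic rarely reproduces a literal exactly, so the old exact two-arg assertEquals(float, float) is fragile and the org.junit.Assert form with an explicit delta is needed. A minimal plain-Java sketch (no JUnit dependency; the comparison code here is illustrative, not the test framework's):

```java
// Why exact float comparison is fragile: squaring the float rounding of
// sqrt(2) does not land exactly on 2.0f, but it is well within a small epsilon.
public class FloatCompare {
    public static void main(String[] args) {
        float s = (float) Math.sqrt(2.0);
        float sq = s * s;                                  // accumulates rounding error
        System.out.println(sq == 2.0f);                    // exact compare fails
        System.out.println(Math.abs(sq - 2.0f) <= 1e-6f);  // epsilon compare succeeds
    }
}
```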
[jira] Issue Comment Edited: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908733#action_12908733 ] Uwe Schindler edited comment on LUCENE-2642 at 9/13/10 8:02 AM:
---
Why not simply extend the org.junit.Assert class? This would remove use of deprecated old JUnit3 Framework completely?

was (Author: thetaphi):
Why not simply extend the Assert abstract class? This would remove use of deprecated old JUnit3 Framework completely?
Re: exceptions from solr/contrib/dataimporthandler and solr/contrib/extraction
On Sep 12, 2010, at 2:47 PM, Yonik Seeley wrote:
On Sun, Sep 12, 2010 at 2:35 PM, Grant Ingersoll gsing...@apache.org wrote:
contrib/extraction prints an 'unknown field a' exception

The printing of this could be commented out, possibly. The exception being thrown is proper operation, as the field a does not exist.

Yeah, but the bigger question is if field a should have been generated given the specific extraction configuration.

If it's explicitly testing for an exception, then we can ignore it (add it to the ignores list)... otherwise it's either a test bug or a

Yes, it should be added to the ignores list.
[jira] Commented: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908739#action_12908739 ] Robert Muir commented on LUCENE-2642:
-
bq. I am just afraid of extending from the old JUnit Testcase.

we already extend this! Have you looked at LuceneTestCase lately?

bq. So extend Assert and then add the missing static methods for compatibility.

Please, i would like to keep the epsilon stuff out of this issue. All tests pass the way it is now; there is no problem. We can fix epsilons in a followup issue, and then use no junit3 code at all... as I said it's a great improvement, but not necessary to mix in with this change. By doing both at once, if something goes wrong, it will be more difficult to debug. Let's keep the scope under control.
Re: exceptions from solr/contrib/dataimporthandler and solr/contrib/extraction
On Mon, Sep 13, 2010 at 8:05 AM, Grant Ingersoll gsing...@apache.org wrote: Yes, it should be added to the ignores list. Thanks Grant! -- Robert Muir rcm...@gmail.com
[jira] Updated: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2642:
--
Attachment: LUCENE-2642-extendAssert.patch

Here is the patch, so LuceneTestCaseJ4 only extends Assert without importing extra crap. The double/float API of old JUnit3 is emulated using static overrides. After applying the patch, do an ant clean, else you will get linking errors.
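The "static overrides" trick Uwe describes can be sketched in plain Java (hypothetical names; this is not the actual patch): a base class re-introduces the legacy two-arg float signature and delegates to the epsilon form, so old call sites keep compiling against an Assert-style hierarchy that otherwise only offers the three-arg version.

```java
// Sketch of emulating the old JUnit3 float API with static overloads.
// CompatAsserts and its default epsilon are invented for this demo.
public class CompatAsserts {
    // modern form: caller supplies an explicit delta
    public static void assertEquals(float expected, float actual, float delta) {
        if (Math.abs(expected - actual) > delta) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // legacy two-arg form, emulated by delegating with a small default epsilon
    public static void assertEquals(float expected, float actual) {
        assertEquals(expected, actual, 1e-6f);
    }

    public static void main(String[] args) {
        // differs by a couple of ulps, but the default epsilon absorbs it
        assertEquals(2.0f, 2.0000002f);
        System.out.println("legacy call site still compiles and passes");
    }
}
```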
Re: IndexReader Cache - a different angle
i created https://issues.apache.org/jira/browse/LUCENE-2345 some time ago proposing pretty much what seems to be discussed here -- Tim

On 09/12/10 10:18, Simon Willnauer wrote:
On Sun, Sep 12, 2010 at 11:46 AM, Michael McCandless luc...@mikemccandless.com wrote:
Having hooks to enable an app to manage its own external, private stuff associated w/ each segment reader would be useful, and it's been asked for in the past. However, since we've now opened up SegmentReader, SegmentInfo/s, etc., in recent releases, can't an app already do this w/o core API changes?

The visitor approach would simply be a little more than syntactic sugar where only new SubReader instances are passed to the callback. You can do the same with the already existing API like gatherSubReaders or getSequentialSubReaders. Every API I was talking about would just be a simplification anyway and would be possible to build without changing the core.

I know Earwin has built a whole system like this on top of Lucene -- Earwin how did you do that...? Did you make core changes to Lucene...?

A custom Codec should be an excellent way to handle the specific use case (caching certain postings) -- by doing it as a Codec, any time anything in Lucene needs to tap into that posting (query scorers, filters, merging, applying deletes, etc), it hits this cache. You could model it like PulsingCodec, which wraps any other Codec but handles the low-freq ones itself. If you do it externally, how would core use of postings hit it? (Or was that not the intention?)

I don't understand the filter use-case... the CachingWrapperFilter already caches per-segment, so that reopen is efficient? How would an external cache (built on these hooks) be different?

Man you are right - never mind :) simon

For faster filters we have to apply them like we do deleted docs if the filter is random access (such as being cached), LUCENE-1536 -- flex actually makes this relatively easy now, since the postings API no longer implicitly filters deleted docs (ie you provide your own skipDocs) -- but these hooks won't fix that, right?

Mike

On Sun, Sep 12, 2010 at 3:43 AM, Simon Willnauer simon.willna...@googlemail.com wrote:
Hey Shai,
On Sun, Sep 12, 2010 at 6:51 AM, Shai Erera ser...@gmail.com wrote:
Hey Simon,
You're right that the application can develop a Caching mechanism outside Lucene, and when reopen() is called, if it changed, iterate on the sub-readers and init the Cache w/ the new ones.

Alright, then we are on the same track I guess!

However, by building something like that inside Lucene, the application will get more native support, and thus better performance, in some cases. For example, consider a field fileType with 10 possible values, and for the sake of simplicity, let's say that the index is divided evenly across them. Your users always add such a term constraint to the query (e.g. they want to get results of fileType:pdf or fileType:odt, and perhaps sometimes both, but not others). You have basically two ways of supporting this: (1) Add such a term to the query / clause to a BooleanQuery w/ an AND relation -- cons is that this term / posting is read for every query.

Oh I wasn't saying that a cache framework would be obsolete and shouldn't be part of lucene. My intention was rather to generalize this functionality so that we can make the API change more easily and at the same time bring the infrastructure you are proposing in place. Regarding your example above, filters are a very good example where something like that could help to improve performance, and we should provide it with lucene core, but I would again prefer the least intrusive way to do so. If we can make that happen without adding any cache-agnostic API we should do it. We really should try to sketch out a simple API which gives us access to the opened segReaders and see if that would be sufficient for our use cases. Specialization will always be possible, but I doubt that it is needed.

(2) Write a Filter which works at the top IR level, that is refreshed whenever the index is refreshed. This is better than (1), however it has some disadvantages: (2.1) As Mike already proved (on some issue which I don't remember its subject/number at the moment), if we could get Filter down to the lower level components of Lucene's search, so e.g. it is used as the deleted docs OBS, we can get better performance w/ Filters. (2.2) The Filter is refreshed for the entire IR, and not just the changed segments. Reason is, outside Collector, you have no way of telling IndexSearcher use Filter F1 for segment S1 and F2 for segment S2. Loading/refreshing the Filter may be expensive, and definitely won't perform well w/ NRT, where by definition you'd like to get small changes searchable very fast.

No doubt you are right about the above. A PerSegmentCachingFilterWrapper would be something we can easily do on an application-level basis with the infrastructure we are talking about in place. While I don't exactly
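The per-segment idea in this thread (cache keyed by segment reader, so a reopen only recomputes entries for genuinely new segments) can be sketched outside Lucene with a weak map. The names below are invented for illustration and are not Lucene API; a real implementation would key on the segment reader's cache key and hold actual filter bits.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Sketch: per-segment filter-bit cache. Entries are keyed on segment
// identity and dropped automatically when a segment reader is collected.
public class PerSegmentFilterCache {
    private final Map<Object, BitSet> bitsBySegment = new WeakHashMap<>();
    private int computes = 0; // instrumentation for the demo

    public synchronized BitSet bits(Object segmentKey, Function<Object, BitSet> compute) {
        return bitsBySegment.computeIfAbsent(segmentKey, k -> {
            computes++;                 // only new segments pay the cost
            return compute.apply(k);
        });
    }

    public synchronized int computeCount() { return computes; }

    public static void main(String[] args) {
        PerSegmentFilterCache cache = new PerSegmentFilterCache();
        Object seg1 = new Object(), seg2 = new Object(); // stand-ins for segment readers
        Function<Object, BitSet> loader = k -> new BitSet(8);

        cache.bits(seg1, loader);
        cache.bits(seg1, loader);        // cache hit, no recompute
        cache.bits(seg2, loader);        // new segment after a "reopen"
        System.out.println(cache.computeCount()); // prints 2
    }
}
```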
[jira] Updated: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2642:
Attachment: LUCENE-2642.patch

updated patch, with all of lucene/solr and including uwe's stuff. all tests pass.
Re: IndexReader Cache - a different angle
I'd second that.

In my use case we need to search, sometimes with sort, on a pretty big index... So in the worst-case scenario we get OOM while loading FieldCache as it tries to create a huge array. You can increase -Xmx, go to a bigger host, but in the end there WILL be an index big enough to crash you.

My idea would be to use something like EhCache with few elements in memory and overflow to disk, so that if there are few unique terms, it would be almost as fast as an array. Otherwise in Collector/Sort/SortField/FieldComparator I would hit the EhCache on disk (yes it would be a huge performance hit) but I won't get OOMs and the results STILL will be sorted.

Right now SegmentReader/FieldCacheImpl are pretty hardcoded on FieldCache.DEFAULT. And yes, I'm on 3.x...

On Mon, Sep 13, 2010 at 16:05, Tim Smith tsm...@attivio.com wrote:
i created https://issues.apache.org/jira/browse/LUCENE-2345 some time ago proposing pretty much what seems to be discussed here -- Tim
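Danil's overflow idea above amounts to a two-tier cache: a small hot tier in memory and a slower fallback for everything else. This is only a sketch of that design, not the EhCache API or Lucene's FieldCache; the class and method names are invented, and a plain function stands in for the on-disk tier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: bounded in-memory tier in access order (LRU); on miss, fall
// back to a slower loader standing in for the overflow-to-disk tier.
public class TwoTierFieldValues {
    private final Map<Integer, String> hot;
    private int slowLoads = 0; // instrumentation for the demo

    public TwoTierFieldValues(final int capacity) {
        // access-order LinkedHashMap evicting the least-recently-used entry
        this.hot = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;
            }
        };
    }

    public String value(int docId, Function<Integer, String> diskTier) {
        String v = hot.get(docId);
        if (v == null) {
            slowLoads++;                 // this is where the disk hit would be
            v = diskTier.apply(docId);
            hot.put(docId, v);
        }
        return v;
    }

    public int slowLoadCount() { return slowLoads; }

    public static void main(String[] args) {
        TwoTierFieldValues values = new TwoTierFieldValues(2);
        Function<Integer, String> disk = doc -> "term-" + doc;
        values.value(1, disk);           // miss
        values.value(2, disk);           // miss
        values.value(1, disk);           // hit, no slow load
        values.value(3, disk);           // miss, evicts least-recently-used doc 2
        System.out.println(values.slowLoadCount()); // prints 3
    }
}
```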
Re: IndexReader Cache - a different angle
And it would be nice to have hooks in Lucene and avoid managing refs to IndexReader on reopen() and close() by myself. Oh... and to complicate things, my index is near-realtime using IndexWriter.getReader(), so it's not just IndexReader we need to change, but also IndexWriter should provide a reader that has a proper FieldCache implementation. And I'm a bit uncomfortable to dig that deep :) On Mon, Sep 13, 2010 at 17:51, Danil ŢORIN torin...@gmail.com wrote: I'd second that. In my use case we need to search, sometimes with sort, on a pretty big index... So in the worst case scenario we get OOM while loading FieldCache as it tries to create a huge array. You can increase -Xmx, go to a bigger host, but in the end there WILL be an index big enough to crash you. My idea would be to use something like EhCache with few elements in memory and overflow to disk, so that if there are few unique terms, it would be almost as fast as an array. Otherwise in Collector/Sort/SortField/FieldComparator I would hit the EhCache on disk (yes, it would be a huge performance hit) but I won't get OOMs and the results STILL will be sorted. Right now SegmentReader/FieldCacheImpl are pretty hardcoded on FieldCache.DEFAULT And yes, I'm on 3.x... On Mon, Sep 13, 2010 at 16:05, Tim Smith tsm...@attivio.com wrote: i created https://issues.apache.org/jira/browse/LUCENE-2345 some time ago proposing pretty much what seems to be discussed here -- Tim On 09/12/10 10:18, Simon Willnauer wrote: On Sun, Sep 12, 2010 at 11:46 AM, Michael McCandless luc...@mikemccandless.com wrote: Having hooks to enable an app to manage its own external, private stuff associated w/ each segment reader would be useful and it's been asked for in the past. However, since we've now opened up SegmentReader, SegmentInfo/s, etc., in recent releases, can't an app already do this w/o core API changes? The visitor approach would simply be a little more than syntactic sugar where only new SubReader instances are passed to the callback. 
You can do the same with the already existing API like gatherSubReaders or getSequentialSubReaders. Every API I was talking about would just be simplification anyway and would be possible to build without changing the core. I know Earwin has built a whole system like this on top of Lucene -- Earwin how did you do that...? Did you make core changes to Lucene...? 
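The app-side, per-segment cache management this thread keeps circling around (init only the new sub-readers after reopen(), drop entries for merged-away segments) can be sketched without any Lucene API. Everything below is an assumed illustration: `coreKey` stands in for a SegmentReader's core object, and `buildEntry` stands in for loading a FieldCache-style array.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// App-side sketch (invented names, no Lucene API): a per-segment cache keyed
// by each sub-reader's core key. After reopen(), only segments not seen
// before pay the cost of building an entry; entries for segments gone after
// a merge are evicted.
class SegmentCache {
    private final Map<Object, int[]> byCore = new HashMap<>();
    int built = 0; // number of entries ever built, for illustration

    // Call after every reopen with the current sub-readers' core keys.
    void sync(List<Object> currentCores) {
        byCore.keySet().retainAll(new HashSet<>(currentCores)); // drop dead segments
        for (Object core : currentCores) {
            if (!byCore.containsKey(core)) {
                byCore.put(core, buildEntry(core));
                built++;
            }
        }
    }

    private int[] buildEntry(Object core) { return new int[8]; } // stand-in load
    int size() { return byCore.size(); }
}
```

This is exactly the "syntactic sugar" point above: hooks that pass only the new sub-reader instances to a callback would save the app the retain/containsKey bookkeeping, but nothing here requires a core change.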
[jira] Commented: (SOLR-2112) Solrj should support streaming response
[ https://issues.apache.org/jira/browse/SOLR-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908838#action_12908838 ] Ryan McKinley commented on SOLR-2112: - I would like to commit this soon (just to /trunk) unless there are objections Solrj should support streaming response --- Key: SOLR-2112 URL: https://issues.apache.org/jira/browse/SOLR-2112 Project: Solr Issue Type: New Feature Components: clients - java Reporter: Ryan McKinley Fix For: 4.0 Attachments: SOLR-2112-StreamingSolrj.patch, SOLR-2112-StreamingSolrj.patch The solrj API should optionally support streaming documents. Rather than putting all results into a SolrDocumentList, solrj should be able to call a callback function after each document is parsed. This would allow someone to call query.setRows( Integer.MAX_VALUE ) and get each result to the client without loading them all into memory. For starters, I think the important things to stream are SolrDocuments, but down the road, this could also stream other things (consider reading all terms from the index) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
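The callback idea in the issue description might look roughly like this. It is a sketch only -- `StreamingParser` and its shape are invented for illustration, not the actual solrj patch: the response parser hands each document to a callback as soon as it is parsed instead of accumulating everything into a SolrDocumentList, so client memory stays flat no matter how many rows were requested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Invented sketch of a streaming response parser (not the SOLR-2112 code):
// each parsed document is pushed to the caller's callback immediately,
// so nothing accumulates on the parsing side.
class StreamingParser {
    // Simulates parsing a response containing numDocs documents.
    static void parse(int numDocs, Consumer<Map<String, Object>> callback) {
        for (int i = 0; i < numDocs; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("id", i);
            callback.accept(doc); // caller sees the doc now; no list is built here
        }
    }
}
```

A caller that only counts or writes each document through keeps O(1) memory, which is the whole point of the issue.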
Re: exceptions from solr/contrib/dataimporthandler and solr/contrib/extraction
On Sun, Sep 12, 2010 at 8:55 PM, Lance Norskog goks...@gmail.com wrote: And this is why unit tests shouldn't spew -- just yes or no, please. If you want to patch this, please comment out all the printed trash, since all it does is cause mail threads like this. Also, people shouldn't put bugs in their code :-P -Yonik
[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations
[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908849#action_12908849 ] Jason Rutherglen commented on LUCENE-2575: -- One thing I noticed, correct me if I'm wrong, is the term doc frequency (the one stored per term, ie, TermsEnum.docFreq) doesn't seem to be currently recorded in the ram buffer code tree. It will be easy to add, though if we make it accurate per RAM index reader then we could be allocating a unique array, the length of the number of terms, per reader. I'll implement it this way to start and we can change it later if necessary. Actually, to save RAM this could be another use case where a 2 dimensional copy-on-write array is practical. Concurrent byte and int block implementations - Key: LUCENE-2575 URL: https://issues.apache.org/jira/browse/LUCENE-2575 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch Reporter: Jason Rutherglen Fix For: Realtime Branch Attachments: LUCENE-2575.patch, LUCENE-2575.patch The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (ie, postings etc). I'm not sure I understand the slices concept, it seems like it'd be easier to implement a seekable random access file like API. One would seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden?
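The "2 dimensional copy-on-write array" mentioned in the comment can be sketched like this (illustrative only, not the patch's code): readers take a lock-free snapshot of the outer array via a volatile field, while the writer clones the cheap outer array plus only the one inner block it touches before publishing.

```java
// Invented sketch of a two-dimensional copy-on-write array for per-reader
// stats such as docFreq: readers never see a partially updated block because
// the writer publishes a fresh outer array atomically via the volatile field.
class CowDocFreqs {
    private volatile int[][] blocks;

    CowDocFreqs(int numBlocks, int blockSize) {
        blocks = new int[numBlocks][blockSize];
    }

    // Reader side: a single volatile read, then lock-free access.
    int get(int block, int slot) { return blocks[block][slot]; }

    // Writer side: copy-on-write the outer array and the touched inner block.
    synchronized void increment(int block, int slot) {
        int[][] outer = blocks.clone();
        outer[block] = outer[block].clone();
        outer[block][slot]++;
        blocks = outer; // publish the new snapshot atomically
    }
}
```

The RAM-saving angle is that untouched inner blocks are shared between snapshots; only the mutated block is duplicated per update.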
[jira] Commented: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908856#action_12908856 ] Michael McCandless commented on LUCENE-2642: This is great Robert! Patch works for me (except for a bizarre hang in Solr's TestSolrCoreProperties, apparently only on my machine, that's pre-existing). merge LuceneTestCase and LuceneTestCaseJ4 - Key: LUCENE-2642 URL: https://issues.apache.org/jira/browse/LUCENE-2642 Project: Lucene - Java Issue Type: Test Components: Tests Reporter: Robert Muir Fix For: 3.1, 4.0 Attachments: LUCENE-2642-extendAssert.patch, LUCENE-2642.patch, LUCENE-2642.patch We added Junit4 support, but as a separate test class. So unfortunately, we have two separate base classes to maintain: LuceneTestCase and LuceneTestCaseJ4. This creates a mess and is difficult to manage. Instead, I propose a single base test class that works in both junit3 and junit4 styles. I modified our LuceneTestCaseJ4 in the following way: * the methods to run are not limited to the ones annotated with @Test, but also any void no-arg methods that start with test, like junit3. This means you don't have to sprinkle @Test everywhere. * of course, @Ignore works as expected everywhere. * LuceneTestCaseJ4 extends TestCase so you don't have to import static Assert.* to get all the asserts. For most tests, no changes are required, but a few very minor things had to be changed: * setUp() and tearDown() must be public, not protected. 
* useless ctors must be removed, such as TestFoo(String name) { super(name); } * LocalizedTestCase is gone, instead of {code} public class TestQueryParser extends LocalizedTestCase { {code} it is now {code} @RunWith(LuceneTestCase.LocalizedTestCaseRunner.class) public class TestQueryParser extends LuceneTestCase { {code} * Same with MultiCodecTestCase: (LuceneTestCase.MultiCodecTestCaseRunner.class) I only did the core tests in the patch as a start, and I just made an empty LuceneTestCase extends LuceneTestCaseJ4. I'd like to do contrib and solr and rename this LuceneTestCaseJ4 to only a single class: LuceneTestCase.
[jira] Commented: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908861#action_12908861 ] Uwe Schindler commented on LUCENE-2642: --- Looks good, this is a really good step forward. We can write old-style tests, but use JUnit4 and can optionally add @BeforeClass and so on :-)
[jira] Commented: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908866#action_12908866 ] Robert Muir commented on LUCENE-2642: - bq. We can write old-style tests, but use JUnit4 and can optionally add @BeforeClass and so on Yeah, I've never understood why JUnit4 requires all these static imports and annotations... I just care about @BeforeClass!
ant clean required
heads up: due to https://issues.apache.org/jira/browse/LUCENE-2642 you will need to run 'ant clean' after you svn up. -- Robert Muir rcm...@gmail.com
Re: exceptions from solr/contrib/dataimporthandler and solr/contrib/extraction
What I want you to do is, I want you to find the guys who are putting all the bugs in the code, and I want you to FIRE THEM! - true On Mon, Sep 13, 2010 at 9:10 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Sun, Sep 12, 2010 at 8:55 PM, Lance Norskog goks...@gmail.com wrote: And this is why unit tests shouldn't spew -- just yes or no, please. If you want to patch this, please comment out all the printed trash, since all it does is cause mail threads like this. Also, people shouldn't put bugs in their code :-P -Yonik -- Lance Norskog goks...@gmail.com
[jira] Updated: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2642: -- Attachment: LUCENE-2642-fixes.patch Some small fixes in reflection inspection: - exclude static and abstract methods - check native return type
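The reflection checks Uwe describes (exclude static and abstract methods, verify the return type) amount to something like the following sketch. This is an illustration of the technique, not the actual LuceneTestCase code: collect junit3-style test methods -- public, void, no-arg, named test*, and neither static nor abstract.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative scanner for junit3-style test methods, applying the same
// filters mentioned in the comment above: name starts with "test", no args,
// void return type, and neither static nor abstract.
class TestMethodScanner {
    static List<String> scan(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Method m : clazz.getMethods()) {
            int mod = m.getModifiers();
            if (m.getName().startsWith("test")
                    && m.getParameterCount() == 0
                    && m.getReturnType() == void.class   // "check native return type"
                    && !Modifier.isStatic(mod)
                    && !Modifier.isAbstract(mod)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Toy fixture: only testFoo should be picked up.
    public void testFoo() {}
    public static void testStatic() {}
    public int testReturnsInt() { return 0; }
    public void helper() {}
}
```

Without the static/return-type checks, `testStatic` and `testReturnsInt` would be picked up and then fail at invocation time, which is exactly the class of bug the fixes patch guards against.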
[jira] Commented: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908890#action_12908890 ] Robert Muir commented on LUCENE-2642: - Thanks Uwe, I can merge this into the 3x backport too.
[jira] Commented: (SOLR-2112) Solrj should support streaming response
[ https://issues.apache.org/jira/browse/SOLR-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908894#action_12908894 ] Yonik Seeley commented on SOLR-2112: Can StreamingResponseCallback be an abstract class for easier back compat? I imagine we could want to stream other stuff in the future (output from terms component, facet component, term vector component, etc).
[jira] Commented: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908896#action_12908896 ] Robert Muir commented on LUCENE-2642: - OK, I didn't merge the reflection fixes yet, but I backported the patch to 3x. Committed revision 996611, 996630 (3x). Will mark the issue resolved when Uwe is out of reflection hell :)
[jira] Commented: (LUCENE-2638) Make HighFreqTerms.TermStats class public
[ https://issues.apache.org/jira/browse/LUCENE-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908904#action_12908904 ] Tom Burton-West commented on LUCENE-2638: - Just wondering if you could describe the use you have in mind. Tom Make HighFreqTerms.TermStats class public - Key: LUCENE-2638 URL: https://issues.apache.org/jira/browse/LUCENE-2638 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Andrzej Bialecki Attachments: LUCENE-2638.patch It's not possible to use public methods in contrib/misc/... /HighFreqTerms from outside the package because the return type has package visibility. I propose to move TermStats class to a separate file and make it public.
[jira] Updated: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-2504: - Attachment: LUCENE-2504.patch OK, here's a patch for Solr's sortMissingLast. Median response time in my tests drops from 160 to 102 ms. sorting performance regression -- Key: LUCENE-2504 URL: https://issues.apache.org/jira/browse/LUCENE-2504 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip sorting can be much slower on trunk than branch_3x
[jira] Updated: (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1536: --- Attachment: CachedFilterIndexReader.java By subclassing FilterIndexReader, and taking advantage of how the flex APIs now let you pass a custom skipDocs when pulling the postings, I created a prototype class (attached, named CachedFilterIndexReader) that up-front compiles the deleted docs for each segment with the negation of a Filter (that you provide), and returns a reader that applies that filter. This is nice because it's fully external to Lucene, and it gives awesome gains in many cases (see http://chbits.blogspot.com/2010/09/fast-search-filters-using-flex.html). I don't think we should commit this class -- we should instead fix Filters correctly! But it's a nice workaround until we do that. if a filter can support random access API, we should use it --- Key: LUCENE-1536 URL: https://issues.apache.org/jira/browse/LUCENE-1536 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.4 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 4.0 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch I ran some performance tests, comparing applying a filter via random-access API instead of current trunk's iterator API. This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to iterator was a very sizable performance hit. Some notes on the test: * Index is first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. * I test across multiple queries. 1-X means an OR query, eg 1-4 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 AND 3 AND 4. u s means united states (phrase search). 
* I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)). * Method high means I use random-access filter API in IndexSearcher's main loop. Method low means I use random-access filter API down in SegmentTermDocs (just like deleted docs today). * Baseline (QPS) is current trunk, where filter is applied as iterator up high (ie in IndexSearcher's search loop).
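The core trick in this issue -- folding a random-access filter into the same "skip these docs" check that postings enumeration already does for deletions -- can be illustrated with plain BitSets. This is a toy sketch, not Lucene code: the combined skip set is the deletions OR'd with the complement of the filter.

```java
import java.util.BitSet;

// Toy illustration (not Lucene's API): when a filter supports random access,
// treat it like deletions instead of driving it as a separate iterator.
// skipDocs = deleted OR NOT(filter), so a doc survives only if it is both
// live and accepted by the filter.
class RandomAccessFilter {
    static BitSet skipDocs(BitSet deleted, BitSet filter, int maxDoc) {
        BitSet skip = new BitSet(maxDoc);
        skip.or(filter);
        skip.flip(0, maxDoc);   // NOT(filter): skip everything the filter rejects
        skip.or(deleted);       // plus the real deletions
        return skip;
    }
}
```

This mirrors what the attached CachedFilterIndexReader does by handing the flex postings API a custom skipDocs: the filter check happens at the same, cheapest point as the deleted-docs check.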
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908918#action_12908918 ] Robert Muir commented on LUCENE-2504: - silly question, what does the bigString do? (just wondering if it should be U+10,U+10,... now that we use utf-8 order, depending what it does)
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12908934#action_12908934 ] Yonik Seeley commented on LUCENE-2504: -- bq. silly question, what does the bigString do? It's actually not currently used by Solr, but it's basically meant as a proxy for a null if you want the Comparables returned by value() to match the sort order the Comparator actually used. bq. (just wondering if it should be U+10,U+10,... now that we use utf-8 order, depending what it does) Maybe... if it is supposed to be just a string (I know that's the name, but maybe it should be called bigTerm I guess). All of our terms are currently UTF8 - but I don't know if that will last?
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908947#action_12908947 ]

Robert Muir commented on LUCENE-2504:
-------------------------------------

bq. Maybe... if it is supposed to be just a string (I know that's the name, but maybe it should be called bigTerm I guess). All of our terms are currently UTF8 - but I don't know if that will last?

Well, you are right - for example, collated terms for locale-sensitive sort will hopefully use the full byte range soon... we can always safely use bytes of 0xff, I think?
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908953#action_12908953 ]

Yonik Seeley commented on LUCENE-2504:
--------------------------------------

bq. we can always safely use bytes of 0xff i think?

Yep, should be fine. Is there an upper bound on how long collated terms can be?
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908956#action_12908956 ]

Robert Muir commented on LUCENE-2504:
-------------------------------------

bq. Yep, should be fine. Is there an upper bound on how long collated terms can be?

There isn't, but... I can't promise (I'll verify), I think actually a single 0xff might do, for the major encodings:
* it's invalid in UTF-8
* it's technically valid, but unused (a reset byte) in BOCU-1
* collation keys, I understand, are a modified BOCU, so likely unused there too

So it's like a NaN sentinel: if someone is doing something very weird, maybe it won't work, but in general I think it will work. I'll check.
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908963#action_12908963 ]

Yonik Seeley commented on LUCENE-2504:
--------------------------------------

OK, I've changed bigString to bigTerm and used 10 0xff bytes (to account for possible binary encoding of 8-byte numerics + other stuff like tags that trie encoding uses).
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908965#action_12908965 ]

Robert Muir commented on LUCENE-2504:
-------------------------------------

Cool, thanks. (Unfortunately I ran a bunch of collators and encountered what look like 0xff bytes.) I think this will help.
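To make the sentinel idea in this thread concrete: in unsigned byte order, a run of 0xff bytes compares after any valid UTF-8 term, because the byte 0xff never occurs in UTF-8. The following stand-alone sketch is illustrative only (BigTermDemo and compareUnsigned are made-up names, not Lucene code):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BigTermDemo {
    // Ten 0xff bytes, as in the patch above: 0xff never occurs in valid
    // UTF-8, so in unsigned byte order this sorts after every UTF-8 term.
    public static final byte[] BIG_TERM = new byte[10];
    static { Arrays.fill(BIG_TERM, (byte) 0xff); }

    // Unsigned lexicographic byte comparison, the order terms sort in
    // on trunk now that UTF-8 byte order is used.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // U+10FFFF is the highest code point; its UTF-8 form is F4 8F BF BF,
        // every byte of which is still below 0xff.
        byte[] highest = new String(Character.toChars(0x10FFFF))
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(compareUnsigned(highest, BIG_TERM) < 0); // true
    }
}
```

Ten bytes rather than one, per the comment above, leaves headroom for binary term encodings (8-byte numerics plus trie tags) that are compared against the sentinel.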
[jira] Updated: (SOLR-2112) Solrj should support streaming response
[ https://issues.apache.org/jira/browse/SOLR-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-2112:
--------------------------------

    Attachment: SOLR-2112-StreamingSolrj.patch

Ah yes, good point. Here is an updated patch using:

{code:java}
public abstract class StreamingResponseCallback {
  /*
   * Called for each SolrDocument in the response
   */
  public abstract void streamSolrDocument( SolrDocument doc );

  /*
   * Called at the beginning of each DocList (and SolrDocumentList)
   */
  public abstract void streamDocListInfo( long numFound, long start, Float maxScore );
}
{code}

> Solrj should support streaming response
> ---------------------------------------
>
>                 Key: SOLR-2112
>                 URL: https://issues.apache.org/jira/browse/SOLR-2112
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Ryan McKinley
>             Fix For: 4.0
>         Attachments: SOLR-2112-StreamingSolrj.patch, SOLR-2112-StreamingSolrj.patch, SOLR-2112-StreamingSolrj.patch
>
> The solrj API should optionally support streaming documents. Rather than putting all results into a SolrDocumentList, solrj should be able to call a callback function after each document is parsed. This would allow someone to call query.setRows( Integer.MAX_VALUE ) and get each result to the client without loading them all into memory.
> For starters, I think the important things to stream are SolrDocuments, but down the road, this could also stream other things (consider reading all terms from the index)
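To illustrate how the callback above inverts control relative to accumulating a SolrDocumentList, here is a self-contained sketch of the pattern. The SolrDocument stand-in and the parseResponse loop are hypothetical simplifications, not solrj's actual parsing code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class StreamingDemo {
    // Hypothetical stand-in for solrj's SolrDocument (a field map).
    public static class SolrDocument extends HashMap<String, Object> {}

    // Mirrors the abstract class in the SOLR-2112 patch.
    public abstract static class StreamingResponseCallback {
        public abstract void streamSolrDocument(SolrDocument doc);
        public abstract void streamDocListInfo(long numFound, long start, Float maxScore);
    }

    // Hypothetical parser loop: hands each document to the callback as it
    // is parsed, instead of buffering the whole list in memory.
    public static void parseResponse(List<SolrDocument> wire, StreamingResponseCallback cb) {
        cb.streamDocListInfo(wire.size(), 0, null);
        for (SolrDocument d : wire) {
            cb.streamSolrDocument(d);
        }
    }

    public static void main(String[] args) {
        List<SolrDocument> wire = new ArrayList<SolrDocument>();
        for (int i = 0; i < 3; i++) {
            SolrDocument d = new SolrDocument();
            d.put("id", "doc" + i);
            wire.add(d);
        }
        final long[] seen = {0};
        parseResponse(wire, new StreamingResponseCallback() {
            public void streamSolrDocument(SolrDocument doc) { seen[0]++; }
            public void streamDocListInfo(long numFound, long start, Float maxScore) {
                System.out.println("numFound=" + numFound);
            }
        });
        System.out.println("streamed=" + seen[0]);
    }
}
```

The point of the design is that memory use stays constant no matter how many rows are requested; only the callback sees each document, once.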
[jira] Resolved: (LUCENE-2642) merge LuceneTestCase and LuceneTestCaseJ4
[ https://issues.apache.org/jira/browse/LUCENE-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2642.
---------------------------------

    Resolution: Fixed

OK, I merged back all of Uwe's improvements. Thanks for the help, Uwe. I think now in future issues we can clean up and improve this test case a lot; I felt discouraged from doing so with the previous duplication...

> merge LuceneTestCase and LuceneTestCaseJ4
> -----------------------------------------
>
>                 Key: LUCENE-2642
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2642
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: Tests
>            Reporter: Robert Muir
>             Fix For: 3.1, 4.0
>         Attachments: LUCENE-2642-extendAssert.patch, LUCENE-2642-fixes.patch, LUCENE-2642.patch, LUCENE-2642.patch
>
> We added JUnit 4 support, but as a separate test class. So unfortunately, we have two separate base classes to maintain: LuceneTestCase and LuceneTestCaseJ4. This creates a mess and is difficult to manage.
> Instead, I propose a single base test class that works both junit3 and junit4 style. I modified our LuceneTestCaseJ4 in the following way:
> * the methods to run are not limited to the ones annotated with @Test, but also any void no-arg methods that start with "test", like junit3. This means you don't have to sprinkle @Test everywhere.
> * of course, @Ignore works as expected everywhere.
> * LuceneTestCaseJ4 extends TestCase, so you don't have to import static Assert.* to get all the asserts.
> For most tests, no changes are required, but a few very minor things had to be changed:
> * setUp() and tearDown() must be public, not protected.
> * useless ctors must be removed, such as TestFoo(String name) { super(name); }
> * LocalizedTestCase is gone; instead of
> {code}
> public class TestQueryParser extends LocalizedTestCase {
> {code}
> it is now
> {code}
> @RunWith(LuceneTestCase.LocalizedTestCaseRunner.class)
> public class TestQueryParser extends LuceneTestCase {
> {code}
> * Same with MultiCodecTestCase: (LuceneTestCase.MultiCodecTestCaseRunner.class)
> I only did the core tests in the patch as a start, and I just made an empty LuceneTestCase extends LuceneTestCaseJ4. I'd like to do contrib and solr and rename this LuceneTestCaseJ4 to only a single class: LuceneTestCase.
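The junit3-style method discovery described in the issue (running any void no-arg method whose name starts with "test") boils down to a reflection scan. A hypothetical, stripped-down illustration of that rule (MiniRunner is not the real runner; the @Test/@Ignore handling is omitted):

```java
import java.lang.reflect.Method;

public class MiniRunner {
    // Hypothetical test class in the junit3 naming style: no @Test needed.
    public static class TestFoo {
        public void setUp() {}
        public void testBar() { System.out.println("ran testBar"); }
        public void helper() {} // not discovered: name doesn't start with "test"
    }

    public static void main(String[] args) throws Exception {
        TestFoo t = new TestFoo();
        int ran = 0;
        for (Method m : TestFoo.class.getMethods()) {
            // the merged base class runs void no-arg methods named test*,
            // in addition to @Test-annotated ones (annotation check omitted)
            if (m.getName().startsWith("test")
                    && m.getParameterTypes().length == 0
                    && m.getReturnType() == void.class) {
                t.setUp();
                m.invoke(t);
                ran++;
            }
        }
        System.out.println("ran=" + ran);
    }
}
```

This also shows why setUp() must be public in the merged class: a runner outside the test's package can only reflectively invoke public members.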
[jira] Resolved: (SOLR-2112) Solrj should support streaming response
[ https://issues.apache.org/jira/browse/SOLR-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley resolved SOLR-2112.
---------------------------------

    Resolution: Fixed

Added in r996693. I'm not sure what the 3.x release schedule looks like... so I'm not sure if back-porting makes sense. I think keeping it on /trunk for a while makes sense till we know this is the API we want.
[jira] Updated: (SOLR-792) Tree Faceting Component
[ https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-792:
-------------------------------

    Attachment: SOLR-792-PivotFaceting.patch

updated to trunk

> Tree Faceting Component
> -----------------------
>
>                 Key: SOLR-792
>                 URL: https://issues.apache.org/jira/browse/SOLR-792
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>            Priority: Minor
>         Attachments: SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch
>
> A component to do multi-level faceting.
[jira] Commented: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909013#action_12909013 ]

Scott Kister commented on SOLR-1297:
------------------------------------

Bug report on this feature: if there is a wildcard entry in schema.xml such as the following,

{code:xml}
<!-- Ignore any fields that don't already match an existing field name -->
<dynamicField name="*" type="ignored" multiValued="true" />
{code}

then this feature does not work and an error is returned, i.e.

GET 'http://localhost:8983/solr/select?q=*:*&sort=sum(1,2)+asc'
Error 400 can not sort on unindexed field: sum(1,2)

> Enable sorting by Function Query
> --------------------------------
>
>                 Key: SOLR-1297
>                 URL: https://issues.apache.org/jira/browse/SOLR-1297
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.5, 3.1, 4.0
>         Attachments: SOLR-1297-2.patch, SOLR-1297.patch, SOLR-1297.patch
>
> It would be nice if one could sort by FunctionQuery. See also SOLR-773, where this was first mentioned by Yonik as part of the generic solution to geo-search
[jira] Assigned: (SOLR-792) Tree Faceting Component
[ https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher reassigned SOLR-792:
---------------------------------

    Assignee: Ryan McKinley  (was: Erik Hatcher)

Handing this one over to Ryan, as I don't have cycles to work on it anytime soon. Rock on, Ryan...
/trunk GRAVE: ConcurrentLRUCache was not destroyed prior to finalize()
On Windows Vista with JDK 1.6 running /trunk, I see warnings like this often:

    [junit] ----- Standard Error -----
    [junit] 13-sep-2010 22:19:26 org.apache.solr.common.util.ConcurrentLRUCache finalize
    [junit] GRAVE: ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
    [junit] 13-sep-2010 22:19:26 org.apache.solr.common.util.ConcurrentLRUCache finalize
    [junit] GRAVE: ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
    (the same warning repeats several more times)

Do others see this also? Is this new since the test reworking?

$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)
Re: /trunk GRAVE: ConcurrentLRUCache was not destroyed prior to finalize()
On Mon, Sep 13, 2010 at 6:24 PM, Ryan McKinley ryan...@gmail.com wrote:
> On windows vista with JDK 1.6 running /trunk, I see warnings like this often:
> [junit] GRAVE: ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!

I see this often as well on Vista, for quite some time.

--
Robert Muir
rcm...@gmail.com
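For readers unfamiliar with where such a warning comes from: it is the finalizer-based leak check pattern, where destroy() marks a clean shutdown and finalize() complains if it never ran. A hypothetical, stripped-down sketch of the pattern follows (not Solr's actual ConcurrentLRUCache; the check is factored into checkLeaked() and called directly so the demo is deterministic, since real finalization timing is up to the GC):

```java
public class LeakCheckDemo {
    // Stripped-down version of the pattern: destroy() marks the cache as
    // cleanly shut down; the finalizer complains if that never happened.
    public static class Cache {
        private volatile boolean destroyed = false;
        public static int warnings = 0;

        public void destroy() { destroyed = true; }

        // The check finalize() performs, factored out so it can be
        // exercised directly and deterministically.
        public void checkLeaked() {
            if (!destroyed) {
                warnings++;
                System.out.println("GRAVE: Cache was not destroyed prior to "
                        + "finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!");
            }
        }

        @Override
        protected void finalize() throws Throwable {
            try { checkLeaked(); } finally { super.finalize(); }
        }
    }

    public static void main(String[] args) {
        Cache good = new Cache();
        good.destroy();
        good.checkLeaked();   // silent: destroy() was called

        Cache leaked = new Cache();
        leaked.checkLeaked(); // prints the warning, like in the test logs

        System.out.println("warnings=" + Cache.warnings); // warnings=1
    }
}
```

Seeing the warning in test runs therefore usually means some code path creates the cache but skips the shutdown call, which is exactly what the message says.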
[jira] Commented: (SOLR-2116) TikaEntityProcessor does not find parser by default
[ https://issues.apache.org/jira/browse/SOLR-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909030#action_12909030 ]

Lance Norskog commented on SOLR-2116:
-------------------------------------

It does not work if the parser= attribute is set to

{code}
parser=org.apache.tika.parser.AutoDetectParser
{code}

So, the AutoDetectParser does not work.

Lance

> TikaEntityProcessor does not find parser by default
> ---------------------------------------------------
>
>                 Key: SOLR-2116
>                 URL: https://issues.apache.org/jira/browse/SOLR-2116
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 3.1, 4.0
>            Reporter: Lance Norskog
>         Attachments: pdflist-data-config.xml, pdflist.xml
>
> The TikaEntityProcessor does not find the correct document parser by default. This is in a two-level DIH config file. I have attached pdflist-data-config.xml and pdflist.xml, the XML file list I'm supplying. To test this, you will need the current 3.x branch or 4.0 trunk.
> # Set up a Tika-enabled Solr
> # Copy any PDF file to /tmp/testfile.pdf
> # Copy the pdflist-data-config.xml to your solr/conf
> # Add this snippet to your solrconfig.xml:
> {code:xml}
> <requestHandler name="/pdflist" class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">pdflist-data-config.xml</str>
>   </lst>
> </requestHandler>
> {code}
> [http://localhost:8983/solr/pdflist?command=full-import] will make one document with the id and text fields populated. If you remove this line:
> {code}
> parser=org.apache.tika.parser.pdf.PDFParser
> {code}
> from the TikaEntityProcessor entity, the parser will not be found and you will get a document with the id field and nothing else.
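For context, the entity in the attached pdflist-data-config.xml presumably looks something like the fragment below. This is an assumption reconstructed from the issue text, not copied from the attachment; attribute names other than parser= are illustrative:

```xml
<!-- Hypothetical reconstruction of the TikaEntityProcessor entity the
     issue describes, with the explicit parser= attribute that makes
     PDF parsing work until the default lookup is fixed. -->
<entity name="tika" processor="TikaEntityProcessor"
        url="/tmp/testfile.pdf"
        format="text"
        parser="org.apache.tika.parser.pdf.PDFParser"/>
```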
[jira] Updated: (SOLR-1900) move Solr to flex APIs
[ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1900:
-------------------------------

    Attachment: SOLR-1900_bigTerm.txt

Attaching patch that moves bigTerm into ByteUtils, adds BytesRef.append(BytesRef), and uses those in the faceting code when a prefix is specified (instead of a String with \u chars). If people think that the append() is more Solr-specific (i.e. not likely to be used in lucene) I can move it to Solr's ByteUtils.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_bigTerm.txt, SOLR-1900_FileFloatSource.patch, SOLR-1900_termsComponent.txt
>
> Solr should use flex APIs
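The combination of append() plus bigTerm is what gives the faceting code an upper bound for a prefix's term range: prefix bytes followed by all-0xff bytes sort after every term that starts with that prefix in unsigned byte order. A rough, self-contained sketch (Buf is a made-up stand-in, not Lucene's BytesRef):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AppendDemo {
    // Growable byte buffer with an append(), loosely mimicking the
    // BytesRef.append(BytesRef) added in the patch.
    public static class Buf {
        public byte[] bytes = new byte[0];
        public int length = 0;

        public void append(byte[] other) {
            if (length + other.length > bytes.length) {
                bytes = Arrays.copyOf(bytes, length + other.length);
            }
            System.arraycopy(other, 0, bytes, length, other.length);
            length += other.length;
        }
    }

    public static void main(String[] args) {
        byte[] prefix = "foo".getBytes(StandardCharsets.UTF_8);

        // prefix + ten 0xff bytes: sorts after every term starting with
        // the prefix, bounding the prefix's term range from above.
        byte[] bigTerm = new byte[10];
        Arrays.fill(bigTerm, (byte) 0xff);

        Buf upper = new Buf();
        upper.append(prefix);
        upper.append(bigTerm);
        System.out.println("upper bound length=" + upper.length); // 13
    }
}
```

Working on raw bytes this way avoids the old trick of building a String padded with \u characters and re-encoding it for every comparison.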
[jira] Created: (SOLR-2119) IndexSchema should log warning if analyzer is declared with charfilter/tokenizer/tokenfilter out of order
IndexSchema should log warning if analyzer is declared with charfilter/tokenizer/tokenfilter out of order
---------------------------------------------------------------------------------------------------------

                 Key: SOLR-2119
                 URL: https://issues.apache.org/jira/browse/SOLR-2119
             Project: Solr
          Issue Type: Improvement
          Components: Schema and Analysis
            Reporter: Hoss Man

There seems to be a segment of the user population that has a hard time understanding the distinction between a charfilter, a tokenizer, and a tokenfilter. While we can certainly try to improve the documentation about what exactly each does, and when they take effect in the analysis chain, one other thing we should do is try to educate people when they construct their analyzer in a way that doesn't make any sense.

At the moment, some people are attempting to do things like move the Foo tokenFilter before the tokenizer to try to get certain behavior... at a minimum we should log a warning in this case that doing so doesn't have the desired effect. (We could easily make such a situation fail to initialize, but I'm not convinced that would be the best course of action, since some people may have schemas where they have declared a charFilter or tokenizer out of order relative to their tokenFilters, but are still getting correct results that work for them, and breaking their instance on upgrade doesn't seem like it would be productive.)
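A warning like the one proposed only needs a sanity check over the declared component order, since the only order that makes sense is charFilter* tokenizer tokenFilter*. A minimal sketch of such a check (ChainOrderCheck and its Kind enum are hypothetical, not Solr's IndexSchema code):

```java
import java.util.Arrays;
import java.util.List;

public class ChainOrderCheck {
    public enum Kind { CHAR_FILTER, TOKENIZER, TOKEN_FILTER }

    // The only sensible order: charFilter* tokenizer tokenFilter*.
    // Returns false for anything else, e.g. a tokenFilter declared before
    // the tokenizer -- exactly the case the issue wants to warn about.
    public static boolean wellOrdered(List<Kind> chain) {
        int tokenizers = 0;
        int state = 0; // 0 = still in charfilters, 1 = tokenizer seen
        for (Kind k : chain) {
            switch (k) {
                case CHAR_FILTER:
                    if (state != 0) return false; // charfilter after tokenizer
                    break;
                case TOKENIZER:
                    if (state != 0) return false; // second tokenizer
                    state = 1;
                    tokenizers++;
                    break;
                case TOKEN_FILTER:
                    if (state == 0) return false; // filter before tokenizer
                    break;
            }
        }
        return tokenizers == 1;
    }

    public static void main(String[] args) {
        System.out.println(wellOrdered(Arrays.asList(
            Kind.CHAR_FILTER, Kind.TOKENIZER, Kind.TOKEN_FILTER))); // true
        System.out.println(wellOrdered(Arrays.asList(
            Kind.TOKEN_FILTER, Kind.TOKENIZER)));                   // false
    }
}
```

Per the issue's reasoning, a failed check should only log a warning rather than refuse to initialize, so existing working-but-oddly-declared schemas keep running after upgrade.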
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909107#action_12909107 ]

Yonik Seeley commented on LUCENE-2504:
--------------------------------------

bq. Yonik, just curious, how do you know what HotSpot is doing? Empirically based on performance numbers?

Yeah - it's a best guess based on what I see when performance testing, and matching that up with what I've read in the past. As far as deoptimization is concerned, it's mentioned here: http://java.sun.com/products/hotspot/whitepaper.html, but I haven't read much elsewhere.

Specific to this issue, the whole optimization/deoptimization issue is extremely complex. Recall that I reported this: "Median response time in my tests drops from 160 to 102 ms." For simplicity, there are some details I left out: those numbers were for randomly sorting on different fields (hopefully the most realistic scenario). If you test differently, the results are far different. The first and second test runs below measured median time sorting on a single field 100 times in a row, then moving to the next field.

Trunk before patch:
|unique terms in field|median sort time in ms (first run)|second run|
|10|105|168|
|1|105|169|
|1000|106|164|
|100|127|163|
|10|165|197|

Trunk after patch:
|unique terms in field|median sort time in ms (first run)|second run|
|10|85|130|
|1|92|129|
|1000|92|126|
|100|116|127|
|10|117|128|

branch_3x:
|unique terms in field|median sort time in ms (first run)|second run|
|10|102|102|
|1|102|103|
|1000|101|103|
|100|103|103|
|10|118|118|

So, it seems by running in batches (sorting on the same field over and over), we cause HotSpot to overspecialize somehow, and then when we switch things up the resulting deoptimization puts us in a permanently worse condition. branch_3x does not suffer from that, but trunk still does, due to the increased amount of indirection. I imagine the differences are also due to the boundaries within which the compiler tries to inline/specialize for a certain class.

It certainly complicates performance testing, and we need to keep a sharp eye on how we actually test potential improvements.
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909116#action_12909116 ]

Stephen Weiss commented on SOLR-236:
------------------------------------

FWIW, I fixed my earlier OOM issues with some garbage collection tuning. Now I'm noticing NPEs very similar to those people were reporting back before the patch from Jun 28th:

SEVERE: java.lang.NullPointerException
        at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser$FloatValueFieldComparator.compare(NonAdjacentDocumentCollapser.java:450)
        at org.apache.solr.search.fieldcollapse.NonAdjacentDocumentCollapser$DocumentComparator.compare(NonAdjacentDocumentCollapser.java:262)
... it's the same backtrace ...

I'm guessing it's because I added those 5 lines back into the patch to get the paging working again. It's rather infrequent; it's probably something I can deal with until the new patch is complete. It doesn't happen every time at all, like it seemed to happen to many people - just once in a while, and on queries that honestly run all the time, so it seems random and not related to a particular query (except perhaps in the size of the filter queries - these fqs match relatively large numbers of documents). But if any of this code makes it to the new patch I thought it would be worth mentioning.
> Field collapsing
> ----------------
>
>                 Key: SOLR-236
>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Emmanuel Keller
>            Assignee: Shalin Shekhar Mangar
>             Fix For: Next
>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, DocSetScoreCollector.java, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, quasidistributed.additional.patch, SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
> This patch includes a new feature called "Field collapsing", used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also "Duplicate detection": http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> * collapse.field - the field used to group results
> * collapse.type - "normal" (default value) or "adjacent"
> * collapse.max - how many continuous results are allowed before collapsing
> TODO (in progress):
> * More documentation (on source code)
> * Test cases
> Two patches:
> * field_collapsing.patch for current development version
> * field_collapsing_1.1.0.patch for Solr-1.1.0
> P.S.: Feedback and misspelling corrections are welcome ;-)