Now, a lost data problem with trunk too
Hi folks,

It looks like the handle leak may be real - Simon Willnauer has been looking at it and could not find an explanation for the behavior I have been seeing. But before we got too far on that problem, I encountered what appears to be an even more serious problem. Specifically, I'm losing field data out of some records.

The index I'm building is fairly large - some 25M records when complete. What I'm seeing is that the main searchable field (value) is not finding all the records it should. I was able to locate one such record just now:

curl http://localhost:8983/solr/nose/standard?fl=*,score&q=id:\"POI|DEU:205:20187477:1014564|brandenburger+tor\"

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">95</int><lst name="params"><str name="q">id:"POI|DEU:205:20187477:1014564|brandenburger tor"</str><str name="fl">*,score</str></lst></lst>
<result name="response" numFound="1" start="0" maxScore="17.335964"><doc><float name="score">17.335964</float><str name="entityid">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="id">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="reference">brandenburger tor, potsdam, deutschland</str><str name="type">poi</str> ... </doc></result>
</response>

... but it is completely missing the supposedly required value field:

<!-- The value field. This contains the actual string that will be matched. -->
<field name="value" type="string_idx" required="true" stored="false"/>

The code that does the indexing is straightforward, and *some* of the records of this class are indeed searchable via the value field, but others aren't. I know the value field is non-empty, because it is used to construct the id field, which is correct above.

Simon is also looking into this one, but if anyone else has advice for figuring out what's going wrong, please let me know. FWIW, this is a trunk build from Monday morning.

Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Now, a lost data problem with trunk too
On Tue, Sep 14, 2010 at 10:37 AM, karl.wri...@nokia.com wrote:

> ... but it is completely missing the supposedly required value field:
>
> <!-- The value field. This contains the actual string that will be matched. -->
> <field name="value" type="string_idx" required="true" stored="false"/>

that does not show up since it is not stored - maybe that's the reason :)

simon
Re: Fwd: Trunk file handle leak?
An update on this: the error was on my side, doubly incrementing the searcher reference. No problem on trunk!

simon

On Fri, Sep 10, 2010 at 10:04 PM, Simon Rosenthal simon.rosent...@yahoo.com wrote:

> Karl: I reported something very similar a few months back and opened a Jira issue - see
> https://issues.apache.org/jira/browse/SOLR-1911. After I changed to a newer nightly build the
> leak went away. The issue is still open, so you may want to update it.
>
> -Simon
>
> ---------- Forwarded message ----------
> From: karl.wri...@nokia.com
> Date: Fri, Sep 10, 2010 at 3:24 PM
> Subject: RE: Trunk file handle leak?
> To: dev@lucene.apache.org, yo...@lucidimagination.com
>
> Hi Yonik,
> Be that as it may, I'm seeing a steady increase in file handles used by that process over an
> extended period of time (now 20+ minutes):
>
> r...@duck6:~# lsof -p 22379 | wc
>     786    7714  108339
> r...@duck6:~# lsof -p 22379 | wc
>     787    7723  108469
> r...@duck6:~# lsof -p 22379 | wc
>     787    7723  108469
> r...@duck6:~# lsof -p 22379 | wc
>     812    7948  111719
> r...@duck6:~# lsof -p 22379 | wc
>     816    7984  112239
> r...@duck6:~# lsof -p 22379 | wc
>     817    7993  112369
> r...@duck6:~# lsof -p 22379 | wc
>     822    8038  113019
> r...@duck6:~# lsof -p 22379 | wc
>     847    8308  116719
> r...@duck6:~# lsof -p 22379 | wc
>     852    8353  117369
> r...@duck6:~# lsof -p 22379 | wc
>     897    8803  123669
> r...@duck6:~# lsof -p 22379 | wc
>    1022   10018  140819
> r...@duck6:~#
>
> This doesn't smell like spiky resource usage to me. It smells like a leak. ;-)
>
> Karl
>
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley
> Sent: Friday, September 10, 2010 2:08 PM
> To: dev@lucene.apache.org
> Subject: Re: Trunk file handle leak?
>
> On Fri, Sep 10, 2010 at 1:51 PM, karl.wri...@nokia.com wrote:
>> (1) There are periodic commits, every 10,000 records.
>> (2) I have no searcher/reader open at the same time, that I am aware of. This is a straight
>> indexing task. (You ought to know, you wrote some of the code!)
>
> A commit currently opens a new searcher in Solr. It's not too hard to go past 1024 descriptors -
> either raise the limit to 10240 or something, use the compound file format, or lower the merge
> factor.
>
>> (3) I *do* see auto warming being called, but it seems not to happen at the same time as a
>> commit, but rather afterwards. Once it starts happening, this happens repeatedly on every commit.
>
> This would also be expected - it's at a point where there are too many files for your descriptors.
>
> -Yonik
> http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
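For readers hitting the same descriptor ceiling: the two index-side workarounds Yonik names correspond to solrconfig.xml settings. A hedged sketch of what that could look like in a 1.4-era/trunk config - the values are illustrative, not taken from Karl's setup:

```xml
<!-- Illustrative only: both settings reduce the number of files (and thus
     open file descriptors) an index holds at once. -->
<indexDefaults>
  <!-- Pack each segment's files into one compound (.cfs) file. -->
  <useCompoundFile>true</useCompoundFile>
  <!-- A lower merge factor allows fewer segments before merging,
       so fewer files are open simultaneously. -->
  <mergeFactor>5</mergeFactor>
</indexDefaults>
```

The third option, raising the process limit itself (e.g. `ulimit -n 10240` in the shell that launches the JVM), needs no Solr configuration at all.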
[jira] Commented: (SOLR-2106) Spelling Checking for Multiple Fields
[ https://issues.apache.org/jira/browse/SOLR-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909157#action_12909157 ]

JAYABAALAN V commented on SOLR-2106:
------------------------------------

What is the procedure to download the SOLR-2106.patch files into the existing Apache Solr v1.4?

> Spelling Checking for Multiple Fields
> -------------------------------------
>
>                 Key: SOLR-2106
>                 URL: https://issues.apache.org/jira/browse/SOLR-2106
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.4
>         Environment: Linux Environment
>            Reporter: JAYABAALAN V
>             Fix For: 1.4
>
>   Original Estimate: 0.02h
>  Remaining Estimate: 0.02h
>
> Need to enable spellchecking for five different fields and their configuration. I am using the
> dismax query parser for searching the different fields. If a user has entered a wrong spelling in
> the front end, it should check the five different fields, give a collated spelling suggestion,
> and return results based on that suggestion. Do provide your configuration details for the same.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1900) move Solr to flex APIs
[ https://issues.apache.org/jira/browse/SOLR-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909160#action_12909160 ]

Michael McCandless commented on SOLR-1900:
------------------------------------------

I think it makes sense to move append to BytesRef, though I wonder if it should over-allocate (ArrayUtil.oversize) when it grows? I realize for the current calls to append we don't need that (you just append bigTerm, once), but if someone uses this like a StringBuffer... though, this isn't really the intention of BytesRef, so maybe it's OK not to oversize.

> move Solr to flex APIs
> ----------------------
>
>                 Key: SOLR-1900
>                 URL: https://issues.apache.org/jira/browse/SOLR-1900
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>         Attachments: SOLR-1900-facet_enum.patch, SOLR-1900-facet_enum.patch, SOLR-1900_bigTerm.txt, SOLR-1900_FileFloatSource.patch, SOLR-1900_termsComponent.txt
>
> Solr should use flex APIs
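For context on the over-allocation question: growing exactly to the needed size makes n single-byte appends copy O(n^2) bytes, while adding headroom on each growth amortizes the copying. A minimal sketch of that idea - this is not Lucene's BytesRef, and the growth formula is illustrative rather than ArrayUtil.oversize's actual policy:

```java
import java.util.Arrays;

// Sketch of an append() that over-allocates on growth, in the spirit of
// what ArrayUtil.oversize enables. Names and growth policy are illustrative.
class GrowableBytes {
    byte[] bytes = new byte[0];
    int length;

    // Grow to at least minSize, with ~12.5% headroom so repeated appends
    // cost amortized O(1) copying per byte.
    static int oversize(int minSize) {
        return minSize + (minSize >>> 3) + 3;
    }

    void append(byte[] other, int offset, int len) {
        int newLength = length + len;
        if (newLength > bytes.length) {
            bytes = Arrays.copyOf(bytes, oversize(newLength));
        }
        System.arraycopy(other, offset, bytes, length, len);
        length = newLength;
    }
}
```

For the current single bigTerm append, the headroom is wasted space, which is exactly the trade-off the comment weighs; for StringBuffer-style usage it is what keeps appends cheap.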
RE: Now, a lost data problem with trunk too
Yes. Of course. My oversight.

So I did the obvious thing and searched for the value field directly, and it is there:

<str name="id">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="language">ger</str><str name="latitude">52.39935</str><str name="longitude">13.04793</str><str name="reference">brandenburger tor, potsdam, deutschland</str>

So, something about the way I am searching for it is not right. Looking elsewhere.

Karl

________________________________
From: ext Simon Willnauer [simon.willna...@googlemail.com]
Sent: Tuesday, September 14, 2010 4:52 AM
To: dev@lucene.apache.org
Subject: Re: Now, a lost data problem with trunk too

On Tue, Sep 14, 2010 at 10:37 AM, karl.wri...@nokia.com wrote:

> ... but it is completely missing the supposedly required value field:
>
> <!-- The value field. This contains the actual string that will be matched. -->
> <field name="value" type="string_idx" required="true" stored="false"/>

that does not show up since it is not stored - maybe that's the reason :)

simon
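The resolution turns on Solr's indexed/stored distinction: a field can match queries yet never appear in results. A hedged schema.xml sketch of the contrast - only the value field line appears in the thread; the explicit indexed attribute and the second field are assumptions for illustration:

```xml
<!-- Sketch only. An indexed-but-unstored field is searchable, yet fl=*
     omits it because the original string is never kept in the index;
     required="true" only forces the field to be present at index time. -->
<field name="value" type="string_idx" indexed="true" stored="false" required="true"/>

<!-- A stored field, by contrast, does come back in search results
     (type and name here are illustrative). -->
<field name="reference" type="string" indexed="true" stored="true"/>
```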
Re: [jira] Commented: (SOLR-2106) Spelling Checking for Multiple Fields
See:

http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

Erick

On Tue, Sep 14, 2010 at 5:11 AM, JAYABAALAN V (JIRA) j...@apache.org wrote:

> what is procedure to download the SOLR-2010.patch files into the exisiting Apache Solr v1.4
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909214#action_12909214 ]

Robert Muir commented on LUCENE-2504:
-------------------------------------

{quote}
Java (Oracle) really needs to do something to address this.
{quote}

I think we all owe it to ourselves to stop equating Java with Sun/Oracle; if Java stays with Oracle it's pretty obvious the language will die anyway.

{quote}
I think this is a severe and growing problem for Lucene going forward - our search performance is crucial and we can't risk hotspot randomly, substantially slowing things down by a lot.
{quote}

While I agree at the moment we should make efforts to work around issues like this, I don't think we should jump the gun and make real design/architectural choices based on Oracle bugs. Especially for trunk: by the time we release Lucene 4.0, some other company will probably own Java anyway.

{quote}
Not that we have a choice here... but I've often wondered whether .NET has this same hotspot fickleness problem
{quote}

.NET is not a choice but generating C/C++ code is?

> sorting performance regression
> ------------------------------
>
>                 Key: LUCENE-2504
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2504
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Yonik Seeley
>             Fix For: 4.0
>         Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip
>
> sorting can be much slower on trunk than branch_3x
[jira] Commented: (SOLR-1682) Implement CollapseComponent
[ https://issues.apache.org/jira/browse/SOLR-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909223#action_12909223 ]

Varun Gupta commented on SOLR-1682:
-----------------------------------

Is there any workaround to use the Highlight and Facet components along with grouping?

> Implement CollapseComponent
> ---------------------------
>
>                 Key: SOLR-1682
>                 URL: https://issues.apache.org/jira/browse/SOLR-1682
>             Project: Solr
>          Issue Type: Sub-task
>          Components: search
>            Reporter: Martijn van Groningen
>            Assignee: Shalin Shekhar Mangar
>             Fix For: Next
>         Attachments: field-collapsing.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682.patch, SOLR-1682_prototype.patch, SOLR-1682_prototype.patch, SOLR-1682_prototype.patch, SOLR-236.patch
>
> Child issue of SOLR-236. This issue is dedicated to field collapsing in general and all its code (CollapseComponent, DocumentCollapsers and CollapseCollectors). The main goal is to finalize the request parameters and response format.
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909230#action_12909230 ]

Simon Willnauer commented on LUCENE-2504:
-----------------------------------------

bq. I think we all owe it to ourselves to stop equating java with Sun/Oracle, if Java stays with Oracle its pretty obvious the language (is) will die anyway.

I agree with Robert that we should stop comparing against Sun JVMs all the time, turning everything upside-down specializing code here and there, or going one step further and generating C++ code. Dude, who is gonna maintain the compatibility to Java-only environments? I could imagine that we have something super special purpose, like Mike did with DirectNIOFSDirectory to work around unexposed methods like fadvise. I think code specializations of very hot parts of Lucene are OK and we should follow that way like we did in some places, but it already makes things very complicated to follow. Without the knowledge of a committer or a person actively following that development, it is extremely difficult to comprehend design decisions.

I would rather put effort into stuff like Harmony and make code we can control perform better than introduce a preprocessor which generates code for a JVM owned by a company. Would it make way more sense to push OSS JVMs than to spend lots of time investigating .NET as an alternative, or a C/C++ code generator? Before I would go the C++ path, I'd rather use Java to host a C core like Lucy, which brings you as close as it gets to the machine.

bq. EG, see my post here: interesting papers

Seems we are touching the limits of Java though.

> sorting performance regression
> ------------------------------
>
> sorting can be much slower on trunk than branch_3x
Re: Whither ORP?
On Sep 13, 2010, at 12:33 PM, Itamar Syn-Hershko wrote:

> With the proper two-way open-source development process (taking and then giving) I think it can
> become an important part of open-IR technologies, just like what Lucene did to the search engines
> world. What ORP has to offer is of great interest to HebMorph, an open-source project of mine
> trying to decide on what is the best way to index and search Hebrew texts. To this end I decided
> to put some of the development efforts of the HebMorph project into making tools for the ORP. I
> have announced this before, but unfortunately I had to attend to more pressing tasks before I
> could complete this (and there was no response from the community anyway...). Just in case you're
> interested in seeing what I came up with so far: http://github.com/synhershko/Orev.

If you can, putting them up as a patch would be useful. That way, we can show some progress.

> IMHO, the ORP should stand by itself, and relate to Lucene/Solr only as its basis framework for
> these initial stages. Perhaps also try to attract more people who could find an interest in what
> it has to offer, so it can really start growing.
>
> Itamar.
>
> On 12/9/2010 1:29 PM, Grant Ingersoll wrote:
>> On Sep 11, 2010, at 8:51 PM, Robert Muir wrote:
>>> i propose we take what we have and import into lucene-java's benchmark contrib. it already has
>>> integration with wikipedia and reuters for perf purposes, and the quality package is actually
>>> there anyways. later, maybe more people have time and contrib/benchmark evolves naturally...
>>> e.g. to modules/benchmark with solr support as a first big step.
>>
>> Yeah, that seems reasonable. I have been thinking lately that it might be useful to pull our
>> DocMaker stuff out separately from benchmark so that people have easy ways of generating content
>> from things like Wikipedia, etc. Still, at the end of the day, I like what ORP _could_ bring to
>> the table and to some extent I think that is lost by folding it into Lucene benchmark.
>>
>> On Sep 11, 2010 7:33 PM, Grant Ingersoll gsing...@apache.org wrote:
>>> Seems ORP isn't really catching on with people. I know personally I don't have the time I had
>>> hoped to have to get it going. At the same time, I really think it could be a good project.
>>> We've got some tools put together, but we still haven't done much about the bigger goal of a
>>> self contained evaluation. Any thoughts on how we should proceed with ORP?
>>> -Grant

--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
Re: Whither ORP?
I think the biggest hurdle we have in front of us is curating a data set that we can redistribute. I'm in the process of uploading all the ASF public mail archives as of Sept. 13 to Amazon S3. I also have some tools (thanks to Chris Rhodes) for processing this into Solr XML. I think this would give us a standard corpus to start with and would mimic some enterprise search/eDiscovery tasks pretty well.

At any rate, as with any community, the proof is in people stepping up to help out. I like that so many people suggested we keep going. As for what to do, I think the options are pretty wide open and there is opportunity for people to define the project w/o any previous encumbrances. Some ideas that have been kicked around in the past:

1. Creative-commons data set, judgments, queries
2. Open Street Map (spatial search)
3. Mail archives
4. A crowd sourcing application. Given a set of documents and queries, have people provide
   judgments. Ideally, this runs in a web container and we could probably even find resources to
   host it here. Combining that with one of the items above, we would be on our way. The app could
   also solicit queries by giving users an open search box and opportunities to browse the data.

I know much of this is simplistic, but it is a start.

-Grant

On Sep 13, 2010, at 9:04 PM, Dan Cardin wrote:

> Hello, I am new to ORP. I would like to contribute to the project. I do not have a lot of
> experience in this field of IR, crowd sourcing or AI. If someone could take the lead and set a
> forward path, I would be willing to contribute my skill set to ORP. How can I help? I have a lot
> of experience doing software development and system administration.
>
> Cheers,
> --Dan
>
> On Mon, Sep 13, 2010 at 1:36 PM, Omar Alonso oralo...@yahoo.com wrote:
>> I think ORP is a great candidate for crowdsourcing/human computation. In the last year or so
>> there's been quite a bit of research and applications on this. See the page for the SIGIR
>> workshop on using crowdsourcing for IR evaluation: http://www.ischool.utexas.edu/~cse2010/
>>
>> Omar
>>
>> --- On Mon, 9/13/10, Itamar Syn-Hershko ita...@code972.com wrote:
>>> From: Itamar Syn-Hershko ita...@code972.com
>>> Subject: Re: Whither ORP?
>>> To: openrelevance-...@lucene.apache.org
>>> Date: Monday, September 13, 2010, 9:33 AM
[jira] Commented: (LUCENE-2643) StringHelper#stringDifference is wrong about supplementary chars
[ https://issues.apache.org/jira/browse/LUCENE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909253#action_12909253 ]

Robert Muir commented on LUCENE-2643:
-------------------------------------

My vote would be to drop it if we aren't using it; it's @lucene.internal. Since it's unused, it's not obvious that it's wrong (it's correct if you want the first code unit difference).

> StringHelper#stringDifference is wrong about supplementary chars
> ----------------------------------------------------------------
>
>                 Key: LUCENE-2643
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2643
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.0, 3.0.1, 3.0.2
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Trivial
>             Fix For: 3.0.3, 3.1, 4.0
>         Attachments: LUCENE-2643.patch
>
> StringHelper#stringDifference does not take supplementary characters into account. Since this is not used internally at all, we should think about removing it, but since it is not too complex we could just fix it for bwcompat reasons. For released versions we should really fix it, since folks might use it. For trunk we could just drop it.
[jira] Updated: (LUCENE-2643) StringHelper#stringDifference is wrong about supplementary chars
[ https://issues.apache.org/jira/browse/LUCENE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2643:
------------------------------------

    Attachment: LUCENE-2643.patch

here is a patch

> StringHelper#stringDifference is wrong about supplementary chars
[jira] Commented: (LUCENE-2643) StringHelper#stringDifference is wrong about supplementary chars
[ https://issues.apache.org/jira/browse/LUCENE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909256#action_12909256 ]

Simon Willnauer commented on LUCENE-2643:
-----------------------------------------

bq. since its unused, its not obvious that its wrong (its correct if you want the first code unit difference)

yeah - my interpretation would be it's wrong since you use String.charAt(int) with the index of the first code unit. Anyway - we should drop it for trunk, but I am not sure if we should for 3.x. I mean, this is not that much of a deal anyway.

> StringHelper#stringDifference is wrong about supplementary chars
[jira] Commented: (LUCENE-2643) StringHelper#stringDifference is wrong about supplementary chars
[ https://issues.apache.org/jira/browse/LUCENE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909260#action_12909260 ]

Robert Muir commented on LUCENE-2643:
-------------------------------------

drop in trunk and mark deprecated in 3.x? regardless of whether it's right or wrong, if we aren't using it, i think it's good to clean house.

> StringHelper#stringDifference is wrong about supplementary chars
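To make the bug concrete for readers: a charAt-based difference walks UTF-16 code units, so for two strings that differ only in the low surrogate of a supplementary character it returns an index that falls inside a surrogate pair - correct as a "first code unit difference", wrong as a code-point boundary. A small illustrative sketch, not the actual StringHelper code:

```java
// Illustrative only - not Lucene's StringHelper. Returns the index (in
// UTF-16 code units) of the first position where the two strings differ.
class StringDiffSketch {
    static int codeUnitDifference(String a, String b) {
        int len = Math.min(a.length(), b.length());
        for (int i = 0; i < len; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return len;
    }

    public static void main(String[] args) {
        // U+1D50A and U+1D50B encode as surrogate pairs sharing the same
        // high surrogate (\uD835) and differing only in the low surrogate.
        String a = "\uD835\uDD0A";
        String b = "\uD835\uDD0B";
        int i = codeUnitDifference(a, b);
        // i == 1 points between the surrogates of a pair, i.e. not at a
        // valid code-point boundary - the supplementary-character pitfall.
        System.out.println(i + " " + Character.isLowSurrogate(a.charAt(i)));
    }
}
```

A code-point-aware variant would have to back up to the start of the pair (Character.isHighSurrogate on the preceding unit) before reporting the difference.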
Re: Whither ORP?
Hello, This is a great start! I am interested in helping with the development of a crowd sourcing application. The next step would be creating a set of requirements for the web app. Would the ORP wiki be a good place to store the requirements? --Dan On Tue, Sep 14, 2010 at 9:51 AM, Grant Ingersoll gsing...@apache.org wrote: I think the biggest hurdle we have in front of us is curating a data set that we can redistribute. I'm in the process of uploading all the ASF public mail archives as of Sept. 13 to Amazon S3. I also have some tools (thanks to Chris Rhodes) for processing this into Solr XML. I think this would give us a standard corpus to start with and would mimic some enterprise search/eDiscovery tasks fairly well. At any rate, as with any community, the proof is in people stepping up to help out. I like that so many people suggested we keep going. As for what to do, I think the options are pretty wide open and there is opportunity for people to define the project w/o any previous encumbrances. Some ideas that have been kicked around in the past: 1. Creative-commons data set, judgments, queries 2. Open Street Map (spatial search) 3. Mail archives 4. A crowd sourcing application. Given a set of documents and queries, have people provide judgments. Ideally, this runs in a web container and we could probably even find resources to host it here. Combining that with one of the items above, we would be on our way. App could also solicit queries by providing users an open search box and opportunities to browse the data. I know much of this is simplistic, but it is a start. -Grant On Sep 13, 2010, at 9:04 PM, Dan Cardin wrote: Hello, I am new to ORP. I would like to contribute to the project. I do not have a lot of experience in this field of IR, crowd sourcing or AI. If someone could take the lead and set a forward path I would be willing to contribute my skill set to ORP. How can I help?
I have a lot of experience doing software development and system administration. Cheers, --Dan On Mon, Sep 13, 2010 at 1:36 PM, Omar Alonso oralo...@yahoo.com wrote: I think ORP is a great candidate for crowdsourcing/human computation. In the last year or so there's been quite a bit of research and applications on this. See the page for the SIGIR workshop on using crowdsourcing for IR evaluation: http://www.ischool.utexas.edu/~cse2010/ Omar --- On Mon, 9/13/10, Itamar Syn-Hershko ita...@code972.com wrote: From: Itamar Syn-Hershko ita...@code972.com Subject: Re: Whither ORP? To: openrelevance-...@lucene.apache.org Date: Monday, September 13, 2010, 9:33 AM With the proper two-way open-source development process (taking and then giving) I think it can become an important part of open-IR technologies, just like what Lucene did to the search engine world. What ORP has to offer is of great interest to HebMorph, an open-source project of mine trying to decide on the best way to index and search Hebrew texts. To this end I decided to put some of the development efforts of the HebMorph project into making tools for the ORP. I have announced this before, but unfortunately I had to attend to more pressing tasks before I could complete this (and there was no response from the community anyway...). Just in case you're interested in seeing what I came up with so far: http://github.com/synhershko/Orev. IMHO, the ORP should stand by itself, and relate to Lucene/Solr only as its basis framework for these initial stages. Perhaps also try to attract more people who could find an interest in what it has to offer, so it can really start growing. Itamar. On 12/9/2010 1:29 PM, Grant Ingersoll wrote: On Sep 11, 2010, at 8:51 PM, Robert Muir wrote: i propose we take what we have and import into lucene-java's benchmark contrib.
it already has integration with wikipedia and reuters for perf purposes, and the quality package is actually there anyways. later, maybe more people have time and contrib/benchmark evolves naturally... e.g. to modules/benchmark with solr support as a first big step. Yeah, that seems reasonable. I have been thinking lately that it might be useful to pull our DocMaker stuff out separately from benchmark so that people have easy ways of generating content from things like Wikipedia, etc. Still, at the end of the day, I like what ORP _could_ bring to the table and to some extent I think that is lost by folding it into Lucene benchmark. On Sep 11, 2010 7:33 PM, Grant Ingersoll gsing...@apache.org wrote: Seems ORP isn't really catching on with people. I know personally I don't have the time I had hoped to have to get it going. At the same time, I really think it could be a good project. We've got some tools put
Re: Whither ORP?
On Tue, Sep 14, 2010 at 10:22 AM, Dan Cardin dcardin2...@gmail.com wrote: Hello, This is a great start! I am interested in helping with the development of a crowd sourcing application. The next step would be creating a set of requirements for the web app. Would the ORP wiki be a good place to store the requirements? +1, don't hold back! -- Robert Muir rcm...@gmail.com
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909272#action_12909272 ] Yonik Seeley commented on LUCENE-2504: -- Looks like we're not using the correct comparators everywhere. I was trying a slightly different way to implement sort-missing-last, and my first comparator only implements setNextReader(), but I'm now getting many UnsupportedOperationExceptions (i.e. the search process is using older comparators after calling setNextReader()) One culprit is OneComparatorNonScoringCollector, and another is OneComparatorFieldValueHitQueue I think. sorting performance regression -- Key: LUCENE-2504 URL: https://issues.apache.org/jira/browse/LUCENE-2504 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip sorting can be much slower on trunk than branch_3x -- This message is automatically generated by JIRA.
Re: Whither ORP?
On Tue, Sep 14, 2010 at 4:30 PM, Robert Muir rcm...@gmail.com wrote: On Tue, Sep 14, 2010 at 10:22 AM, Dan Cardin dcardin2...@gmail.com wrote: Hello, This is a great start! I am interested in helping with the development of a crowd sourcing application. The next step would be creating a set of requirements for the web app. Would the ORP wiki be a good place to store the requirements? +1, don't hold back! +1 - we need some action here! go for it! -- Robert Muir rcm...@gmail.com
[jira] Updated: (LUCENE-2630) make the build more friendly to apache harmony
[ https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2630: Attachment: LUCENE-2630.patch The harmony developers applied the UTF-8 fix (HARMONY-6640), so we don't need to hack MockTokenizer anymore. i've updated patch, 'ant test-core -Dbuild.compiler=extJavac' almost passes. i'll iterate with some more test improvements now that we are going somewhere. make the build more friendly to apache harmony -- Key: LUCENE-2630 URL: https://issues.apache.org/jira/browse/LUCENE-2630 Project: Lucene - Java Issue Type: Task Components: Build, Tests Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-2630.patch, LUCENE-2630.patch as part of improved testing, i thought it would be a good idea to make the build (ant test) more friendly to working under apache harmony. i'm not suggesting we de-optimize code for sun jvms or anything crazy like that, only use it as a tool. for example: * bugs in tests/code: for example i found a test that expected ArrayIOOBE when really the javadoc contract for the method is just IOOBE... it just happens to pass always on sun jvm because thats the implementation it always throws. * better reproduction of bugs: for example [2 months out of the year|http://en.wikipedia.org/wiki/Unusual_software_bug#Phase_of_the_Moon_bug] it seems TestQueryParser fails with thai locale in a difficult-to-reproduce way. but i *always* get similar failures like this with harmony for this test class. * better stability and portability: we should try (if reasonable) to avoid depending upon internal details. the same kinds of things that fail in harmony might suddenly fail in a future sun jdk. because its such a different impl, it brings out a lot of interesting stuff. at the moment there are currently a lot of failures, I think a lot might be caused by this: http://permalink.gmane.org/gmane.comp.java.harmony.devel/39484 -- This message is automatically generated by JIRA. 
Re: exceptions from solr/contrib/dataimporthandler and solr/contrib/extraction
On Sep 13, 2010, at 1:59 PM, Lance Norskog wrote: What I want you to do is, I want you to find the guys who are putting all the bugs in the code, and I want you to FIRE THEM! He who is without bugs in their code, may be the first to fire. -Grant
[jira] Updated: (LUCENE-2630) make the build more friendly to apache harmony
[ https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2630: Attachment: LUCENE-2630_charutils.patch
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909312#action_12909312 ] Michael McCandless commented on LUCENE-2504: bq. I'm now getting many UnsupportedOperationExceptions (i.e. the search process is using older comparators after calling setNextReader()) That's no good! bq. One culprit is OneComparatorNonScoringCollector, and another is OneComparatorFieldValueHitQueue I think. Hmm I don't see the problem -- eg OneComparatorNonScoringCollector saves the returned comparator from comparator.setNextReader. Can you post the full exc?
[jira] Updated: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-2504: - Attachment: LUCENE-2504.patch Attaching a draft patch that seems to fix the issue (the ones I can find, at least). bq. Hmm I don't see the problem - eg OneComparatorNonScoringCollector saves the returned comparator from comparator.setNextReader. Yes, but FieldValueHitQueue has its own list of comparators that never gets updated.
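The contract behind this bug can be illustrated with a toy sketch (illustrative names only, not the real Lucene classes): setNextReader may hand back a replacement comparator, so every structure holding the old reference, the hit queue included, has to be switched to the returned one.

```java
// Toy model of the comparator contract discussed in this issue: the
// per-segment setNextReader call may return a *new* comparator, and every
// holder of the old reference must be updated with the returned instance.
interface SegmentComparator {
    SegmentComparator setNextReader(int segmentBase); // may return a replacement
    int compare(int slotA, int slotB);
}

class PerSegmentComparator implements SegmentComparator {
    private final int segmentBase;
    PerSegmentComparator(int segmentBase) { this.segmentBase = segmentBase; }
    public SegmentComparator setNextReader(int segmentBase) {
        // returns a fresh instance instead of mutating in place
        return new PerSegmentComparator(segmentBase);
    }
    public int compare(int slotA, int slotB) { return Integer.compare(slotA, slotB); }
}

class ToyHitQueue {
    final SegmentComparator[] comparators; // the queue's own copy of the comparators
    ToyHitQueue(SegmentComparator... comparators) { this.comparators = comparators; }
    // The fix amounts to this: when the collector receives a replacement
    // from setNextReader, it must be written back into the queue as well,
    // or the queue keeps ordering hits with a stale comparator.
    void replaceComparator(int i, SegmentComparator c) { comparators[i] = c; }
}
```

In the buggy code path the collector saved the returned comparator for itself but the queue's array kept the original, which is why stale comparators were still being invoked after a segment transition.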
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909337#action_12909337 ] Michael McCandless commented on LUCENE-2504: {quote} I think we all owe it to ourselves to stop equating java with Oracle, if Java stays with Oracle its pretty obvious the language (is) will die anyway. {quote} Yeah I agree. The open question is whether this hotspot fickleness is particular to Oracle's java impl, or, is somehow endemic to bytecode VMs (.NET included). It's really a hard, complex problem (JIT compilation from bytecode based on runtime data), so it wouldn't surprise me if it's the latter, to varying degrees. bq. .NET is not a choice but generating C/C++ code is? As far as I know it's much easier to invoke C/C++ from java, than .NET from java. C/C++ is also more portable than .NET, I think? (There is Mono -- how mature is it by now?). {quote} I don't think we should jump the gun and make real design/architectural choices based on Oracle bugs. {quote} I expect source code spec will also buy sizable perf gains irrespective of hotspot fickleness, and in non-Oracle java impls. Generating a dedicated class, with one method doing all searching and collecting, removes all kinds of barriers to the JIT compiler. It makes its job far easier. bq. I agree with robert that we should stop comparing against sun jvms all the time and turn everything upside-down specializing code here and there or go one step further and generate C++ code. Dude who is gonna maintain the compatibility to Java-Only environments? If we manage to pursue specialized code gen, it'll be a long time coming! My point about C/C++ is that if we do somehow manage to get a working code gen framework online (for Java), the added cost to make it also target C/C++ will be relatively small. Ie, it's nearly for free.
If we were to do this, that would not mean we'd abandon java, of course -- the framework would fully support pure java as well. bq. I think that code specializations of very hot part of lucene are ok and we should follow that way like we did at some places but it already make things very complicated to follow. You mean manual specialization, right (like this issue)? Yes, I think we will have to keep manually specializing, going forward, until we have a code generator that does it more cleanly... bq. Would it make way more sense to push OSS JVMs than spending lots of time on investigating on .NET as an alternative or C/C++ code generator? I think we should do both. bq. Before I would go the C++ path I'd rather use Java to host a C core like lucy which brings you as close as it gets to the machine. I think this (a Java wrapper for Lucy) is a great idea -- we should explore that, too. bq. interesting papers - seems we are touching the limits of Java though. Well that's the big question -- limits of Java or limits of Sun/Oracle's impl. It looks like harmony has a ways to go on absolute performance: I just ran a very quick benchmark (TermQuery search on 10 M multi-segment wiki index w/ a 50% random filter) and Oracle java 1.6.0_21 gets 15.6 QPS while Harmony 1.5.0-r946978 gets 9.5 QPS (Harmony 1.6.0-r946981 also gets 9.5 QPS). I just ran java -server -Xms2g -Xmx2g; it's possible that by tuning Harmony (it has many awesome looking command-line args!) it'd get faster...
[jira] Reopened: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-2504: bq. Yes, but FieldValueHitQueue has its own list of comparators that never gets updated. Ugh, yes.
[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations
[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2575: - Attachment: LUCENE-2575.patch Term frequency is recorded and returned. There are Terms, TermsEnum, DocsEnum implementations. Needs the term vectors, doc stores exposed via the RAM reader, concurrency unit tests, and a payload unit test. Still quite rough. Concurrent byte and int block implementations - Key: LUCENE-2575 URL: https://issues.apache.org/jira/browse/LUCENE-2575 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch Reporter: Jason Rutherglen Fix For: Realtime Branch Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (ie, postings etc). I'm not sure I understand the slices concept, it seems like it'd be easier to implement a seekable random access file like API. One'd seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden? -- This message is automatically generated by JIRA.
Re: Whither ORP?
On 14/9/2010 4:22 PM, Dan Cardin wrote: Hello, This is a great start! I am interested in helping with the development of a crowd sourcing application. The next step would be creating a set of requirements for the web app. Would the ORP wiki be a good place to store the requirements? --Dan Uhm... this is actually what I just said I'm in the middle of doing. But perhaps doing some spec'ing through the Wiki would end in a better product, so why not. Please see http://search-lucene.com/m/pLgxg1HCef11subj=OpenRelevance+Viewer+Orev+ to get an idea of what I did there. Let's branch the discussion from there to get this going in the right direction... As I wrote in the other message, this app can be accessed through http://github.com/synhershko/Orev (.NET / C# / NHibernate), and there's still some to do there. Itamar.
Re: Whither ORP?
On 14/9/2010 3:44 PM, Grant Ingersoll wrote: If you can, putting them up as a patch would be useful. That way, we can show some progress. I will, but first it needs to be workable. It is 80% done, but still not that usable. I expect to be able to work on it again in a month or so. Or someone else could resume from where I stopped (in .NET, or port it to Java). I can share what is missing if anyone is interested. Itamar.
[jira] Commented: (LUCENE-2630) make the build more friendly to apache harmony
[ https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909347#action_12909347 ] Simon Willnauer commented on LUCENE-2630: - bq. Here's the patch for TestCharacterUtils. looks good to me! go commit!
Re: Whither ORP?
Hello, I will begin documenting some basic requirements for a crowd sourcing web app. I will use some of the work done by Itamar as a basis for the requirements. --Dan On Tue, Sep 14, 2010 at 1:18 PM, Itamar Syn-Hershko ita...@code972.comwrote: On 14/9/2010 3:44 PM, Grant Ingersoll wrote: If you can, putting them up as a patch would be useful. That way, we can show some progress. I will, but first it needs to be workable. It is 80% done, but still not that usable. I expect to be able to work on it again in a month or so. Or someone else could resume from where I stopped (in .NET, or port it to Java). I'm can share what is missing if anyone is interested. Itamar.
[jira] Updated: (LUCENE-2630) make the build more friendly to apache harmony
[ https://issues.apache.org/jira/browse/LUCENE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2630: Attachment: LUCENE-2630_intl.patch here's a patch for the internationalization differences, since harmony uses ICU. * the collator gives a different order for Locale.US, though it's weird that we test the order of non-US characters under the US locale (their order isn't defined there and is inherited from the root locale). I conditionalized this test as such: {code} // the sort order of Ø versus U depends on the version of the rules being used // for the inherited root locale: Ø's order isn't specified in Locale.US since // it's not used in english. private boolean oStrokeFirst = Collator.getInstance(new Locale("")).compare("Ø", "U") < 0; {code} * the thai dictionary-based break iterator gives different results: I used text that both impls segment the same way.
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909407#action_12909407 ] Yonik Seeley commented on LUCENE-2504: -- bq. The open question is whether this hotspot fickleness is particular to Oracle's java impl, or, is somehow endemic to bytecode VMs (.NET included). I tried IBM's latest Java6 (SR8 FP1, 20100624). It seems to have some of the same pitfalls as Oracle's JVM, just different. The first run does not differ from the second run in the same JVM as it does with Oracle, but the first run itself has much more variation. The worst case is worse, and just like the Oracle JVM, it gets stuck in its worst case. Each run (of the complete set of fields) was in a separate JVM, since two runs in the same JVM didn't really differ as they did in the Oracle JVM.

branch_3x:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run
|10|129|128|130|109|98|128|135
|1|128|123|127|127|98|128|135
|1000|129|130|130|128|98|130|136
|100|128|133|133|130|100|132|139
|10|150|153|153|154|122|153|159

trunk:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run
|10|217|81|383|99|79|78|215
|1|254|73|346|101|106|108|267
|1000|253|74|347|99|107|108|258
|100|253|107|394|98|107|102|255
|10|251|107|388|99|106|98|257

The second way of testing is to completely mix fields (no serial correlation between what field is sorted on). This is the test that is very predictable with the Oracle JVM, but I still see wide variability with the IBM JVM. Here is the list of different runs for the Oracle JVM (ms):

branch_3x |128|129|123|120|128|100|95|74|130|91|120
trunk |106|89|168|116|155|119|108|118|112|169|165

To my eye, it looks like we have more variability in trunk, due to increased use of abstractions?
[jira] Commented: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909456#action_12909456 ] Yonik Seeley commented on LUCENE-2504: -- OK, I've committed the fix to always use the latest generation field comparator. Not sure if this is the best way to handle it - but at least it's correct now and we can improve more later.
[jira] Created: (SOLR-2120) Facet Field Value truncation
Facet Field Value truncation Key: SOLR-2120 URL: https://issues.apache.org/jira/browse/SOLR-2120 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.4.1 Reporter: Niall O'Connor There is a limit of 256 characters on the length of indexed string values; this results in undesirable behavior for facet field values, for example:

<lst name="facet_fields">
  <lst name="pub_articletitle">
    <int name="1">2302</int>
    <int name="hiv">1403</int>
    <int name="type">1382</int>
  </lst>
  <lst name="tissue-antology">
    <int name="Lymph node,Organ component,Cardinal organ part,Anatomical structure,Material anatomical entity,Physical anatomical entity,Anatomical entity">419</int>
    <int name="Left frontal lobe,Frontal lobe,Lobe of cerebral hemisphere,Segment of cerebral hemisphere,Segment of telencephalon,Segment of forebrain,Segment of brain,Segment of neuraxis,Organ segment,Organ region,Cardinal organ part,Anatomical structure,Material anatom">236</int>
    <int name="ical entity,Physical anatomical entity,Anatomical entity">236</int>
  </lst>
</lst>

The last facet in the list is being truncated and spills into a new facet. This also eats up a facet slot, since I usually only return the top 3. Is 256 characters a hard limit in the indexing strategy? -- This message is automatically generated by JIRA.
[jira] Created: (LUCENE-2644) LowerCaseTokenizer Does Not Behave As One Might Expect (or Desire)--Given Its Name
LowerCaseTokenizer Does Not Behave As One Might Expect (or Desire)--Given Its Name -- Key: LUCENE-2644 URL: https://issues.apache.org/jira/browse/LUCENE-2644 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 3.0.2 Reporter: Scott Gonyea Fix For: 3.0.3, 3.1, Realtime Branch, 4.0 While I understand some of the reasons for its design, the original LowerCaseTokenizer should have been named LowerCaseLetterTokenizer. I feel that LowerCaseTokenizer makes too many assumptions about what to tokenize, and I have therefore patched it. The *default* behavior will remain as it always has--to avoid breaking any implementations for which it's being used. I have changed LowerCaseTokenizer to extend CharTokenizer (rather than LetterTokenizer). LetterTokenizer's functionality was merged into the default behavior of LowerCaseTokenizer. Getter/setter methods have been added to the LowerCaseTokenizer class, allowing you to turn on/off tokenizing by whitespace, numbers, and special (non-alphanumeric) characters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2644) LowerCaseTokenizer Does Not Behave As One Might Expect (or Desire)--Given Its Name
[ https://issues.apache.org/jira/browse/LUCENE-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Gonyea updated LUCENE-2644: - Attachment: LowerCaseTokenizer.patch This patch retains the original functionality, while permitting the user to modify the assumptions on which tokens are built. LowerCaseTokenizer Does Not Behave As One Might Expect (or Desire)--Given Its Name -- Key: LUCENE-2644 URL: https://issues.apache.org/jira/browse/LUCENE-2644 Project: Lucene - Java Issue Type: Bug Components: Analysis Affects Versions: 3.0.2 Reporter: Scott Gonyea Fix For: 3.0.3, 3.1, Realtime Branch, 4.0 Attachments: LowerCaseTokenizer.patch While I understand some of the reasons for its design, the original LowerCaseTokenizer should have been named LowerCaseLetterTokenizer. I feel that LowerCaseTokenizer makes too many assumptions about what to tokenize, and I have therefore patched it. The *default* behavior will remain as it always has--to avoid breaking any implementations for which it's being used. I have changed LowerCaseTokenizer to extend CharTokenizer (rather than LetterTokenizer). LetterTokenizer's functionality was merged into the default behavior of LowerCaseTokenizer. Getter/setter methods have been added to the LowerCaseTokenizer class, allowing you to turn on/off tokenizing by whitespace, numbers, and special (non-alphanumeric) characters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
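The configurable behavior the patch describes can be sketched without the Lucene classes. This is a self-contained approximation (class and method names are illustrative, not the patched Lucene API): token characters are lowercased as they are consumed, and the character classes that count as token characters can be toggled.

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch (not the Lucene classes) of the patch's idea:
// lower-case tokenization where the character classes that delimit
// tokens can be toggled, instead of always splitting on non-letters.
public class ConfigurableLowerCaseTokenizer {
    private final boolean keepDigits;  // treat digits as token characters?

    public ConfigurableLowerCaseTokenizer(boolean keepDigits) {
        this.keepDigits = keepDigits;
    }

    // analogous to CharTokenizer's isTokenChar() extension point
    private boolean isTokenChar(char c) {
        return Character.isLetter(c) || (keepDigits && Character.isDigit(c));
    }

    public List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c)); // normalize while tokenizing
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }
}
```

With the default (letters only), `tokenize("Foo2Bar")` yields `[foo, bar]`, matching LowerCaseTokenizer's historical behavior; with digits enabled it yields `[foo2bar]`.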
[jira] Updated: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated LUCENE-2504: - Attachment: LUCENE-2504_SortMissingLast.patch This was a simple attempt to simplify the comparators. Static classes are used instead of inner classes. Unfortunately, it didn't keep the JVMs from getting stuck in badly optimized code (it was a long shot for that), but it does result in a consistent 4% speedup. It looks as simple as the previous version to my eye, so I'll commit if there are no objections. sorting performance regression -- Key: LUCENE-2504 URL: https://issues.apache.org/jira/browse/LUCENE-2504 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Yonik Seeley Fix For: 4.0 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip, LUCENE-2504_SortMissingLast.patch sorting can be much slower on trunk than branch_3x -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2575) Concurrent byte and int block implementations
[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2575: - Attachment: LUCENE-2575.patch Added a unit test for payloads, term vectors, and doc stores. The reader flushes term vectors and doc stores on demand, once per reader. Also, little things are getting cleaned up in the realtime branch. Concurrent byte and int block implementations - Key: LUCENE-2575 URL: https://issues.apache.org/jira/browse/LUCENE-2575 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch Reporter: Jason Rutherglen Fix For: Realtime Branch Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (ie, postings etc). I'm not sure I understand the slices concept, it seems like it'd be easier to implement a seekable random access file like API. One'd seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Resolved: (SOLR-1194) Query Analyzer not Invoking for Custom FieldType - When we use Custom QParser Plugin
[ https://issues.apache.org/jira/browse/SOLR-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-1194. Resolution: Invalid This sounds like a bug in your custom QParser -- the QParser is what calls the analyzer and constructs the query. Without any information as to how FPersonQParserPlugin is implemented, there doesn't seem to be a bug here. If your issue is that you have questions about how to implement FPersonQParserPlugin properly so that it uses the field's analyzer, please post that as a question to the solr-user mailing list. Query Analyzer not Invoking for Custom FieldType - When we use Custom QParser Plugin Key: SOLR-1194 URL: https://issues.apache.org/jira/browse/SOLR-1194 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Environment: Windows, Java 1.6, Solr 1.3 Reporter: Nagarajan.shanmugam Original Estimate: 2h Remaining Estimate: 2h Hi, I created a custom Solr field kwd_names in schema.xml:

<fieldType name="kwd_names" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Metaphone" inject="true"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Metaphone" inject="true"/>
  </analyzer>
</fieldType>

I configured a requestHandler in solrconfig.xml with a custom QParserPlugin:

<requestHandler name="fperson" class="solr.SearchHandler">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">fpersonQueryParser</str>
  </lst>
</requestHandler>

<queryParser name="fpersonQueryParser" class="com.thinkronize.edudym.search.analysis.FPersonQParserPlugin"/>

SolrQuery q = new SolrQuery();
q.setParam("q", "George");
q.setParam("gender", "M");
q.setQueryType(FPersonSearcher.QUERY_TYPE);
server.query(q);

When I fire the query it won't invoke the query analyzer, and it doesn't give any results. But if I remove q.setQueryType, it invokes the query analyzer and gives results. That means the query analyzer for that field is not invoked when I use the custom QParser plugin. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
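The symptom reported here (no results when the query analyzer is skipped) can be sketched without Solr. This is a self-contained illustration of the underlying mismatch, with illustrative names only, not Solr APIs: terms are normalized at index time, so a raw query term that bypasses the same analysis chain cannot match.

```java
import java.util.HashSet;
import java.util.Set;

// Self-contained sketch of why skipping the query analyzer yields no hits:
// terms are normalized at index time (here: trim + lowercase, like the
// schema's chain), so an un-normalized query term cannot match them.
public class AnalyzerMismatch {
    static String analyze(String raw) {
        return raw.trim().toLowerCase();
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>();
        index.add(analyze("  George "));  // indexed as "george"

        String rawQuery = "George";
        System.out.println(index.contains(rawQuery));           // query analyzer skipped
        System.out.println(index.contains(analyze(rawQuery)));  // query analyzer applied
    }
}
```

The first lookup fails and the second succeeds, which is exactly the difference between the custom QParser ignoring the field's query analyzer and invoking it.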
[jira] Commented: (SOLR-2119) IndexSchema should log warning if analyzer is declared with charfilter/tokenizer/tokenfilter out of order
[ https://issues.apache.org/jira/browse/SOLR-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909511#action_12909511 ] Robert Muir commented on SOLR-2119: --- {quote} There seems to be a segment of the user population that has a hard time understanding the distinction between a charfilter, a tokenizer, and a tokenfilter - while we can certainly try to improve the documentation about what exactly each does, and when they take effect in the analysis chain, one other thing we should do is try to educate people when they construct their analyzer in a way that doesn't make any sense. {quote} I think we should do both; this is a great idea. {quote} (we could easily make such a situation fail to initialize, but i'm not convinced that would be the best course of action, since some people may have schemas where they have declared a charFilter or tokenizer out of order relative to their tokenFilters, but are still getting correct results that work for them, and breaking their instance on upgrade doesn't seem like it would be productive) {quote} I would prefer a hard error. I think someone who doesn't understand what tokenizers and filters do likely isn't looking at their log files either. In my opinion, Solr should be more picky about its configuration. Oftentimes, if I haven't had enough sleep, I will type tokenFilter instead of filter, and Solr simply ignores it completely instead of raising an error. And I can't be the only one who does this; it's not obvious that tokenizer = Tokenizer, charFilter = CharFilter, analyzer = Analyzer, but filter = TokenFilter.
IndexSchema should log warning if analyzer is declared with charfilter/tokenizer/tokenfilter out of order -- Key: SOLR-2119 URL: https://issues.apache.org/jira/browse/SOLR-2119 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Hoss Man There seems to be a segment of the user population that has a hard time understanding the distinction between a charfilter, a tokenizer, and a tokenfilter -- while we can certainly try to improve the documentation about what exactly each does, and when they take effect in the analysis chain, one other thing we should do is try to educate people when they construct their analyzer in a way that doesn't make any sense. At the moment, some people are attempting to do things like move the Foo <tokenFilter/> before the <tokenizer/> to try and get certain behavior ... at a minimum we should log a warning in this case that doing so doesn't have the desired effect (we could easily make such a situation fail to initialize, but i'm not convinced that would be the best course of action, since some people may have schemas where they have declared a charFilter or tokenizer out of order relative to their tokenFilters, but are still getting correct results that work for them, and breaking their instance on upgrade doesn't seem like it would be productive) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
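For reference, a hypothetical schema fragment showing the ordering the issue wants declarations to follow (the specific factory classes here are just illustrative choices):

```xml
<!-- Hypothetical example: the components of an <analyzer> are applied as
     charFilter(s), then the tokenizer, then tokenFilter(s), regardless of
     the order they are declared in schema.xml. Declaring them in that same
     order avoids the confusion described above. -->
<fieldType name="text_example" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```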
[jira] Created: (SOLR-2121) distributed highlighting using q.alt=*:* causes NPE in finishStages
distributed highlighting using q.alt=*:* causes NPE in finishStages --- Key: SOLR-2121 URL: https://issues.apache.org/jira/browse/SOLR-2121 Project: Solr Issue Type: Bug Reporter: Hoss Man As noted on the mailing list by Ron Mayer, using the example configs and example data on trunk, this query works... http://localhost:8983/solr/select?q.alt=*:*&hl=on&defType=edismax ...but this query causes a NullPointerException... http://localhost:8983/solr/select?q.alt=*:*&hl=on&defType=edismax&shards=localhost:8983/solr Stack Trace...

{noformat}
java.lang.NullPointerException
	at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:158)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:310)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1324)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
{noformat}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2121) distributed highlighting using q.alt=*:* causes NPE in finishStages
[ https://issues.apache.org/jira/browse/SOLR-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909514#action_12909514 ] Hoss Man commented on SOLR-2121: Marc Sturlese posted his fix, but it's not entirely obvious to me what exactly the necessary change is, or if the root cause isn't somewhere else...

{code}
public void finishStage(ResponseBuilder rb) {
  boolean hasHighlighting = true;
  if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
    Map.Entry<String, Object>[] arr = new NamedList.NamedListEntry[rb.resultIds.size()];
    // TODO: make a generic routine to do automatic merging of id keyed data
    for (ShardRequest sreq : rb.finished) {
      if ((sreq.purpose & ShardRequest.PURPOSE_GET_HIGHLIGHTS) == 0) continue;
      for (ShardResponse srsp : sreq.responses) {
        NamedList hl = (NamedList) srsp.getSolrResponse().getResponse().get("highlighting");
        // patch bug
        if (hl != null) {
          for (int i = 0; i < hl.size(); i++) {
            String id = hl.getName(i);
            ShardDoc sdoc = rb.resultIds.get(id);
            int idx = sdoc.positionInResponse;
            arr[idx] = new NamedList.NamedListEntry(id, hl.getVal(i));
          }
        } else {
          hasHighlighting = false;
        }
      }
    }
    // remove nulls in case not all docs were able to be retrieved
    // patch bug
    if (hasHighlighting) {
      rb.rsp.add("highlighting", removeNulls(new SimpleOrderedMap(arr)));
    }
  }
}
{code}

distributed highlighting using q.alt=*:* causes NPE in finishStages --- Key: SOLR-2121 URL: https://issues.apache.org/jira/browse/SOLR-2121 Project: Solr Issue Type: Bug Reporter: Hoss Man As noted on the mailing list by Ron Mayer, using the example configs and example data on trunk, this query works... http://localhost:8983/solr/select?q.alt=*:*&hl=on&defType=edismax ...but this query causes a NullPointerException... http://localhost:8983/solr/select?q.alt=*:*&hl=on&defType=edismax&shards=localhost:8983/solr Stack Trace...

{noformat}
java.lang.NullPointerException
	at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:158)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:310)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1324)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
{noformat}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Current trunk example woes...
If I check out the current trunk, and from solr do an ant clean example all is well, even up to starting Solr. But trying to hit anything on the site gives a response in the browser starting with: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType:Error loading class 'solr.SpatialTileField' Commenting the relevant fieldType out of schema.xml fixes this. Should I open a Jira or does someone want to jump on it? Erick
Obsolete instructions for Velocity ResponseWriter on the Wiki
For trunk, the instructions here: http://wiki.apache.org/solr/VelocityResponseWriter about starting up VRW/Solaritas are obsolete I think. It looks like all this has been folded into core. I'll go up and add some notes for trunk/1.5 unless someone objects. Erick
Build failed in Hudson: Lucene-3.x #115
See https://hudson.apache.org/hudson/job/Lucene-3.x/115/changes Changes: [rmuir] LUCENE-2630: look for the correct exception according to javadoc contract [gsingers] SOLR-1568: move DistanceUtils up a package [gsingers] SOLR-1568: backport to 3.x [rmuir] LUCENE-2630: allow lucene to be built with non-sun jvms [rmuir] missing merge props for r996720 [rmuir] quiet this test [rmuir] LUCENE-2642: merge Uwe's test improvements [rmuir] LUCENE-2642: merge LuceneTestCase and LuceneTestCaseJ4 [rmuir] add exception ignore for extraction test [rmuir] SOLR-2118: fix setTermIndexDivisor param to have its correct name -- [...truncated 18329 lines...] [junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 2.863 sec [junit] [junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.018 sec [junit] [junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.005 sec [junit] [junit] Testsuite: org.apache.lucene.search.TestWildcard [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.036 sec [junit] [junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 8.251 sec [junit] [junit] Testsuite: org.apache.lucene.search.function.TestDocValues [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.005 sec [junit] [junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery [junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.249 sec [junit] [junit] Testsuite: org.apache.lucene.search.function.TestOrdValues [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.102 sec [junit] [junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadNearQuery [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.65 sec [junit] [junit] Testsuite: 
org.apache.lucene.search.payloads.TestPayloadTermQuery [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.033 sec [junit] [junit] Testsuite: org.apache.lucene.search.spans.TestBasics [junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 14.143 sec [junit] [junit] Testsuite: org.apache.lucene.search.spans.TestFieldMaskingSpanQuery [junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.663 sec [junit] [junit] Testsuite: org.apache.lucene.search.spans.TestNearSpansOrdered [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 0.093 sec [junit] [junit] Testsuite: org.apache.lucene.search.spans.TestPayloadSpans [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 1.56 sec [junit] [junit] - Standard Output --- [junit] [junit] Spans Dump -- [junit] payloads for span:2 [junit] doc:0 s:3 e:6 one:Entity:3 [junit] doc:0 s:3 e:6 three:Noise:5 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:0 s:0 e:3 rr:Noise:1 [junit] doc:0 s:0 e:3 yy:Noise:2 [junit] doc:0 s:0 e:3 xx:Entity:0 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:1 s:0 e:4 yy:Noise:1 [junit] doc:1 s:0 e:4 rr:Noise:3 [junit] doc:1 s:0 e:4 xx:Entity:0 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:0 s:0 e:3 rr:Noise:1 [junit] doc:0 s:0 e:3 yy:Noise:2 [junit] doc:0 s:0 e:3 xx:Entity:0 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:0 s:0 e:3 yy:Noise:2 [junit] doc:0 s:0 e:3 xx:Entity:0 [junit] doc:0 s:0 e:3 rr:Noise:1 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:1 s:0 e:4 rr:Noise:3 [junit] doc:1 s:0 e:4 xx:Entity:0 [junit] doc:1 s:0 e:4 yy:Noise:1 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:2 s:0 e:5 ss:Noise:2 [junit] doc:2 s:0 e:5 qq:Noise:1 [junit] doc:2 s:0 e:5 pp:Noise:3 [junit] [junit] Spans Dump -- [junit] payloads for span:8 [junit] doc:3 s:0 e:11 ten:Noise:9 [junit] doc:3 s:0 e:11 two:Noise:1 [junit] doc:3 s:0 e:11 six:Noise:5 
[junit] doc:3 s:0 e:11 eleven:Noise:10 [junit] doc:3 s:0 e:11 five:Noise:4 [junit] doc:3 s:0 e:11 one:Entity:0 [junit] doc:3 s:0 e:11 three:Noise:2 [junit] doc:3 s:0 e:11 nine:Noise:8 [junit] [junit] Spans Dump -- [junit] payloads for span:8 [junit] doc:4 s:0 e:11 nine:Noise:0 [junit] doc:4 s:0 e:11 five:Noise:5 [junit] doc:4 s:0 e:11 eleven:Noise:9 [junit] doc:4 s:0 e:11 two:Noise:2 [junit] doc:4 s:0 e:11 one:Entity:1 [junit] doc:4 s:0 e:11 six:Noise:6 [junit] doc:4 s:0 e:11 ten:Noise:10
/trunk sortMissingLast=true status?
Testing with r997128: I have a field defined as:

<fieldType name="bytes" class="solr.TrieLongField" sortMissingLast="true" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

When I call ?sort=bytes desc, everything works as expected: the biggest things are first. When I call ?sort=bytes asc, the entries without a bytes field all go first. I am sort of following the changes in LUCENE-2504, which point to oddities with sortMissingLast, but Yonik's comments in #997095 suggest this should be working. Am I missing something? Thanks Ryan - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Current trunk example woes...
On Tue, Sep 14, 2010 at 8:16 PM, Erick Erickson erickerick...@gmail.com wrote: If I check out the current trunk, and from solr do an ant clean example all is well, even up to starting Solr. But trying to hit anything on the site gives a response in the browser starting with: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType:Error loading class 'solr.SpatialTileField' Commenting the relevant fieldType out of schema.xml fixes this. Should I open a Jira or does someone want to jump on it? Hmmm, I can't reproduce this. Something like http://localhost:8983/solr/select?q=solr seems to work fine. Did you do an svn up at the trunk level (i.e. get lucene too)? -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: /trunk sortMissingLast=true status?
On Tue, Sep 14, 2010 at 9:40 PM, Ryan McKinley ryan...@gmail.com wrote: Testing with r997128: I have a field defined as: <fieldType name="bytes" class="solr.TrieLongField" sortMissingLast="true" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> sortMissingLast/sortMissingFirst is currently only supported on fields that internally use a StringIndex (now DocTermsIndex), because that's the only FieldCache representation that records which fields are missing (via an ord of 0). There's a note in the example schema.xml:

<!-- The optional sortMissingLast and sortMissingFirst attributes are
     currently supported on types that are sorted internally as strings.
     This includes string, boolean, sint, slong, sfloat, sdouble, pdate -->

That's actually the only reason the sint type fields are still around. If we could distinguish between 0 and missing, we could deprecate/remove the s* fields and always use trie fields. -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
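The distinction Yonik describes can be sketched in plain Java: with a representation that has an explicit "missing" state (here modeled as null, analogous to ord 0 in a string index), missing values can be ordered last explicitly; a primitive long[] cache has no such state, so a missing field is indistinguishable from 0. This is an illustration of the concept, not Lucene's comparator code.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of sortMissingLast: null models "document has no value for the
// field" (like ord 0 in a DocTermsIndex); a primitive long[] cannot
// represent that state, which is why trie fields can't support it yet.
public class SortMissingLast {
    public static void main(String[] args) {
        Long[] bytes = { 5L, null, 42L, null, 7L };
        Arrays.sort(bytes, Comparator.nullsLast(Comparator.naturalOrder()));
        System.out.println(Arrays.toString(bytes)); // missing entries sort last
    }
}
```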
[jira] Created: (LUCENE-2645) False assertion of 0 position delta in StandardPostingsWriterImpl
False assertion of 0 position delta in StandardPostingsWriterImpl -- Key: LUCENE-2645 URL: https://issues.apache.org/jira/browse/LUCENE-2645 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: David Smiley Priority: Minor StandardPostingsWriterImpl line 159 is:

{code:java}
assert delta > 0 || position == 0 || position == -1 : "position=" + position + " lastPosition=" + lastPosition; // not quite right (if pos=0 is repeated twice we don't catch it)
{code}

I enable assertions when I run my unit tests, and I've found this assertion to fail when delta is 0, which occurs when the same position value is sent in twice in a row. Once I added RemoveDuplicatesTokenFilter, this problem went away. Should I really be forced to add this filter? I think delta >= 0 would be a better assertion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
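To make the failure mode concrete, here is a self-contained sketch (not Lucene code) of how position deltas are computed: a token repeated at the same position, for example a synonym stacked with a position increment of 0, produces a delta of 0, which trips `assert delta > 0` even though the token stream is legal.

```java
import java.util.Arrays;

// Illustration (not Lucene code): deltas between successive token positions.
// A repeated position yields delta == 0, which "assert delta > 0" rejects.
public class PositionDeltas {
    static int[] deltas(int[] positions) {
        int[] d = new int[positions.length];
        int last = 0;
        for (int i = 0; i < positions.length; i++) {
            d[i] = positions[i] - last;  // delta against the previous position
            last = positions[i];
        }
        return d;
    }

    public static void main(String[] args) {
        // position 3 occurs twice, e.g. a term and its stacked synonym
        int[] d = deltas(new int[] { 1, 3, 3, 5 });
        System.out.println(Arrays.toString(d)); // contains a 0 delta
    }
}
```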
[jira] Commented: (LUCENE-2611) IntelliJ IDEA setup
[ https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909579#action_12909579 ] Steven Rowe commented on LUCENE-2611: - Once Robert's latest patch on SOLR-2002 gets applied -- it moves around some of the Solr module structure -- the IntelliJ setup patches will need to be adjusted. IntelliJ IDEA setup --- Key: LUCENE-2611 URL: https://issues.apache.org/jira/browse/LUCENE-2611 Project: Lucene - Java Issue Type: New Feature Components: Build Affects Versions: 3.1, 4.0 Reporter: Steven Rowe Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611_mkdir.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test_2.patch Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming. The attached patch adds a new top level directory {{dev-tools/}} with sub-dir {{idea/}} containing basic setup files for trunk, as well as a top-level ant target named idea that copies these files into the proper locations. This arrangement avoids the messiness attendant to in-place project configuration files directly checked into source control. The IDEA configuration includes modules for Lucene and Solr, each Lucene and Solr contrib, and each analysis module. A JUnit test run per module is included. Once {{ant idea}} has been run, the only configuration that must be performed manually is configuring the project-level JDK. If this patch is committed, Subversion svn:ignore properties should be added/modified to ignore the destination module files (*.iml) in each module's directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations
[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909580#action_12909580 ] Jason Rutherglen commented on LUCENE-2575: -- For the posting skip list we need to implement seek on the ByteSliceReader. However if we're rewriting a portion of a slice, then I guess we could have a problem... Meaning we'd be storing an absolute position in the skip list, and we could go to look up the value, however that byte(s) could have been altered to not be delta encoded doc ids anymore, but instead is/are the forwarding address to the next slice. Do we need an intelligent mechanism that interacts with the byte slice writer to not point at byte array elements (ie the end of slices) that could later be converted into forwarding addresses? Concurrent byte and int block implementations - Key: LUCENE-2575 URL: https://issues.apache.org/jira/browse/LUCENE-2575 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: Realtime Branch Reporter: Jason Rutherglen Fix For: Realtime Branch Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (ie, postings etc). I'm not sure I understand the slices concept, it seems like it'd be easier to implement a seekable random access file like API. One'd seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Build failed in Hudson: Solr-3.x #104
See https://hudson.apache.org/hudson/job/Solr-3.x/104/changes Changes: [rmuir] LUCENE-2630: fix intl test bugs that rely on cldr version [rmuir] LUCENE-2630: look for the correct exception according to javadoc contract [gsingers] SOLR-1568: move DistanceUtils up a package [gsingers] SOLR-1568: backport to 3.x [rmuir] LUCENE-2630: allow lucene to be built with non-sun jvms [rmuir] missing merge props for r996720 -- [...truncated 5476 lines...] clover: common.compile-core: compile-core: compile-test: [javac] Compiling 1 source file to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test init: clover.setup: clover.info: clover: compile-core: [mkdir] Created dir: https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/misc/classes/java [javac] Compiling 11 source files to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/misc/classes/java [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. compile: [echo] Building queries... Trying to override old definition of task m2-deploy Trying to override old definition of task invoke-javadoc common.init: build-lucene: Trying to override old definition of task contrib-crawl jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: clover: common.compile-core: compile-core: compile-test: [javac] Compiling 1 source file to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test init: clover.setup: clover.info: clover: compile-core: compile: [echo] Building spatial... 
Trying to override old definition of task m2-deploy Trying to override old definition of task invoke-javadoc build-queries: common.init: build-lucene: Trying to override old definition of task contrib-crawl jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: clover: common.compile-core: compile-core: compile-test: [javac] Compiling 1 source file to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test init: clover.setup: clover.info: clover: common.compile-core: [mkdir] Created dir: https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/spatial/classes/java [javac] Compiling 29 source files to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/spatial/classes/java [javac] Note: https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/contrib/spatial/src/java/org/apache/lucene/spatial/tier/CartesianPolyFilterBuilder.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. compile-core: compile: [echo] Building spellchecker... 
Trying to override old definition of task m2-deploy Trying to override old definition of task invoke-javadoc common.init: build-lucene: Trying to override old definition of task contrib-crawl jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: clover: common.compile-core: compile-core: compile-test: [javac] Compiling 1 source file to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test init: clover.setup: clover.info: clover: compile-core: [mkdir] Created dir: https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/spellchecker/classes/java [javac] Compiling 12 source files to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/spellchecker/classes/java [javac] Note: https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/contrib/spellchecker/src/java/org/apache/lucene/search/spell/SpellChecker.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. compile: [echo] Building xml-query-parser... Trying to override old definition of task m2-deploy Trying to override old definition of task invoke-javadoc build-queries: common.init: build-lucene: Trying to override old definition of task contrib-crawl jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: clover: common.compile-core: compile-core: compile-test: [javac] Compiling 1 source file to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/classes/test init: clover.setup: clover.info: clover: common.compile-core: [mkdir] Created dir: https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/xml-query-parser/classes/java [javac] Compiling 36 source files to https://hudson.apache.org/hudson/job/Solr-3.x/ws/branch_3x/lucene/build/contrib/xml-query-parser/classes/java [javac] Note: