[jira] [Commented] (SOLR-3161) Use of 'qt' should be restricted to searching and should not start with a '/'
[ https://issues.apache.org/jira/browse/SOLR-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233251#comment-13233251 ] David Smiley commented on SOLR-3161: As long as you provide a leading '/' to shards.qt, there is no problem, because the sharded request will use that as the path and not use 'qt'. The smarts that make that happen are largely due to the logic in QueryRequest.getPath(). I just played around with this in tests and stepped through the code to prove it out. This does remind me of another attack vector of sorts for what started all this. Even with qt disabled, this still leaves the possibility of /mysearch?q=...&shards=...&shards.qt=/update Use of 'qt' should be restricted to searching and should not start with a '/' - Key: SOLR-3161 URL: https://issues.apache.org/jira/browse/SOLR-3161 Project: Solr Issue Type: Improvement Components: search, web gui Reporter: David Smiley Assignee: David Smiley Fix For: 3.6, 4.0 Attachments: SOLR-3161-disable-qt-by-default.patch, SOLR-3161-dispatching-request-handler.patch, SOLR-3161-dispatching-request-handler.patch I haven't yet looked at the code involved for the suggestions here; I'm speaking based on how I think things should and should not work, based on intuitiveness and security. In general I feel it is best practice to use '/'-leading request handler names and not to use qt, but I don't hate it enough, when used in limited (search-only) circumstances, to propose its demise. But if someone proposes its deprecation, then I am +1 for that. Here is my proposal: Solr should error if the parameter qt is supplied with a leading '/'. (trunk only) Solr should only honor qt if the target request handler extends solr.SearchHandler. The new admin UI should only use 'qt' when it has to. For the query screen, it could present a little pop-up menu of handlers to choose from, including /select?qt=mycustom for handlers that aren't named with a leading '/'. 
This choice should be positioned at the top. And before I forget, I or someone else should investigate whether there are any similar security problems with the shards.qt parameter. Perhaps shards.qt can abide by the same rules outlined above. Does anyone foresee any problems with this proposal? On a related subject, I think the notion of a default request handler is bad - the default=true thing. Honestly I'm not sure what it does, since I noticed Solr trunk redirects '/solr/' to the new admin UI at '/solr/#/'. Assuming it doesn't do anything useful anymore, I think it would be clearer to use <requestHandler name="/select" class="solr.SearchHandler"> instead of what's there now. The delta is to put the leading '/' on this request handler name, and remove the default attribute. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
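The two rules in the proposal - reject a leading '/', and only honor qt when it resolves to a search handler - can be sketched as a standalone check. The types and method names below are hypothetical stand-ins, not Solr's real SolrRequestHandler hierarchy; actual enforcement would live in Solr's request dispatching code.

```java
// Hypothetical stand-ins for Solr's handler hierarchy (illustration only).
interface RequestHandler {}
class SearchHandler implements RequestHandler {}
class UpdateHandler implements RequestHandler {}

public class QtPolicy {
    // Returns true only if a supplied 'qt' value passes the proposed rules:
    // it must not start with '/', and it must resolve to a search handler.
    static boolean isLegalQt(String qt, RequestHandler target) {
        if (qt == null) {
            return true;              // no qt supplied: nothing to restrict
        }
        if (qt.startsWith("/")) {
            return false;             // proposal: error on a leading '/'
        }
        return target instanceof SearchHandler;  // search handlers only
    }
}
```

Under these rules `qt=mycustom` resolving to a SearchHandler would be accepted, while a value like `/update` - the shards.qt attack vector above - would be rejected outright.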
Re: any general way of getting which attributes token stream has?
(12/03/20 13:47), Robert Muir wrote: I think we should probably change the QueryConverter API from: public abstract Collection<Token> convert(String original); to: public abstract TokenStream convert(String original) Currently attributes such as ReadingAttribute are lost... If we really want a Collection we could alternatively have Collection<AttributeSource>, which would also preserve attributes, but it seems silly when QueryConverter could just return a TokenStream. This makes SuggestQueryConverter extremely simple :) In fact SpellingQueryConverter could be simple too: I think it's basically just a regex tokenizer with a stopword list (OR/AND)? Hi Robert, Thanks for the comment. As I'm investigating the Lucene spell checker for Japanese further, I've realized that there is a more essential problem in it. I'll open a JIRA ticket for it shortly. In the ticket, I'll change the API you mentioned if needed. koji -- Query Log Visualizer for Apache Solr http://soleami.com/
[jira] [Created] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Java Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Priority: Minor Fix For: 3.6, 4.0 The "did you mean?" feature using Lucene's spell checker unfortunately cannot work well in a Japanese environment, and this is a longstanding problem, because the logic needs comparatively long text to check spelling, but in some languages (e.g. Japanese) most words are too short for the spell checker. I think, at least for Japanese, things can be improved if we split off the spell check word and the surface form in the spell check dictionary. Then we can use ReadingAttribute for spell checking but CharTermAttribute for suggesting, for example.
Re: any general way of getting which attributes token stream has?
Hello Koji, Can't it be done via tokenStream.reflectWith(AttributeReflector), with a reflector which puts all attribute properties into a Token via reflection, or into an AttributeSource? WDYT? 2012/3/20 Koji Sekiguchi k...@r.email.ne.jp Is there any general way of getting/looking at what attributes a token stream has? I want to use the spell checker with a query analyzer, where the analyzer generates a ReadingAttribute for each token, and I want to use the ReadingAttributes for spell checking. I think I can have my own SpellingQueryConverter extension to override the analyze method, but I saw the TODO comment in SpellingQueryConverter:

protected void analyze(Collection<Token> result, Reader text, int offset) throws IOException {
  TokenStream stream = analyzer.reusableTokenStream("", text);
  // TODO: support custom attributes
  CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
  FlagsAttribute flagsAtt = stream.addAttribute(FlagsAttribute.class);
  TypeAttribute typeAtt = stream.addAttribute(TypeAttribute.class);
  PayloadAttribute payloadAtt = stream.addAttribute(PayloadAttribute.class);
  PositionIncrementAttribute posIncAtt = stream.addAttribute(PositionIncrementAttribute.class);
  OffsetAttribute offsetAtt = stream.addAttribute(OffsetAttribute.class);
  :

If we can have a general way of getting such information, I think it would be helpful not only for spell checking. (For example, SynonymFilter can add a PartOfSpeechAttribute if the original token has one.) koji -- Query Log Visualizer for Apache Solr http://soleami.com/ -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
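The reflection idea Mikhail raises can be illustrated outside Lucene with plain bean reflection: walk an object's no-arg getters and collect them into a name-to-value map, which is essentially the kind of enumeration reflectWith(AttributeReflector) provides for attributes. The Token class and its getters here are made up for the illustration, not Lucene types.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class AttrReflectDemo {
    // Stand-in for a token carrying several "attributes" (hypothetical class).
    public static class Token {
        public String getTerm() { return "東京"; }
        public String getReading() { return "トウキョウ"; }
    }

    // Collect every public no-arg getXxx() into a name -> value map,
    // mimicking what an AttributeReflector callback would receive.
    static Map<String, Object> reflectAttributes(Object token) {
        Map<String, Object> props = new LinkedHashMap<>();
        for (Method m : token.getClass().getMethods()) {
            if (m.getName().startsWith("get") && m.getParameterCount() == 0
                    && m.getDeclaringClass() != Object.class) {
                try {
                    props.put(m.getName().substring(3), m.invoke(token));
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        return props;
    }
}
```

A SpellingQueryConverter replacement built this way would not need to know the attribute set up front, which is the point of the TODO above.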
[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-3888: --- Attachment: LUCENE-3888.patch The patch cannot be compiled now because I changed the return type of the method in the Dictionary interface but the implementing classes have not been changed yet. Please give some comments, because I'm new to the spell checker. If there is no problem with going ahead, I'll continue to work. split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Java Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3888.patch
[jira] [Updated] (LUCENE-3683) Add @Noisy annotation for uncontrollably noisy tests
[ https://issues.apache.org/jira/browse/LUCENE-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-3683: Fix Version/s: 4.0 Add @Noisy annotation for uncontrollably noisy tests Key: LUCENE-3683 URL: https://issues.apache.org/jira/browse/LUCENE-3683 Project: Lucene - Java Issue Type: Test Reporter: Robert Muir Assignee: Dawid Weiss Fix For: 4.0 Attachments: LUCENE-LUCENE3808-JOB1-142.log
{code}
/**
 * Annotation for test classes that are uncontrollably loud, and you
 * only want output if they actually fail, error, or VERBOSE is enabled.
 * @deprecated Fix your test to properly use {@link #VERBOSE} !
 */
@Documented
@Deprecated
@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
public @interface Noisy {}
{code}
[jira] [Commented] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233291#comment-13233291 ] Robert Muir commented on LUCENE-3888: - Koji: hmm, I think the problem is not in the Dictionary interface (which is actually ok), but instead in the spellcheckers and suggesters themselves? For spellchecking, I think we need to expose more analysis options in Spellchecker: currently this is actually hardcoded at KeywordAnalyzer (it uses NOT_ANALYZED). Instead I think you should be able to pass an Analyzer: we would also have a TokenFilter for Japanese that replaces term text with the Reading from ReadingAttribute. In the same way, suggest can analyze too. (LUCENE-3842 is already some work toward that, especially with the idea to support Japanese this exact same way.) So in short I think we should:
# create a TokenFilter (similar to BaseFormFilter) which copies ReadingAttribute into termAtt.
# refactor the 'n-gram analysis' in spellchecker to work on actual tokenstreams (this can also likely be implemented as tokenstreams), allowing the user to set an Analyzer on Spellchecker to control how it analyzes text.
# continue to work on 'analysis for suggest' like LUCENE-3842.
Note this use of analyzers in spellcheck/suggest is unrelated to Solr's current use of 'analyzers', which is only for some query manipulation and not very useful. split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Java Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3888.patch
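Robert's step 1 - a filter that replaces term text with the reading - can be modeled without Lucene as a simple projection over (term, reading) pairs. `Tok` and `projectReadings` are hypothetical names for this sketch; the real version would be a TokenFilter copying ReadingAttribute into CharTermAttribute.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadingToTermDemo {
    // Minimal stand-in for a token carrying a surface form and a reading.
    static class Tok {
        final String term;     // surface form, e.g. "東京"
        final String reading;  // reading, e.g. "トウキョウ" (may be null)
        Tok(String term, String reading) { this.term = term; this.reading = reading; }
    }

    // Replace each token's text with its reading so downstream n-gram
    // spellchecking operates on readings; fall back to the surface form
    // when no reading is available (e.g. numbers or latin text).
    static List<String> projectReadings(List<Tok> in) {
        List<String> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t.reading != null ? t.reading : t.term);
        }
        return out;
    }
}
```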
[jira] [Updated] (LUCENE-3868) Thread interruptions shouldn't cause unhandled thread errors (or should they?).
[ https://issues.apache.org/jira/browse/LUCENE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-3868: Fix Version/s: (was: flexscoring branch) 4.0 Thread interruptions shouldn't cause unhandled thread errors (or should they?). --- Key: LUCENE-3868 URL: https://issues.apache.org/jira/browse/LUCENE-3868 Project: Lucene - Java Issue Type: Bug Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 4.0 This is a result of pulling uncaught exception catching to a rule above interrupt in internalTearDown(); check how it was before and restore previous behavior?
[jira] [Updated] (LUCENE-3206) FST package API refactoring
[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-3206: Affects Version/s: (was: 3.2) Fix Version/s: (was: flexscoring branch) 4.0 FST package API refactoring --- Key: LUCENE-3206 URL: https://issues.apache.org/jira/browse/LUCENE-3206 Project: Lucene - Java Issue Type: Improvement Components: core/FSTs Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 4.0 Attachments: LUCENE-3206.patch The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept.
Re: Using term offsets for hit highlighting
Thanks for all the offers of help! It looks as though most of the hard work has already been done, which is exactly where I like to pick up projects. :-) Maybe the best place to start would be for me to rebase the branch against trunk, and see what still fits? I think there have been some fairly major changes in the internals since July last year. On 19 Mar 2012, at 17:07, Mike Sokolov wrote: I posted a patch with a Collector somewhat similar to what you described, Alan - it's attached to one of the sub-issues https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly complete alpha state, but has seen no production use of course, since it relies on the remainder of the unfinished work in that branch. It works by creating a TokenStream based on match positions returned from the query and passing that to the existing Highlighter. Please feel free to get in touch if you decide to look into that and have questions. -Mike On 03/19/2012 11:51 AM, Simon Willnauer wrote: On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindler u...@thetaphi.de wrote: Have you marked that for GSOC? Would be a good idea! yes I did - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -----Original Message----- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Monday, March 19, 2012 4:43 PM To: dev@lucene.apache.org Subject: Re: Using term offsets for hit highlighting Alan, you made my day! The branch is kind of outdated but I looked at it lately and I can certainly help to get it up to speed. The feature in that branch is quite a big one and it's in a very early stage. Still I want to encourage you to take a look and work on it. I promise all my help with the issues! let me know if you have questions! simon On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Cool, thanks Robert. I'll take a look at the JIRA ticket. 
On 19 Mar 2012, at 14:44, Robert Muir wrote: On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hello, The project I'm currently working on requires the reporting of exact hit positions from some pretty hairy queries, not all of which are covered by the existing highlighter modules. I'm working round this by translating everything into SpanQueries, and using the getSpans() method to locate hits (I've extended the Spans interface to make term offsets available - see https://issues.apache.org/jira/browse/LUCENE-3826). This works for our use-case, but isn't terribly efficient, and obviously isn't applicable to non-Span queries. I've seen a bit of chatter on the list about using term offsets to provide accurate highlighting in Lucene. I'm going to have a couple of weeks free in April, and I thought I might have a go at implementing this. Mainly I'm wondering if there's already been thoughts about how to do it. My current thoughts are to somehow extend the Weight and Scorer interface to make term offsets available; to get highlights for a given set of documents, you'd essentially run the query again, with a filter on just the documents you want highlighted, and have a custom collector that gets the term offsets in place of the scores. Hi Alan, Simon started some initial work on https://issues.apache.org/jira/browse/LUCENE-2878 Some work and prototypes were done in a branch, but it might be lagging behind trunk a bit. Additionally at the time it was first done, I think we didn't yet support offsets in the postings lists. We've since added this and several codecs support it. 
-- lucidimagination.com
[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3888: Attachment: LUCENE-3888.patch Here is a simple prototype of what I was suggesting; it allows you to specify an Analyzer to SpellChecker. This Analyzer converts the 'surface form' into the 'analyzed form' at index and query time: at index time it forms n-grams based on the analyzed form, but stores the surface form for retrieval. At query time we have a similar process: the docFreq() etc. checks are done on the surface form, but the actual spellchecking on the analyzed form. The default Analyzer is null, which means do nothing, and the patch has no tests, refactoring, or any of that. split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Java Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3888.patch, LUCENE-3888.patch
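The index-time/query-time split described in the prototype - match against the analyzed form, but store and return the surface form - reduces to keying a dictionary by analyzed form while keeping surface forms as values. A toy model under that assumption (hypothetical class, not the patch's API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two-form dictionary idea: index under the analyzed form
// (e.g. the reading), but store and return the surface form. The real patch
// wires this into SpellChecker's n-gram index instead of a plain map.
public class TwoFormDictionaryDemo {
    private final Map<String, String> analyzedToSurface = new HashMap<>();

    // Index-time: record the surface form under its analyzed key.
    void add(String surface, String analyzed) {
        analyzedToSurface.put(analyzed, surface);
    }

    // Query-time: matching happens on analyzed forms, but the
    // user-visible suggestion is the surface form.
    String suggest(String analyzedQuery) {
        return analyzedToSurface.get(analyzedQuery);
    }
}
```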
[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3888: Attachment: LUCENE-3888.patch fix the obvious reset() problem... the real problem is I need to reset() my coffee mug. split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Java Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Priority: Minor Fix For: 3.6, 4.0 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch
[jira] [Created] (LUCENE-3889) Remove/Uncommit SegmentingTokenizerBase
Remove/Uncommit SegmentingTokenizerBase --- Key: LUCENE-3889 URL: https://issues.apache.org/jira/browse/LUCENE-3889 Project: Lucene - Java Issue Type: Task Affects Versions: 3.6, 4.0 Reporter: Robert Muir I added this class in LUCENE-3305 to support analyzers like Kuromoji, but Kuromoji no longer needs it as of LUCENE-3767. So now nothing uses it. I think we should uncommit it before releasing; svn doesn't forget, so we can add this back if we want to refactor something like Thai or Smartcn to use it.
[jira] [Updated] (LUCENE-3889) Remove/Uncommit SegmentingTokenizerBase
[ https://issues.apache.org/jira/browse/LUCENE-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3889: Attachment: LUCENE-3889.patch Remove/Uncommit SegmentingTokenizerBase --- Key: LUCENE-3889 URL: https://issues.apache.org/jira/browse/LUCENE-3889 Project: Lucene - Java Issue Type: Task Affects Versions: 3.6, 4.0 Reporter: Robert Muir Attachments: LUCENE-3889.patch
[jira] [Updated] (SOLR-2020) HttpComponentsSolrServer
[ https://issues.apache.org/jira/browse/SOLR-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated SOLR-2020: - Attachment: SOLR-2020.patch Improved patch with cleanups + additional tests. HttpComponentsSolrServer Key: SOLR-2020 URL: https://issues.apache.org/jira/browse/SOLR-2020 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4.1 Environment: Any Reporter: Chantal Ackermann Priority: Minor Fix For: 4.0 Attachments: HttpComponentsSolrServer.java, HttpComponentsSolrServerTest.java, SOLR-2020-HttpSolrServer.patch, SOLR-2020.patch, SOLR-2020.patch Implementation of SolrServer that uses the Apache Http Components framework. Http Components (http://hc.apache.org/) is the successor of Commons HttpClient, and thus HttpComponentsSolrServer would be the successor of CommonsHttpSolrServer in the future.
[jira] [Updated] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-445: Fix Version/s: (was: 3.6) Update Handlers abort with bad documents Key: SOLR-445 URL: https://issues.apache.org/jira/browse/SOLR-445 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.3 Reporter: Will Johnson Assignee: Erick Erickson Fix For: 4.0 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml Has anyone run into the problem of handling bad documents / failures mid-batch? I.e.:
<add>
  <doc><field name="id">1</field></doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc><field name="id">3</field></doc>
</add>
Right now Solr adds the first doc and then aborts. It would seem like it should either fail the entire batch, or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
[jira] [Assigned] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-2242: Assignee: (was: Erick Erickson) I won't get to this for 3.6. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242-solr40.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=<name of field> you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price This currently only works on facet.field. 
{code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int>
    <int name="11.5">1</int>
    <int name="19.95">1</int>
    <int name="74.99">1</int>
    <int name="92.0">1</int>
    <int name="179.99">1</int>
    <int name="185.0">1</int>
    <int name="279.95">1</int>
    <int name="329.95">1</int>
    <int name="350.0">1</int>
    <int name="399.0">1</int>
    <int name="479.95">1</int>
    <int name="649.99">1</int>
    <int name="2199.0">1</int>
  </lst>
</lst>
{code}
Several people use this to get the group.field count (the # of groups).
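The numFacetTerms value in the response is just the count of distinct facet values that survive facet.mincount. A minimal sketch of that computation (stand-in method, not Solr's implementation):

```java
import java.util.Map;

public class FacetDistinctDemo {
    // Count distinct facet values whose document count passes mincount,
    // i.e. the number a namedistinct/numFacetTerms feature would report.
    static long numFacetTerms(Map<String, Integer> facetCounts, int mincount) {
        return facetCounts.values().stream()
                          .filter(c -> c >= mincount)
                          .count();
    }
}
```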
[jira] [Updated] (SOLR-2921) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should
[ https://issues.apache.org/jira/browse/SOLR-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2921: - Affects Version/s: (was: 3.6) Make any Filters, Tokenizers and CharFilters implement MultiTermAwareComponent if they should - Key: SOLR-2921 URL: https://issues.apache.org/jira/browse/SOLR-2921 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Environment: All Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor SOLR-2438 creates a new MultiTermAwareComponent interface. This allows Solr to automatically assemble a multiterm analyzer that does the right thing vis-a-vis transforming the individual terms of a multi-term query at query time. Examples are: lower casing, folding accents, etc. Currently (27-Nov-2011), the following classes implement MultiTermAwareComponent: * ASCIIFoldingFilterFactory * LowerCaseFilterFactory * LowerCaseTokenizerFactory * MappingCharFilterFactory * PersianCharFilterFactory When users put any of the above in their query analyzer, Solr will do the right thing at query time, and the perennial question users have - why didn't my wildcard query automatically lower-case (or accent-fold, or...) my terms? - will be gone. Die question die! But taking a quick look, for instance, at the various FilterFactories that exist, there are a number of possibilities that *might* be good candidates for implementing MultiTermAwareComponent. But I really don't understand the correct behavior here well enough to know whether these should implement the interface or not. And this doesn't include other CharFilters or Tokenizers. Actually implementing the interface is often trivial; see the classes above for examples. Note that LowerCaseTokenizerFactory returns a *Filter*, which is the right thing in this case. Here is a quick cull of the Filters that, just from their names, might be candidates. If anyone wants to take any of them on, that would be great. 
If all you can do is provide test cases, I could probably do the code part, just let me know.
ArabicNormalizationFilterFactory
GreekLowerCaseFilterFactory
HindiNormalizationFilterFactory
ICUFoldingFilterFactory
ICUNormalizer2FilterFactory
ICUTransformFilterFactory
IndicNormalizationFilterFactory
ISOLatin1AccentFilterFactory
PersianNormalizationFilterFactory
RussianLowerCaseFilterFactory
TurkishLowerCaseFilterFactory
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
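To illustrate why implementing the interface is "often trivial", here is a simplified, self-contained sketch of the shape such an implementation takes. The interface and factory class below are minimal stand-ins for illustration only, not the real org.apache.solr.analysis types: the actual MultiTermAwareComponent lives in Solr's analysis package and returns an analysis component rather than a plain Object named this way.

```java
// Simplified stand-in types (NOT the real Solr classes) sketching the
// pattern: a factory whose filter is safe for multi-term queries simply
// exposes itself as the multi-term component.
interface MultiTermAwareComponent {
    // Returns the analysis component to apply to the individual terms of
    // multi-term queries (wildcard, prefix, range) at query time.
    Object getMultiTermComponent();
}

class GreekLowerCaseFilterFactorySketch implements MultiTermAwareComponent {
    // Lower-casing does not change the number or order of terms, so it is
    // safe to apply to wildcard terms; the factory returns itself.
    @Override
    public Object getMultiTermComponent() {
        return this;
    }
}

public class MultiTermSketch {
    public static void main(String[] args) {
        MultiTermAwareComponent c = new GreekLowerCaseFilterFactorySketch();
        System.out.println(c.getMultiTermComponent() == c);
    }
}
```

The point of the pattern is that a component only qualifies when it transforms single terms in isolation (case folding, accent folding, normalization); anything that splits, merges, or drops tokens cannot simply return itself.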
[jira] [Assigned] (SOLR-3182) If there is only one core, let it be the default without specifying a default in solr.xml
[ https://issues.apache.org/jira/browse/SOLR-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-3182: Assignee: (was: Erick Erickson) Don't have time to get to this in 3.6, does someone else want to push this forward? If there is only one core, let it be the default without specifying a default in solr.xml - Key: SOLR-3182 URL: https://issues.apache.org/jira/browse/SOLR-3182 Project: Solr Issue Type: Improvement Components: multicore Affects Versions: 3.6, 4.0 Reporter: Russell Black Priority: Minor Labels: patch Attachments: SOLR-3182-default-core.patch Original Estimate: 10m Remaining Estimate: 10m Our particular need for this is as follows. We operate in a sharded environment with one core per server. Each shard also acts as a collator. We want to use a hardware load balancer to choose which shard will do the collation for each query. But in order to do that, each server's single core would have to carry the same name so that it could be accessed by the same url across servers. However we name the cores by their shard number (query0,query1,...) because it parallels with the way we name our indexing/master cores (index0, index1,...). This naming convention also gives us the flexibility of moving to a multicore environment in the future without having to rename the cores, although admittedly that would complicate load balancing. In a system with a large number of shards and the anticipation of adding more going forward, setting a defaultCoreName attribute in each solr.xml file becomes inconvenient, especially since there is no Solr admin API for setting defaultCoreName. It would have to be done by hand or with some automated tool we would write in house. Even if there were an API, logically it seems unnecessary to have to declare the only core to be the default. 
Fortunately this behavior can be implemented with the following simple patch: {code} Index: solr/core/src/java/org/apache/solr/core/CoreContainer.java === --- solr/core/src/java/org/apache/solr/core/CoreContainer.java (revision 1295229) +++ solr/core/src/java/org/apache/solr/core/CoreContainer.java (working copy) @@ -870,6 +870,10 @@ } private String checkDefault(String name) { +// if there is only one core, let it be the default without specifying a default in solr.xml +if (defaultCoreName.trim().length() == 0 && name.trim().length() == 0 && cores.size() == 1) { + return cores.values().iterator().next().getName(); +} return name.length() == 0 || defaultCoreName.equals(name) || name.trim().length() == 0 ? "" : name; } {code}
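The intent of the patched checkDefault() can be replayed with a self-contained sketch that replicates the same logic against a plain Map. The class and core names here are stand-ins for illustration, not Solr code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone illustration of the patched checkDefault() logic: when no
// defaultCoreName is configured, no core name was given in the request,
// and exactly one core is registered, fall back to that core's name.
public class SingleCoreDefaultSketch {
    static String checkDefault(String name, String defaultCoreName,
                               Map<String, String> cores) {
        if (defaultCoreName.trim().isEmpty() && name.trim().isEmpty()
                && cores.size() == 1) {
            return cores.values().iterator().next();
        }
        return name.trim().isEmpty() || defaultCoreName.equals(name)
                ? defaultCoreName : name;
    }

    public static void main(String[] args) {
        Map<String, String> cores = new LinkedHashMap<>();
        cores.put("query0", "query0");
        // Single core, no default configured: the lone core wins.
        System.out.println(checkDefault("", "", cores));
        // An explicitly requested core name always wins.
        System.out.println(checkDefault("query1", "", cores));
    }
}
```

This matches the reporter's scenario: each sharded server keeps its own core name (query0, query1, ...) yet all can be reached through the same default URL.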
[jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233357#comment-13233357 ] Erick Erickson commented on SOLR-445: - Well, it's clear I won't get to this in the 3.6 time frame, so if someone else wants to pick it up feel free. However, I also wonder whether with 4.0 and SolrCloud we have to approach this differently to accommodate how documents are passed around there? Update Handlers abort with bad documents Key: SOLR-445 URL: https://issues.apache.org/jira/browse/SOLR-445 Project: Solr Issue Type: Bug Components: update Affects Versions: 1.3 Reporter: Will Johnson Assignee: Erick Erickson Fix For: 4.0 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml Has anyone run into the problem of handling bad documents / failures mid batch. I.e.:
<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>
Right now Solr adds the first doc and then aborts. It would seem like it should either fail the entire batch, or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
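"Option 2" (record per-document failures and keep going) can be sketched in a self-contained way. The Map-based "document" and the date check below are hypothetical stand-ins, not Solr's actual update path or schema validation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of "Option 2": process each document in the batch, record the
// ids that fail, and continue instead of aborting mid-batch.
public class TolerantBatchSketch {
    // Hypothetical per-document add: rejects a malformed date field,
    // standing in for Solr's field-type validation.
    static void addDoc(Map<String, String> doc) {
        String date = doc.get("myDateField");
        if (date != null && !date.matches("\\d{4}-\\d{2}-\\d{2}.*")) {
            throw new IllegalArgumentException("bad date: " + date);
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> batch = List.of(
                Map.of("id", "1"),
                Map.of("id", "2", "myDateField", "I_AM_A_BAD_DATE"),
                Map.of("id", "3"));
        List<String> failed = new ArrayList<>();
        for (Map<String, String> doc : batch) {
            try {
                addDoc(doc);
            } catch (RuntimeException e) {
                failed.add(doc.get("id")); // record the failure, keep going
            }
        }
        System.out.println("failed=" + failed);
    }
}
```

The extra API surface Will mentions is visible here: the response now has to carry the list of failed ids (and ideally the per-document error) back to the client, which is the part that makes Option 2 a protocol change, not just a loop change.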
[jira] [Updated] (SOLR-445) Update Handlers abort with bad documents
[ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-445: Assignee: (was: Erick Erickson) Issue Type: Improvement (was: Bug) Update Handlers abort with bad documents Key: SOLR-445 URL: https://issues.apache.org/jira/browse/SOLR-445 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.3 Reporter: Will Johnson Fix For: 4.0 Attachments: SOLR-445-3_x.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml Has anyone run into the problem of handling bad documents / failures mid batch. I.e.:
<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>
Right now Solr adds the first doc and then aborts. It would seem like it should either fail the entire batch, or log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
Re: Using term offsets for hit highlighting
Yep, the first challenge is always getting the old patch(es) to apply. On Tue, Mar 20, 2012 at 4:09 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Thanks for all the offers of help! It looks as though most of the hard work has already been done, which is exactly where I like to pick up projects. :-) Maybe the best place to start would be for me to rebase the branch against trunk, and see what still fits? I think there have been some fairly major changes in the internals since July last year. On 19 Mar 2012, at 17:07, Mike Sokolov wrote: I posted a patch with a Collector somewhat similar to what you described, Alan - it's attached to one of the sub-issues https://issues.apache.org/jira/browse/LUCENE-3318. It is in a fairly complete alpha state, but has seen no production use of course, since it relies on the remainder of the unfinished work in that branch. It works by creating a TokenStream based on match positions returned from the query and passing that to the existing Highlighter. Please feel free to get in touch if you decide to look into that and have questions. -Mike On 03/19/2012 11:51 AM, Simon Willnauer wrote: On Mon, Mar 19, 2012 at 4:50 PM, Uwe Schindleru...@thetaphi.de wrote: Have you marked that for GSOC? Would be a good idea! yes I did - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Monday, March 19, 2012 4:43 PM To: dev@lucene.apache.org Subject: Re: Using term offsets for hit highlighting Alan, you made my day! The branch is kind of outdated but I looked at it lately and I can certainly help to get it up to speed. The feature in that branch is quite a big one and its in a very early stage. Still I want to encourage you to take a look and work on it. I promise all my help with the issues! let me know if you have questions! 
simon On Mon, Mar 19, 2012 at 3:52 PM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Cool, thanks Robert. I'll take a look at the JIRA ticket. On 19 Mar 2012, at 14:44, Robert Muir wrote: On Mon, Mar 19, 2012 at 10:38 AM, Alan Woodward alan.woodw...@romseysoftware.co.uk wrote: Hello, The project I'm currently working on requires the reporting of exact hit positions from some pretty hairy queries, not all of which are covered by the existing highlighter modules. I'm working round this by translating everything into SpanQueries, and using the getSpans() method to locate hits (I've extended the Spans interface to make term offsets available - see https://issues.apache.org/jira/browse/LUCENE-3826). This works for our use-case, but isn't terribly efficient, and obviously isn't applicable to non-Span queries. I've seen a bit of chatter on the list about using term offsets to provide accurate highlighting in Lucene. I'm going to have a couple of weeks free in April, and I thought I might have a go at implementing this. Mainly I'm wondering if there's already been thoughts about how to do it. My current thoughts are to somehow extend the Weight and Scorer interface to make term offsets available; to get highlights for a given set of documents, you'd essentially run the query again, with a filter on just the documents you want highlighted, and have a custom collector that gets the term offsets in place of the scores. Hi Alan, Simon started some initial work on https://issues.apache.org/jira/browse/LUCENE-2878 Some work and prototypes were done in a branch, but it might be lagging behind trunk a bit. Additionally at the time it was first done, I think we didn't yet support offsets in the postings lists. We've since added this and several codecs support it. 
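The core idea discussed in this thread (drive highlighting from the character offsets of query matches rather than from re-analysis) can be illustrated with a tiny self-contained snippet builder. The hard-coded offsets below stand in for offsets a collector would gather from postings or Spans; this is not Lucene's highlighter API:

```java
// Tiny illustration of offset-based highlighting: given character offsets
// for query matches (hard-coded here, standing in for offsets a collector
// would obtain from postings with offsets or from Spans), wrap each match
// in <em> tags without re-analyzing the text.
public class OffsetHighlightSketch {
    static String highlight(String text, int[][] offsets) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        // offsets are assumed sorted by start and non-overlapping
        for (int[] o : offsets) {
            sb.append(text, last, o[0])
              .append("<em>").append(text, o[0], o[1]).append("</em>");
            last = o[1];
        }
        return sb.append(text.substring(last)).toString();
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        // Matches: "quick" at [4,9), "fox" at [16,19)
        System.out.println(highlight(text, new int[][]{{4, 9}, {16, 19}}));
    }
}
```

This is also why the thread cares about offsets in the postings lists: once start/end offsets are stored per position, the highlighter no longer needs a TokenStream over the original text, only the match positions the query already computed.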
--
lucidimagination.com
[jira] [Updated] (SOLR-3256) Distributed search throws NPE when using fl=score
[ https://issues.apache.org/jira/browse/SOLR-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-3256: Attachment: SOLR-3256.patch It's rare, it seems to depend on the order of the fl parameters. http://localhost:8983/solr/select?q=*:*&shards=localhost:8983/solr&fl=id&fl=cat&fl=price shows only the id, http://localhost:8983/solr/select?q=*:*&shards=localhost:8983/solr&fl=cat&fl=id&fl=price shows id and cat and http://localhost:8983/solr/select?q=*:*&shards=localhost:8983/solr&fl=price&fl=cat&fl=id shows price and id. I'm attaching a patch that demonstrates the failure with a test case. Distributed search throws NPE when using fl=score - Key: SOLR-3256 URL: https://issues.apache.org/jira/browse/SOLR-3256 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Priority: Minor Fix For: 4.0 Attachments: SOLR-3256.patch Steps to reproduce the problem: Start two Solr instances (may use the example configuration), add some documents to both instances, execute a query like: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:8984/solr&q=(ipod%20OR%20display)&fl=score Expected result: List of scores or at least an exception saying that this request is not supported (may not make too much sense to do fl=score, but a descriptive exception can help debug the problem) Getting: SEVERE: null:java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:985) at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:637) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:612) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:636)
[jira] [Created] (SOLR-3257) Dedupe update.chain example should include DistributedUpdateProcessorFactory
Dedupe update.chain example should include DistributedUpdateProcessorFactory --- Key: SOLR-3257 URL: https://issues.apache.org/jira/browse/SOLR-3257 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Priority: Trivial Fix For: 4.0 Enabling the default dedupe update processor chain breaks distributed indexing because DistributedUpdateProcessorFactory is missing in the update chain.
[jira] [Updated] (SOLR-3257) Dedupe update.chain example should include DistributedUpdateProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-3257: Attachment: SOLR-3257-4.0-1.patch Patch for trunk adding the update processor to the chain in solrconfig. Dedupe update.chain example should include DistributedUpdateProcessorFactory --- Key: SOLR-3257 URL: https://issues.apache.org/jira/browse/SOLR-3257 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Priority: Trivial Fix For: 4.0 Attachments: SOLR-3257-4.0-1.patch Enabling the default dedupe update processor chain breaks distributed indexing because DistributedUpdateProcessorFactory is missing in the update chain.
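For readers unfamiliar with update chains, a sketch of what the fixed example chain presumably looks like. The processor order follows the usual pattern (signature, then distributed, then log/run), but the signature parameters and field names here are assumptions for illustration, not the exact contents of the attached patch:

```xml
<!-- Hedged sketch: a dedupe update.chain that includes
     DistributedUpdateProcessorFactory so distributed indexing keeps
     working. Signature parameters and field names are illustrative
     assumptions, not the patch's exact settings. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signatureField</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The ordering matters: the signature must be computed before the distributed processor forwards the document, and RunUpdateProcessorFactory must come last so the document actually reaches the index.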
[jira] [Resolved] (SOLR-3073) Distributed Grouping fails if the uniqueKey is a UUID
[ https://issues.apache.org/jira/browse/SOLR-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen resolved SOLR-3073. - Resolution: Fixed Actual error is fixed, so this issue is resolved. Distributed Grouping fails if the uniqueKey is a UUID - Key: SOLR-3073 URL: https://issues.apache.org/jira/browse/SOLR-3073 Project: Solr Issue Type: Bug Affects Versions: 3.5, 4.0 Reporter: Devon Krisman Assignee: Martijn van Groningen Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-3073-3x.patch, SOLR-3073-3x.patch Attempting to use distributed grouping (using a StrField as the group.field name) with a UUID as the uniqueKey results in an error because the classname (java.util.UUID) is prepended to the field value during the second phase of the grouping.
[jira] [Commented] (SOLR-3256) Distributed search throws NPE when using fl=score
[ https://issues.apache.org/jira/browse/SOLR-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233381#comment-13233381 ] Luca Cavanna commented on SOLR-3256: Regarding the legacy behavior fl=score, which was equivalent to fl=*,score: it has been removed from trunk a few weeks ago (SOLR-2712). Distributed search throws NPE when using fl=score - Key: SOLR-3256 URL: https://issues.apache.org/jira/browse/SOLR-3256 Project: Solr Issue Type: Bug Reporter: Tomás Fernández Löbbe Priority: Minor Fix For: 4.0 Attachments: SOLR-3256.patch Steps to reproduce the problem: Start two Solr instances (may use the example configuration), add some documents to both instances, execute a query like: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:8984/solr&q=(ipod%20OR%20display)&fl=score Expected result: List of scores or at least an exception saying that this request is not supported (may not make too much sense to do fl=score, but a descriptive exception can help debug the problem) Getting: SEVERE: null:java.lang.NullPointerException at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:985) at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:637) at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:612) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:307) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:351) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) at java.lang.Thread.run(Thread.java:636)
[jira] [Updated] (SOLR-2747) Include formatted Changes.html for release
[ https://issues.apache.org/jira/browse/SOLR-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated SOLR-2747: Fix Version/s: (was: 3.6) Removed 3.6 version. Include formatted Changes.html for release -- Key: SOLR-2747 URL: https://issues.apache.org/jira/browse/SOLR-2747 Project: Solr Issue Type: Improvement Reporter: Martijn van Groningen Priority: Minor Fix For: 4.0 Just like when releasing Lucene, Solr should also have a html formatted changes file. The Lucene Perl script (lucene/src/site/changes/changes2html.pl) should be reused.
[jira] [Commented] (SOLR-2712) Deprecate fl=score behavior.
[ https://issues.apache.org/jira/browse/SOLR-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233390#comment-13233390 ] Mark Miller commented on SOLR-2712: --- There was something missed here. I'll fix it when I fix the multiple fl's not being treated right in distrib search - around that same code there is still logic that expects this. Deprecate fl=score behavior. -- Key: SOLR-2712 URL: https://issues.apache.org/jira/browse/SOLR-2712 Project: Solr Issue Type: Task Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 3.6, 4.0 SOLR-2657 points out that all fields show up when you request score and something besides a 'normal' field. To support the strange behavior and avoid it when unnecessary we have this: {code:java} if( fields.size() == 1 && _wantsScore && augmenters.size() == 1 && globs.isEmpty() ) { _wantsAllFields = true; } {code} I suggest we advertise in 3.x that expecting *fl=score* to return all fields is deprecated, and remove this bit of crazy code from 4.x
[jira] [Updated] (SOLR-2725) TieredMergePolicy and expungeDeletes behaviour
[ https://issues.apache.org/jira/browse/SOLR-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated SOLR-2725: Affects Version/s: 3.6 3.4 3.5 Fix Version/s: (was: 3.6) Removed 3.6 from fix versions. TieredMergePolicy and expungeDeletes behaviour -- Key: SOLR-2725 URL: https://issues.apache.org/jira/browse/SOLR-2725 Project: Solr Issue Type: Bug Affects Versions: 3.3, 3.4, 3.5, 3.6 Reporter: Martijn van Groningen Fix For: 4.0 During executing a commit with expungeDeletes I noticed there were still a lot of segments left. However there were still ~30 segments left with deletes after the commit finished. After looking in SolrIndexConfig class I noticed that TieredMergePolicy#setExpungeDeletesPctAllowed isn't invoked. I think the following statement in SolrIndexConfig#buildMergePolicy method will purge all deletes: {code} tieredMergePolicy.setExpungeDeletesPctAllowed(0); {code} This also reflects the behavior of Solr 3.1 / 3.2. After some discussion on IRC, setting expungeDeletesPctAllowed always to zero isn't best for performance: http://colabti.org/irclogger/irclogger_log/lucene-dev?date=2011-08-20#l120 I think we should add an option to solrconfig.xml that allows users to set this option to whatever value is best for them: {code:xml} <expungeDeletesPctAllowed>0</expungeDeletesPctAllowed> {code} Also having a expungeDeletesPctAllowed per commit command would be great: {code:xml} <commit waitFlush="false" waitSearcher="false" expungeDeletes="true" expungeDeletesPctAllowed="0"/> {code}
[jira] [Created] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 In a test set-up with nodes=2, shards=3 and cores=6 we often see this exception in the logs. Once every few ping requests this is thrown; other requests return a proper OK. Ping request handler: {code}
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="qt">select</str>
    <str name="q">*:*</str>
    <int name="rows">0</int>
  </lst>
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="echoParams">all</str>
    <bool name="omitHeader">true</bool>
  </lst>
</requestHandler>
{code} Exception: {code} 2012-03-20 13:16:06,405 INFO [solr.core.SolrCore] - [http-80-18] - : [core_a] webapp=/solr path=/admin/ping params={} status=500 QTime=7 2012-03-20 13:16:06,406 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-18] - : null:org.apache.solr.common.SolrException: Ping query caused exception: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:68) ... 
16 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:278) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) ... 1 more Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109) at
[jira] [Resolved] (SOLR-2764) Create a NorwegianLightStemmer and NorwegianMinimalStemmer
[ https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-2764. --- Resolution: Fixed Committed to trunk and branch_3x Create a NorwegianLightStemmer and NorwegianMinimalStemmer -- Key: SOLR-2764 URL: https://issues.apache.org/jira/browse/SOLR-2764 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Jan Høydahl Assignee: Jan Høydahl Fix For: 3.6, 4.0 Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch We need a simple light-weight stemmer and a minimal stemmer for plural/singular only in Norwegian -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-3258: -- Attachment: debugging.patch I once tried to debug it but couldn't reproduce. It does happen from time to time on my build machine though. '60' is ASCII for '<', so I guess something weird is being emitted. Can you apply the attached patch and try to cause this, Markus? Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 Attachments: debugging.patch In a test set-up with nodes=2, shards=3 and cores=6 we often see this exception in the logs. Once every few ping requests this is thrown; other requests return a proper OK. Ping request handler: {code}
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="qt">select</str>
    <str name="q">*:*</str>
    <int name="rows">0</int>
  </lst>
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="echoParams">all</str>
    <bool name="omitHeader">true</bool>
  </lst>
</requestHandler>
{code} Exception: {code} 2012-03-20 13:16:06,405 INFO [solr.core.SolrCore] - [http-80-18] - : [core_a] webapp=/solr path=/admin/ping params={} status=500 QTime=7 2012-03-20 13:16:06,406 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-18] - : null:org.apache.solr.common.SolrException: Ping query caused exception: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:68) ... 
16 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:278) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at
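[Editorial note: the "Invalid version (expected 2, but 60)" message can be demystified with a short sketch. JavaBinCodec reads the first byte of the shard response as the javabin version number; if the shard actually returned an HTML error page, that first byte is '<', whose ASCII code is 60. The payloads below are hypothetical illustrations, not real Solr responses.]

```python
# Why "expected 2, but 60": the first byte of a javabin payload is the
# version (2 in this era of Solr). An HTML error page starts with '<' = 60.
javabin_response = bytes([2]) + b"...rest of javabin payload..."  # well-formed header
html_response = b"<html><head><title>Apache Tomcat/6.0.35 - Error report</title>"

def first_byte(payload: bytes) -> int:
    """Return the leading byte, which the javabin codec reads as the version."""
    return payload[0]

assert first_byte(javabin_response) == 2   # what the codec expects
assert first_byte(html_response) == 60     # ord('<') == 60 -> "expected 2, but 60"
```

So the exception is not corruption of javabin data at all; it is the codec being handed a servlet container's HTML error body.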
[jira] [Created] (SOLR-3259) Solr 4 aesthetics
Solr 4 aesthetics - Key: SOLR-3259 URL: https://issues.apache.org/jira/browse/SOLR-3259 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Fix For: 4.0 Solr 4 will be a huge new release... we should take this opportunity to improve the out-of-the-box experience.
[jira] [Commented] (SOLR-3259) Solr 4 aesthetics
[ https://issues.apache.org/jira/browse/SOLR-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233402#comment-13233402 ] Yonik Seeley commented on SOLR-3259: Some ideas:
- our fieldType list has grown *huge*... we should probably move the field list to the top of the file where it's easier to find
- the preference for JSON over XML seems to be continuing - we should make things more JSON oriented by adding a /query handler that defaults to wt=json and perhaps indent=true
- the concept of an example server that you must configure yourself has become less than ideal... perhaps we should just create a server directory (but leave things like exampledocs under example)
- some new JSON based example docs that aren't based on electronics from '05 (or as an alternative for certain quickstart guides, start off with a curl command to add some data rather than trying to shove it all in exampledocs)
Solr 4 aesthetics - Key: SOLR-3259 URL: https://issues.apache.org/jira/browse/SOLR-3259 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Fix For: 4.0 Solr 4 will be a huge new release... we should take this opportunity to improve the out-of-the-box experience.
[jira] [Commented] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233405#comment-13233405 ] Markus Jelsma commented on SOLR-3258: - I've redeployed with your patch. This is very peculiar indeed! The stack trace shows a ping failing and at the bottom some that work well. I've also noticed the /select handler not being there, so I did a manual request on /select?q=*:* and I _sometimes_ get the same error. Some work, some don't. Does this help a bit? {code} 2012-03-20 13:45:30,352 INFO [solr.core.SolrCore] - [http-80-17] - : [] webapp=/solr path=/admin/ping params={} status=500 QTime=7 2012-03-20 13:45:30,352 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-17] - : null:org.apache.solr.common.SolrException: Ping query caused exception: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format, input: <html><head><title>Apache Tomcat/6.0.35 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /solr/select</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/solr/select</u></p><p><b>description</b> <u>The requested resource (/solr/select) is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/6.0.35</h3></body></html> at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77) at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format, input: htmlheadtitleApache Tomcat/6.0.35 - Error report/titlestyle!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B 
{font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--/style /headbodyh1HTTP Status 404 - /solr/select/h1HR size=1 noshade=noshadepbtype/b Status report/ppbmessage/b u/solr/select/u/ppbdescription/b uThe requested resource (/solr/select) is not available./u/pHR size=1 noshade=noshadeh3Apache Tomcat/6.0.35/h3/body/html at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at
[jira] [Commented] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233408#comment-13233408 ] Dawid Weiss commented on SOLR-3258: --- And here comes the moment where my knowledge of Solr ends :) I'd say there is definitely a bug in how the HTTP response status is handled (and this should be fixed), unless there is a filter somewhere that emits this HTML and fakes HTTP 200... But as for the cause of why this happens in general -- no idea. Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 Attachments: debugging.patch In a test set-up with nodes=2, shards=3 and cores=6 we often see this exception in the logs. Once every few ping requests this is thrown; other requests return a proper OK.
Ping request handler: {code} requestHandler name=/admin/ping class=solr.PingRequestHandler lst name=invariants str name=qtselect/str str name=q*:*/str int name=rows0/int /lst lst name=defaults str name=wtjson/str str name=echoParamsall/str bool name=omitHeadertrue/bool /lst /requestHandler {code} Exception: {code} 2012-03-20 13:16:06,405 INFO [solr.core.SolrCore] - [http-80-18] - : [core_a] webapp=/solr path=/admin/ping params={} status=500 QTime=7 2012-03-20 13:16:06,406 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-18] - : null:org.apache.solr.common.SolrException: Ping query caused exception: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:68) ... 16 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:278) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at
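[Editorial note: the guard Dawid suggests can be sketched as a check on the HTTP status and Content-Type before handing the body to the javabin codec. This is a hypothetical helper for illustration, not the actual SolrJ code; the accepted content types are an assumption.]

```python
# Fail fast on a non-200 status or a non-javabin Content-Type instead of
# feeding a Tomcat HTML error page to JavaBinCodec.unmarshal.
def check_shard_response(status: int, content_type: str) -> None:
    if status != 200:
        raise IOError(f"shard returned HTTP {status}; not parsing body as javabin")
    if "javabin" not in content_type and "octet-stream" not in content_type:
        raise IOError(f"unexpected shard Content-Type {content_type!r}")
```

With such a check, the 404 from /solr/select would surface as a clear "shard returned HTTP 404" error rather than the misleading "Invalid version (expected 2, but 60)" message.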
Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 2022 - Failure
I opened https://issues.apache.org/jira/browse/LUCENE-3890 for this... Mike McCandless http://blog.mikemccandless.com On Mon, Mar 19, 2012 at 8:23 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2022/ 1 tests failed. REGRESSION: org.apache.lucene.search.grouping.GroupFacetCollectorTest.testRandom Error Message: null Stack Trace: java.lang.NullPointerException at org.apache.lucene.search.grouping.term.TermGroupFacetCollector$MV.setNextReader(TermGroupFacetCollector.java:249) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:505) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) at org.apache.lucene.search.grouping.GroupFacetCollectorTest.testRandom(GroupFacetCollectorTest.java:259) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.apache.lucene.util.LuceneTestCase$SubclassSetupTeardownRule$1.evaluate(LuceneTestCase.java:729) at org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:645) at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22) at 
org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:556) at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:51) at org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:618) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:164) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.apache.lucene.util.UncaughtExceptionsRule$1.evaluate(UncaughtExceptionsRule.java:51) at org.apache.lucene.util.StoreClassNameRule$1.evaluate(StoreClassNameRule.java:21) at org.apache.lucene.util.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:22) at org.junit.rules.RunRules.evaluate(RunRules.java:18) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768) Build Log (for compile errors): [...truncated 5557 lines...] 
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3890) GroupFacetCollectorTest nightly build failure
GroupFacetCollectorTest nightly build failure - Key: LUCENE-3890 URL: https://issues.apache.org/jira/browse/LUCENE-3890 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 4.0 Failure from nightly build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2022/testReport/junit/org.apache.lucene.search.grouping/GroupFacetCollectorTest/testRandom/ It reproduces for me with: {noformat} ant test -Dtestcase=GroupFacetCollectorTest -Dtestmethod=testRandom -Dtests.seed=7d227aa075b7bfb8:550d2a0828ce2537:-3553c99f6a4d293e -Dtests.multiplier=3 -Dargs=-Dfile.encoding=US-ASCII {noformat}
[jira] [Commented] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233412#comment-13233412 ] Markus Jelsma commented on SOLR-3258: - Nasty! With this issue indexing fails, although some documents seem to be added. Custom request handlers still work, but the default /select handler, which our ping handler uses, causes the trouble. Manual requests to /select?distrib=false do work without trouble. I also know that this happens with an empty index. I'd love to provide more details but I don't have any. For now the issue is here, but it just might disappear as suddenly as it appeared. Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 Attachments: debugging.patch In a test set-up with nodes=2, shards=3 and cores=6 we often see this exception in the logs. Once every few ping requests this is thrown; other requests return a proper OK.
Ping request handler: {code} requestHandler name=/admin/ping class=solr.PingRequestHandler lst name=invariants str name=qtselect/str str name=q*:*/str int name=rows0/int /lst lst name=defaults str name=wtjson/str str name=echoParamsall/str bool name=omitHeadertrue/bool /lst /requestHandler {code} Exception: {code} 2012-03-20 13:16:06,405 INFO [solr.core.SolrCore] - [http-80-18] - : [core_a] webapp=/solr path=/admin/ping params={} status=500 QTime=7 2012-03-20 13:16:06,406 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-18] - : null:org.apache.solr.common.SolrException: Ping query caused exception: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:68) ... 16 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:278) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123) at
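[Editorial note: the workaround Markus mentions, /select?distrib=false, keeps the query local to the core instead of fanning it out to shards, so the failing shard requests (and the javabin parsing of their responses) never happen. A minimal sketch of building such a request; the host, port, and core path are hypothetical.]

```python
# Build a local-only (non-distributed) select request to bypass shard fan-out.
from urllib.parse import urlencode

base = "http://localhost:8080/solr/select"   # hypothetical node and core path
params = {"q": "*:*", "distrib": "false"}    # distrib=false skips distributed search
url = f"{base}?{urlencode(params)}"

assert url.startswith(base + "?")
assert "distrib=false" in url
```

Comparing this against the same query without distrib=false is a quick way to confirm the failure lives in the distributed request path rather than in the core itself.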
[jira] [Updated] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-3258: Attachment: zkdump.txt Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 Attachments: debugging.patch, zkdump.txt In a test set-up with nodes=2, shards=3 and cores=6 we often see this exception in the logs. Once every few ping requests this is thrown, other request return a proper OK. Ping request handler: {code} requestHandler name=/admin/ping class=solr.PingRequestHandler lst name=invariants str name=qtselect/str str name=q*:*/str int name=rows0/int /lst lst name=defaults str name=wtjson/str str name=echoParamsall/str bool name=omitHeadertrue/bool /lst /requestHandler {code} Exception: {code} 2012-03-20 13:16:06,405 INFO [solr.core.SolrCore] - [http-80-18] - : [core_a] webapp=/solr path=/admin/ping params={} status=500 QTime=7 2012-03-20 13:16:06,406 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-18] - : null:org.apache.solr.common.SolrException: Ping query caused exception: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:68) ... 
16 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:278) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at
[jira] [Commented] (SOLR-3259) Solr 4 aesthetics
[ https://issues.apache.org/jira/browse/SOLR-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233414#comment-13233414 ] Jan Høydahl commented on SOLR-3259: --- +1 to the general idea of lifting the first-time experience of Solr. I like all your proposals except... I'm not sure if we gain much by moving the example to a server folder. I think it's a Good Thing™ that we make it clear that what's provided is just an example, not for production. Another name for the example folder could be jetty, because that's what it really is - which confuses many today: they think that the lib and etc folders below example belong to Solr... If anything I'd vote for making the distro closer to what people would want in production. You could then have a pure solr/jetty folder with ONLY jetty, a solr/example-home folder which holds today's example/solr, making it more obvious which folder is actually the SOLR_HOME, and finally a start script on top level, start-solr.[cmd|sh], which copies the war from dist to jetty/webapps, sets -Dsolr.solr.home and starts Jetty. By default start-solr.sh would log to stdout, but a param could have it log to file. Solr 4 aesthetics - Key: SOLR-3259 URL: https://issues.apache.org/jira/browse/SOLR-3259 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Fix For: 4.0 Solr 4 will be a huge new release... we should take this opportunity to improve the out-of-the-box experience. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3890) GroupFacetCollectorTest nightly build failure
[ https://issues.apache.org/jira/browse/LUCENE-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233415#comment-13233415 ] Martijn van Groningen commented on LUCENE-3890: --- Thanks for noticing this! I'll take a look at it. GroupFacetCollectorTest nightly build failure - Key: LUCENE-3890 URL: https://issues.apache.org/jira/browse/LUCENE-3890 Project: Lucene - Java Issue Type: Bug Reporter: Michael McCandless Fix For: 4.0 Failure from nightly build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/2022/testReport/junit/org.apache.lucene.search.grouping/GroupFacetCollectorTest/testRandom/ It reproduces for me with: {noformat} ant test -Dtestcase=GroupFacetCollectorTest -Dtestmethod=testRandom -Dtests.seed=7d227aa075b7bfb8:550d2a0828ce2537:-3553c99f6a4d293e -Dtests.multiplier=3 -Dargs=-Dfile.encoding=US-ASCII {noformat}
[jira] [Updated] (LUCENE-3846) Fuzzy suggester
[ https://issues.apache.org/jira/browse/LUCENE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3846: --- Fix Version/s: (was: 3.6) Fuzzy suggester --- Key: LUCENE-3846 URL: https://issues.apache.org/jira/browse/LUCENE-3846 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3846.patch, LUCENE-3846.patch Would be nice to have a suggester that can handle some fuzziness (like spell correction) so that it's able to suggest completions that are near what you typed. As a first go at this, I implemented 1T (ie up to 1 edit, including a transposition), except the first letter must be correct. But there is a penalty, ie, the corrected suggestion needs to have a much higher freq than the exact match suggestion before it can compete. Still tons of nocommits, and somehow we should merge this / make it work with analyzing suggester too (LUCENE-3842).
[jira] [Updated] (LUCENE-3564) rename IndexWriter.rollback to .rollbackAndClose
[ https://issues.apache.org/jira/browse/LUCENE-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3564: --- Fix Version/s: (was: 3.6) rename IndexWriter.rollback to .rollbackAndClose Key: LUCENE-3564 URL: https://issues.apache.org/jira/browse/LUCENE-3564 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Spinoff from LUCENE-3454, where Shai noticed that rollback is trappy since it [unexpectedly] closes the IW. I think we should rename it to rollbackAndClose.
[jira] [Commented] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233420#comment-13233420 ] Markus Jelsma commented on SOLR-3258: - I suspected Solr's distributed capabilities because the error occurs with distrib=true. So I stopped Zookeeper, removed the data directory and restarted Zookeeper and the Solr nodes. I attached a zookeeper dump I took just moments before removing the data directory. Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 Attachments: debugging.patch, zkdump.txt In a test set-up with nodes=2, shards=3 and cores=6 we often see this exception in the logs. Once every few ping requests this is thrown; other requests return a proper OK.
Ping request handler: {code}
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="qt">select</str>
    <str name="q">*:*</str>
    <int name="rows">0</int>
  </lst>
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="echoParams">all</str>
    <bool name="omitHeader">true</bool>
  </lst>
</requestHandler>
{code} Exception: {code} 2012-03-20 13:16:06,405 INFO [solr.core.SolrCore] - [http-80-18] - : [core_a] webapp=/solr path=/admin/ping params={} status=500 QTime=7 2012-03-20 13:16:06,406 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-18] - : null:org.apache.solr.common.SolrException: Ping query caused exception: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:68) ... 16 more Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:278) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at
[jira] [Updated] (LUCENE-2686) DisjunctionSumScorer should not call .score on sub scorers until consumer calls .score
[ https://issues.apache.org/jira/browse/LUCENE-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2686: --- Fix Version/s: (was: 3.6) DisjunctionSumScorer should not call .score on sub scorers until consumer calls .score -- Key: LUCENE-2686 URL: https://issues.apache.org/jira/browse/LUCENE-2686 Project: Lucene - Java Issue Type: Bug Components: core/search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2686.patch, LUCENE-2686.patch, Test2LUCENE2590.java Spinoff from java-user thread question about Scorer.freq() from Koji... BooleanScorer2 uses DisjunctionSumScorer to score only-SHOULD-clause boolean queries. But, this scorer does too much work for collectors that never call .score, because it scores while it's matching. It should only call .score on the subs when the caller calls its .score. This also has the side effect of messing up advanced collectors that gather the freq() of the subs (using LUCENE-2590).
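The lazy-scoring idea behind this issue can be sketched in miniature (class and method names here are invented for illustration and are not Lucene's actual API): matching advances without touching the sub-scorers' score(), and the subs are only evaluated when the consumer asks for a score.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch: defer scoring of sub-scorers until the consumer calls score().
// A collector that never calls score() then never pays for scoring.
class LazyDisjunctionScore {
    private final List<Supplier<Float>> subScores; // each wraps a sub-scorer's score()

    LazyDisjunctionScore(List<Supplier<Float>> subScores) {
        this.subScores = subScores;
    }

    // Only this call evaluates the sub-scorers.
    float score() {
        float sum = 0f;
        for (Supplier<Float> s : subScores) {
            sum += s.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Supplier<Float> a = () -> { calls.incrementAndGet(); return 1.5f; };
        Supplier<Float> b = () -> { calls.incrementAndGet(); return 2.5f; };
        LazyDisjunctionScore scorer = new LazyDisjunctionScore(Arrays.asList(a, b));
        // Matching could proceed here; no sub has been scored yet.
        System.out.println(calls.get());    // prints 0
        System.out.println(scorer.score()); // prints 4.0
        System.out.println(calls.get());    // prints 2
    }
}
```

The eager version criticized in the issue would call the suppliers during matching, so collectors that never ask for a score would still pay for it.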
[jira] [Commented] (SOLR-3220) RecoveryZkTest test failure
[ https://issues.apache.org/jira/browse/SOLR-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233423#comment-13233423 ] Markus Jelsma commented on SOLR-3220: - In SOLR-3258 I included a zkdump file _while_ this issue was occurring. The problem vanished after removing the Zookeeper data directory and restarting. So I hope someone can find useful information in the dump file. RecoveryZkTest test failure --- Key: SOLR-3220 URL: https://issues.apache.org/jira/browse/SOLR-3220 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: TEST-org.apache.solr.cloud.RecoveryZkTest.xml observed a failure in RecoveryZkTest.testDistribSearch using r1298661 that had some odd looking (to me) log info. could not reproduce with identical seed
[jira] [Commented] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233426#comment-13233426 ] Yonik Seeley commented on SOLR-3258: bq. I suspected Solr's distributed capabilities because of the error occuring with distrib=true. I was going to ask... do you mean for the ping query to be distributed? Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 Attachments: debugging.patch, zkdump.txt
[jira] [Commented] (SOLR-3220) RecoveryZkTest test failure
[ https://issues.apache.org/jira/browse/SOLR-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233431#comment-13233431 ] Mark Miller commented on SOLR-3220: --- Sorry - missed these issues going by - been busy with other things for a bit. Yeah, I've seen this before. It happens when an error is returned rather than a javabin response. I've seen it with 404's; I'm sure it happens with other errors at the container level. The '60' is the start of the html (if I remember right) error response. It makes debugging a bitch sometimes. So for instance, for the ping handler, perhaps it wasn't found, or it errored. For this, it could be an issue during startup or shutdown when a 404 can be returned. We should make a new issue for the problem - offhand I don't have a solution though. Adding structured error support to Solr might help. For this issue I first need to see if that exception even relates to the failure - there is a good chance it does not. A server is stopped and started in this test, and a query or update at the wrong time can return a 404 or some other non-success code. So you are likely to see this exception even if the test passes. RecoveryZkTest test failure --- Key: SOLR-3220 URL: https://issues.apache.org/jira/browse/SOLR-3220 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: TEST-org.apache.solr.cloud.RecoveryZkTest.xml observed a failure in RecoveryZkTest.testDistribSearch using r1298661 that had some odd looking (to me) log info. could not reproduce with identical seed
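Mark's recollection about the '60' checks out: a javabin response begins with the version byte 2, while an HTML error page (say, a container 404 body) begins with '<', whose byte value is 60. So "Invalid version (expected 2, but 60)" is the javabin parser reading the first character of an HTML error page. A minimal demonstration:

```java
// Why "expected 2, but 60": the javabin parser reads the first byte of the
// response as a version number. A real javabin stream starts with 2; an
// HTML error page starts with '<', which is byte 60 in ASCII.
public class JavabinVersionByte {
    public static void main(String[] args) {
        byte javabinVersion = 2;
        byte htmlFirstByte = (byte) "<html>".charAt(0);
        System.out.println((int) javabinVersion); // prints 2
        System.out.println((int) htmlFirstByte);  // prints 60
    }
}
```

This is also why the error is so opaque: the actual failure (a 404, a startup race) is hidden behind a deserialization message about the wrong version byte.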
[jira] [Commented] (SOLR-2020) HttpComponentsSolrServer
[ https://issues.apache.org/jira/browse/SOLR-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233434#comment-13233434 ] Sami Siren commented on SOLR-2020: -- bq. I assume this means we'll be able to switch to using NIO for the distributed search sub-requests! Yeah, that should be possible. HttpComponentsSolrServer Key: SOLR-2020 URL: https://issues.apache.org/jira/browse/SOLR-2020 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4.1 Environment: Any Reporter: Chantal Ackermann Priority: Minor Fix For: 4.0 Attachments: HttpComponentsSolrServer.java, HttpComponentsSolrServerTest.java, SOLR-2020-HttpSolrServer.patch, SOLR-2020.patch, SOLR-2020.patch Implementation of SolrServer that uses the Apache Http Components framework. Http Components (http://hc.apache.org/) is the successor of Commons HttpClient and thus HttpComponentsSolrServer would be a successor of CommonsHttpSolrServer, in the future.
[jira] [Commented] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233436#comment-13233436 ] Markus Jelsma commented on SOLR-3258: - It seems it is. The ping query is just a /select?q=*:*&rows=0 but it yields different results with distrib=false specified. Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 Attachments: debugging.patch, zkdump.txt
[jira] [Commented] (SOLR-3220) RecoveryZkTest test failure
[ https://issues.apache.org/jira/browse/SOLR-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233441#comment-13233441 ] Mark Miller commented on SOLR-3220: --- Also, as this and a couple other classes are expected to throw various nasty exceptions, I had them all ignored in the base class - but yonik unignored at some point when he was debugging. I think we should turn that ignore back on - the ant test output is a mess otherwise. RecoveryZkTest test failure --- Key: SOLR-3220 URL: https://issues.apache.org/jira/browse/SOLR-3220 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: TEST-org.apache.solr.cloud.RecoveryZkTest.xml observed a failure in RecoveryZkTest.testDistribSearch using r1298661 that had some odd looking (to me) log info. could not reproduce with identical seed
[jira] [Commented] (SOLR-1052) Deprecate/Remove indexDefaults and mainIndex in favor of indexConfig in solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233446#comment-13233446 ] Jan Høydahl commented on SOLR-1052: --- Last call before commit to branch_3x. Speak now or be forever silent :) Deprecate/Remove indexDefaults and mainIndex in favor of indexConfig in solrconfig.xml Key: SOLR-1052 URL: https://issues.apache.org/jira/browse/SOLR-1052 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Jan Høydahl Labels: solrconfig.xml Fix For: 3.6, 4.0 Attachments: SOLR-1052-3x.patch, SOLR-1052-3x.patch, SOLR-1052-3x.patch, SOLR-1052-3x.patch Given that we now handle multiple cores via the solr.xml and the discussion around indexDefaults and mainIndex at http://www.lucidimagination.com/search/p:solr?q=mainIndex+vs.+indexDefaults We should deprecate old indexDefaults and mainIndex sections and only use a new indexConfig section. 3.6: Deprecation warning if old section used 4.0: If LuceneMatchVersion before LUCENE_40 then warn (so old configs will work), else fail fast
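In solrconfig.xml terms, the change described above amounts to the following (a sketch with element contents elided; the individual index settings themselves are unchanged):

{code}
<!-- Old: two sections, deprecated with a warning in 3.6 -->
<indexDefaults>
  ...
</indexDefaults>
<mainIndex>
  ...
</mainIndex>

<!-- New: a single merged section -->
<indexConfig>
  ...
</indexConfig>
{code}

With multicore handled by solr.xml, the defaults-vs-main split no longer buys anything, which is the rationale for collapsing the two sections.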
Re: [jira] [Commented] (SOLR-3259) Solr 4 aesthetics
+1 to all folder suggestions Bill Bell Sent from mobile On Mar 20, 2012, at 8:07 AM, Jan Høydahl (Commented) (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233414#comment-13233414 ]
[jira] [Updated] (LUCENE-3830) MappingCharFilter could be improved by switching to an FST.
[ https://issues.apache.org/jira/browse/LUCENE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3830: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) MappingCharFilter could be improved by switching to an FST. --- Key: LUCENE-3830 URL: https://issues.apache.org/jira/browse/LUCENE-3830 Project: Lucene - Java Issue Type: Improvement Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 MappingCharFilter stores an overly complex tree-like structure for matching input patterns. The input is a union of fixed strings mapped to a set of fixed strings; an fst matcher would be ideal here and provide both memory and speed improvement I bet.
[jira] [Created] (LUCENE-3891) Documents loaded at search time (IndexReader.document) should be a different class from the index-time Document
Documents loaded at search time (IndexReader.document) should be a different class from the index-time Document --- Key: LUCENE-3891 URL: https://issues.apache.org/jira/browse/LUCENE-3891 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless The fact that the Document you can load at search time is the same Document class you had indexed is horribly trappy in Lucene, because, the loaded document necessarily loses information like field boost, whether a field was tokenized, etc. (See LUCENE-3854 for a recent example). We should fix this, statically, so that it's an entirely different class at search time vs index time.
[jira] [Updated] (LUCENE-3891) Documents loaded at search time (IndexReader.document) should be a different class from the index-time Document
[ https://issues.apache.org/jira/browse/LUCENE-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3891: --- Fix Version/s: 4.0 Labels: gsoc2012 lucene-gsoc-12 (was: ) Documents loaded at search time (IndexReader.document) should be a different class from the index-time Document --- Key: LUCENE-3891 URL: https://issues.apache.org/jira/browse/LUCENE-3891 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 The fact that the Document you can load at search time is the same Document class you had indexed is horribly trappy in Lucene, because the loaded document necessarily loses information like field boost, whether a field was tokenized, etc. (See LUCENE-3854 for a recent example.) We should fix this, statically, so that it's an entirely different class at search time vs index time.
[jira] [Commented] (SOLR-3258) Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format
[ https://issues.apache.org/jira/browse/SOLR-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233451#comment-13233451 ] Markus Jelsma commented on SOLR-3258: - It seems I found the problem. The solr.xml file on one of the nodes had a typo: instead of shard=shard1 I had shard_a=shard1. It's pretty hard to reproduce, but after removing the ZK data directories you can start the nodes with one core having a bad shard parameter. Originally only one node had a corrupt solr.xml file, but I could only reproduce by corrupting the file on both nodes and starting Solr. Ping query caused exception..Invalid version (expected 2, but 60) or the data in not in 'javabin' format Key: SOLR-3258 URL: https://issues.apache.org/jira/browse/SOLR-3258 Project: Solr Issue Type: Bug Environment: solr-impl 4.0-SNAPSHOT 1302403 - markus - 2012-03-19 13:55:51 Reporter: Markus Jelsma Fix For: 4.0 Attachments: debugging.patch, zkdump.txt In a test set-up with nodes=2, shards=3 and cores=6 we often see this exception in the logs. Once every few ping requests this is thrown; other requests return a proper OK.
Ping request handler: {code}
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="qt">select</str>
    <str name="q">*:*</str>
    <int name="rows">0</int>
  </lst>
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="echoParams">all</str>
    <bool name="omitHeader">true</bool>
  </lst>
</requestHandler>
{code} Exception: {code}
2012-03-20 13:16:06,405 INFO [solr.core.SolrCore] - [http-80-18] - : [core_a] webapp=/solr path=/admin/ping params={} status=500 QTime=7
2012-03-20 13:16:06,406 ERROR [solr.servlet.SolrDispatchFilter] - [http-80-18] - : null:org.apache.solr.common.SolrException: Ping query caused exception: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
  at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:77)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
  at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
  at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
  at org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:68)
  ... 16 more
Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
  at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:278)
  at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:158)
  at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123)
  at
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233453#comment-13233453 ] Antoine Le Floc'h commented on SOLR-2242: - Bill, Just a thought, how are you going to plug in [SOLR-3134|https://issues.apache.org/jira/browse/SOLR-3134] then? Since we are not able to aggregate distinct counts over shards, shouldn't you do something like: {code}
<lst name="facet_numTerms">
  <lst name="localhost:/solr">
    <int name="cat">15</int>
    <int name="price">14</int>
  </lst>
  <lst name="localhost:/solr">
    <int name="cat">3</int>
    <int name="price">23</int>
  </lst>
</lst>
{code} Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: NumFacetTermsFacetsTest.java, SOLR-2242-notworkingtest.patch, SOLR-2242-solr40.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.patch, SOLR-2242.shard.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct.
Here is an example: http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=2&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=0&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numFacetTerms=1&facet.limit=-1&facet.field=price This currently only works on facet.field. {code}
<lst name="facet_fields">
  <lst name="price">
    <int name="numFacetTerms">14</int>
    <int name="0.0">3</int><int name="11.5">1</int><int name="19.95">1</int><int name="74.99">1</int><int name="92.0">1</int><int name="179.99">1</int><int name="185.0">1</int><int name="279.95">1</int><int name="329.95">1</int><int name="350.0">1</int><int name="399.0">1</int><int name="479.95">1</int><int name="649.99">1</int><int name="2199.0">1</int>
  </lst>
</lst>
{code} Several people use this to get the group.field count (the # of groups).
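Antoine's concern above - that distinct counts cannot be aggregated over shards - can be sketched in a few lines (illustrative only; class and method names are hypothetical, this is not Solr's distributed faceting code):

```java
import java.util.*;

// Why per-shard numFacetTerms values cannot simply be summed: the same term
// may appear on several shards. An exact merged count needs the union of the
// terms themselves (or an approximation such as a hash sketch).
public class DistinctFacetMerge {
    public static int mergeDistinct(List<Set<String>> perShardTerms) {
        Set<String> union = new HashSet<>();
        for (Set<String> terms : perShardTerms) {
            union.addAll(terms); // duplicates across shards collapse here
        }
        return union.size();
    }
}
```

Two shards each reporting 2 distinct terms can merge to anywhere from 2 to 4, which is why returning per-shard counts (as in the comment above) is the honest option without shipping the terms.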
[jira] [Updated] (LUCENE-3729) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED
[ https://issues.apache.org/jira/browse/LUCENE-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3729: --- Labels: gsoc2012 lucene-gsoc-11 (was: ) Allow using FST to hold terms data in DocValues.BYTES_*_SORTED -- Key: LUCENE-3729 URL: https://issues.apache.org/jira/browse/LUCENE-3729 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2012, lucene-gsoc-11 Attachments: LUCENE-3729.patch, LUCENE-3729.patch, LUCENE-3729.patch
[jira] [Commented] (SOLR-3220) RecoveryZkTest test failure
[ https://issues.apache.org/jira/browse/SOLR-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233455#comment-13233455 ] Markus Jelsma commented on SOLR-3220: - In my case I had a typo in a solr.xml file on one node: shard_a=shard1 was specified for one of three cores. Fixing it and removing the ZK data directories solved the issue. RecoveryZkTest test failure --- Key: SOLR-3220 URL: https://issues.apache.org/jira/browse/SOLR-3220 Project: Solr Issue Type: Bug Reporter: Hoss Man Attachments: TEST-org.apache.solr.cloud.RecoveryZkTest.xml observed a failure in RecoveryZkTest.testDistribSearch using r1298661 that had some odd looking (to me) log info. could not reproduce with identical seed
Maven artifacts not working?
This link seems to not work: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts
[jira] [Updated] (LUCENE-3514) deep paging with Sort
[ https://issues.apache.org/jira/browse/LUCENE-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3514: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) deep paging with Sort - Key: LUCENE-3514 URL: https://issues.apache.org/jira/browse/LUCENE-3514 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.4, 4.0 Reporter: Robert Muir Labels: gsoc2012, lucene-gsoc-12 We added IS.searchAfter(Query, Filter) but we don't support Sort yet with this API. I think it might be overkill, at least at first, to try to implement 12 collector variants for this. I put the following idea on SOLR-1726: One idea would be to start with one or two implementations (maybe in/out of order) for the sorting case, and don't overspecialize it yet.
* for page 1, the ScoreDoc (FieldDoc really) will be null, so we just return the normal impl anyway.
* even if our searchAfter isn't super-duper fast, the user can always make the tradeoff like with page-by-score. they can always just pass null until like page 10 or something if they compute that it only starts to 'help' then.
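The searchAfter-with-Sort idea above boils down to a cursor test: a hit competes only if it sorts strictly after the (sortValue, docId) of the last hit on the previous page. A stand-alone sketch (illustrative only - names are hypothetical and this is not the Lucene collector API):

```java
import java.util.*;

// Sketch of the searchAfter cursor for a sorted result set. docId breaks
// ties on equal sort values, mirroring index order within a sort bucket.
public class SearchAfterSketch {
    // each hit is {sortValue, docId}; allHits is already sorted ascending
    public static List<int[]> page(List<int[]> allHits, int[] after, int size) {
        List<int[]> out = new ArrayList<>();
        for (int[] h : allHits) {
            if (after != null
                    && (h[0] < after[0] || (h[0] == after[0] && h[1] <= after[1]))) {
                continue; // at or before the cursor: skip
            }
            out.add(h);
            if (out.size() == size) break;
        }
        return out;
    }
}
```

The efficiency win over plain paging is that a real collector holds a queue of only `size` hits rather than `page * size`, so memory stays flat however deep the user goes.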
[jira] [Updated] (LUCENE-3475) ShingleFilter should handle positionIncrement of zero, e.g. synonyms
[ https://issues.apache.org/jira/browse/LUCENE-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3475: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) I think this is important, now that we have graph analyzers (like Kuromoji). So ShingleFilter should pay attention to posInc as well as posLength... ShingleFilter should handle positionIncrement of zero, e.g. synonyms Key: LUCENE-3475 URL: https://issues.apache.org/jira/browse/LUCENE-3475 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Affects Versions: 3.4 Reporter: Cameron Priority: Minor Labels: gsoc2012, lucene-gsoc-12 ShingleFilter is creating shingles for a single term that has been expanded by synonyms when it shouldn't. The position increment is 0. As an example, I have an Analyzer with a SynonymFilter followed by a ShingleFilter. Assuming car and auto are synonyms, the SynonymFilter produces two tokens at position 1: car, auto. The ShingleFilter is then producing 3 tokens (car, car auto, auto) when there should only be two (car, auto). This behavior seems incorrect.
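The fix the issue asks for can be sketched as position-aware shingling: tokens with positionIncrement 0 stack at the same position, so shingles should only cross adjacent positions, never pair tokens within one. A stand-alone sketch (illustrative only - not ShingleFilter's API, and the trailing "rental" token is my own addition to show a shingle being formed):

```java
import java.util.*;

// Position-aware bigram shingling: group tokens into positions using
// positionIncrement, then form shingles only across adjacent positions.
public class PositionShingles {
    // each token is {term (String), positionIncrement (Integer)}
    public static List<String> bigrams(List<Object[]> tokens) {
        List<List<String>> positions = new ArrayList<>();
        for (Object[] t : tokens) {
            if (positions.isEmpty() || (Integer) t[1] > 0) {
                positions.add(new ArrayList<>()); // increment > 0: new position
            }
            positions.get(positions.size() - 1).add((String) t[0]);
        }
        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            for (String a : positions.get(i)) {
                for (String b : positions.get(i + 1)) {
                    shingles.add(a + " " + b); // never pair within one position
                }
            }
        }
        return shingles;
    }
}
```

For car/auto (synonyms stacked at one position) followed by rental, this yields "car rental" and "auto rental" but never the bogus "car auto" from the bug report.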
[jira] [Commented] (SOLR-3214) If you use multiple fl entries rather than a comma separated list, all but the first entry can be ignored if you are using distributed search.
[ https://issues.apache.org/jira/browse/SOLR-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233461#comment-13233461 ] Mark Miller commented on SOLR-3214: --- bq. It appears that currently, score is synonymous with *,score
That is just not true currently - this was recently changed by SOLR-2712; this part of it just was missed. If you use multiple fl entries rather than a comma separated list, all but the first entry can be ignored if you are using distributed search. -- Key: SOLR-3214 URL: https://issues.apache.org/jira/browse/SOLR-3214 Project: Solr Issue Type: Bug Components: search Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0 I have not checked yet, but prob in 3.x too.
[jira] [Resolved] (LUCENE-3422) IndexWriter.optimize() throws FileNotFoundException and IOException
[ https://issues.apache.org/jira/browse/LUCENE-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3422. Resolution: Incomplete IndexWriter.optimize() throws FileNotFoundException and IOException --- Key: LUCENE-3422 URL: https://issues.apache.org/jira/browse/LUCENE-3422 Project: Lucene - Java Issue Type: Bug Reporter: Elizabeth Nisha I am using lucene 3.0.2 search APIs for my application. Indexed data is about 350MB and time taken for indexing is 25 hrs. Search indexing and optimization run in two different threads. Optimization runs every 1 hour; it doesn't run while indexing is going on, and vice versa. When optimization is going on using IndexWriter.optimize(), FileNotFoundException and IOException are seen in my log and the index file is getting corrupted. The log says:
1. java.io.IOException: No sub-file with id _5r8.fdt found [The file name in this message changes over time (_5r8.fdt, _6fa.fdt, _6uh.fdt, ..., _emv.fdt)]
2. java.io.FileNotFoundException: /local/groups/necim/index_5.3/index/_bdx.cfs (No such file or directory)
3. java.io.FileNotFoundException: /local/groups/necim/index_5.3/index/_hkq.cfs (No such file or directory)
Stack trace:
java.io.IOException: background merge hit exception: _hkp:c100-_hkp _hkq:c100-_hkp _hkr:c100-_hkr _hks:c100-_hkr _hxb:c5500 _hx5:c1000 _hxc:c198 84 into _hxd [optimize] [mergeDocStores]
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2359)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2298)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2268)
  at com.telelogic.cs.search.SearchIndex.doOptimize(SearchIndex.java:130)
  at com.telelogic.cs.search.SearchIndexerThread$1.run(SearchIndexerThread.java:337)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /local/groups/necim/index_5.3/index/_hkq.cfs (No such file or directory)
  at java.io.RandomAccessFile.open(Native Method)
  at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
  at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:76)
  at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:97)
  at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:87)
  at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:67)
  at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:67)
  at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:114)
  at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:590)
  at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:616)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4309)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3965)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:231)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:288)
[jira] [Commented] (LUCENE-3883) Analysis for Irish
[ https://issues.apache.org/jira/browse/LUCENE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233465#comment-13233465 ] Jim Regan commented on LUCENE-3883: --- Wow! Thanks Robert! There isn't usually a hyphen with 'h' before a vowel, but I've started to see it recently -- there are no native Irish words beginning with 'h', so it used to be relatively unambiguous that a 'h' was a mutation, but with an increase of scientific literature in Irish, there are more Greek and Latin loan words being added which do begin with 'h', so it's no longer clear. Analysis for Irish -- Key: LUCENE-3883 URL: https://issues.apache.org/jira/browse/LUCENE-3883 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Jim Regan Priority: Trivial Labels: analysis, newbie Attachments: LUCENE-3883.patch, LUCENE-3883.patch, irish.sbl Adds analysis for Irish. The stemmer is generated from a snowball stemmer. I've sent it to Martin Porter, who says it will be added during the week.
[jira] [Updated] (LUCENE-3333) Specialize DisjunctionScorer if all clauses are TermQueries
[ https://issues.apache.org/jira/browse/LUCENE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3333: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) Specialize DisjunctionScorer if all clauses are TermQueries --- Key: LUCENE-3333 URL: https://issues.apache.org/jira/browse/LUCENE-3333 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 4.0 Reporter: Simon Willnauer Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 Spinoff from LUCENE-3328 - since we have a specialized conjunction scorer we should also investigate if this pays off in disjunction scoring
[jira] [Resolved] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module
[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3272. Resolution: Fixed Fix Version/s: 4.0 Consolidate Lucene's QueryParsers into a module --- Key: LUCENE-3272 URL: https://issues.apache.org/jira/browse/LUCENE-3272 Project: Lucene - Java Issue Type: Improvement Components: modules/queryparser Reporter: Chris Male Fix For: 4.0 Lucene has a lot of QueryParsers and we should have them all in a single consistent place. The following are QueryParsers I can find that warrant moving to the new module:
- Lucene Core's QueryParser
- AnalyzingQueryParser
- ComplexPhraseQueryParser
- ExtendableQueryParser
- Surround's QueryParser
- PrecedenceQueryParser
- StandardQueryParser
- XML-Query-Parser's CoreParser
All seem to do a good job at their kind of parsing with extensive tests. One challenge of consolidating these is that many tests use Lucene Core's QueryParser. One option is to just replicate this class in src/test and call it TestingQueryParser. Another option is to convert all tests over to programmatically building their queries (seems like a lot of work).
[jira] [Updated] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3312: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) Break out StorableField from IndexableField --- Key: LUCENE-3312 URL: https://issues.apache.org/jira/browse/LUCENE-3312 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: Field Type branch In the field type branch we have strongly decoupled the Document/Field/FieldType impl from the indexer, by having only a narrow API (IndexableField) passed to IndexWriter. This frees apps up to use their own documents instead of the user-space impls we provide in oal.document. Similarly, with LUCENE-3309, we've done the same thing on the doc/field retrieval side (from IndexReader), with the StoredFieldsVisitor. But, maybe we should break out StorableField from IndexableField, such that when you index a doc you provide two Iterables -- one for the IndexableFields and one for the StorableFields. Either can be null. One downside is a possible perf hit for fields that are both indexed and stored (ie, we visit them twice, look up their name in a hash twice, etc.). But the upside is a cleaner separation of concerns in the API.
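The "two Iterables, either can be null" shape proposed above can be sketched in a few lines (illustrative only - the interface and method names are hypothetical stand-ins for the issue's IndexableField/StorableField idea, not Lucene signatures):

```java
import java.util.*;

// Sketch of an indexer entry point that consumes decoupled iterables: one of
// fields to invert, one of fields to store. Either side may be null for a
// doc that is index-only or store-only.
public class TwoIterablesSketch {
    interface IndexedField { String name(); }
    interface StoredField { String name(); }

    public static String describe(Iterable<? extends IndexedField> indexed,
                                  Iterable<? extends StoredField> stored) {
        int i = 0, s = 0;
        if (indexed != null) for (IndexedField f : indexed) i++;
        if (stored != null) for (StoredField f : stored) s++;
        return i + " indexed, " + s + " stored";
    }
}
```

A field that is both indexed and stored would simply appear in both iterables, which is exactly the double-visit perf concern the issue mentions.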
[jira] [Commented] (LUCENE-3883) Analysis for Irish
[ https://issues.apache.org/jira/browse/LUCENE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233470#comment-13233470 ] Robert Muir commented on LUCENE-3883: - Thanks Jim. Personally I think this patch is ready to be committed. I'm just going to wait a bit in case you get any feedback from Martin or other snowball developers, but I won't wait too long :) Analysis for Irish -- Key: LUCENE-3883 URL: https://issues.apache.org/jira/browse/LUCENE-3883 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Jim Regan Priority: Trivial Labels: analysis, newbie Attachments: LUCENE-3883.patch, LUCENE-3883.patch, irish.sbl Adds analysis for Irish. The stemmer is generated from a snowball stemmer. I've sent it to Martin Porter, who says it will be added during the week.
[jira] [Assigned] (LUCENE-3883) Analysis for Irish
[ https://issues.apache.org/jira/browse/LUCENE-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-3883: --- Assignee: Robert Muir Analysis for Irish -- Key: LUCENE-3883 URL: https://issues.apache.org/jira/browse/LUCENE-3883 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Reporter: Jim Regan Assignee: Robert Muir Priority: Trivial Labels: analysis, newbie Attachments: LUCENE-3883.patch, LUCENE-3883.patch, irish.sbl Adds analysis for Irish. The stemmer is generated from a snowball stemmer. I've sent it to Martin Porter, who says it will be added during the week.
[jira] [Updated] (LUCENE-3178) Native MMapDir
[ https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3178: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) Native MMapDir -- Key: LUCENE-3178 URL: https://issues.apache.org/jira/browse/LUCENE-3178 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Spinoff from LUCENE-2793. Just like we will create native Dir impl (UnixDirectory) to pass the right OS level IO flags depending on the IOContext, we could in theory do something similar with MMapDir. The problem is MMap is apparently quite hairy... and to pass the flags the native code would need to invoke mmap (I think?), unlike UnixDir where the code only has to open the file handle.
[jira] [Resolved] (LUCENE-3177) Decouple indexer from Document/Field impls
[ https://issues.apache.org/jira/browse/LUCENE-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3177. Resolution: Fixed Decouple indexer from Document/Field impls -- Key: LUCENE-3177 URL: https://issues.apache.org/jira/browse/LUCENE-3177 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-3177.patch, LUCENE-3177.patch I think we should define minimal iterator interfaces, IndexableDocument/Field, that indexer requires to index documents. Indexer would consume only these bare minimum interfaces, not the concrete Document/Field/FieldType classes from the oal.document package. Then, the Document/Field/FieldType hierarchy is one concrete impl of these interfaces. Apps are free to make their own impls as well. Maybe eventually we make another impl that enforces a global schema, eg factored out of Solr's impl. I think this frees design pressure on our Document/Field/FieldType hierarchy, ie, these classes are free to become concrete fully-featured user-space classes with all sorts of friendly sugar APIs for adding/removing fields, getting/setting values, types, etc., but they don't need substantial extensibility/hierarchy. Ie, the extensibility point shifts to the IndexableDocument/Field interface. I think this means we can collapse the three classes we now have for a Field (Fieldable/AbstractField/Field) down to a single concrete class (well, except for LUCENE-2308 where we want to break out dedicated classes for different field types...).
[jira] [Commented] (SOLR-2764) Create a NorwegianLightStemmer and NorwegianMinimalStemmer
[ https://issues.apache.org/jira/browse/SOLR-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233472#comment-13233472 ] Robert Muir commented on SOLR-2764: --- Very nice work Jan! Create a NorwegianLightStemmer and NorwegianMinimalStemmer -- Key: SOLR-2764 URL: https://issues.apache.org/jira/browse/SOLR-2764 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Jan Høydahl Assignee: Jan Høydahl Fix For: 3.6, 4.0 Attachments: SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch, SOLR-2764.patch We need a simple light-weight stemmer and a minimal stemmer for plural/singular only in Norwegian
[jira] [Updated] (LUCENE-3122) Cascaded grouping
[ https://issues.apache.org/jira/browse/LUCENE-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3122: --- Fix Version/s: (was: 3.6) Labels: gsoc2012 lucene-gsoc-12 (was: ) Cascaded grouping - Key: LUCENE-3122 URL: https://issues.apache.org/jira/browse/LUCENE-3122 Project: Lucene - Java Issue Type: Improvement Components: modules/grouping Reporter: Michael McCandless Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 Similar to SOLR-2526, in that you are grouping on 2 separate fields, but instead of treating those fields as a single grouping by a compound key, this change would let you first group on key1 for the primary groups and then secondarily on key2 within the primary groups. Ie, the result you get back would have groups A, B, C (grouped by key1) but then the documents within group A would be grouped by key2. I think this will be important for apps whose documents are the product of denormalizing, ie where the Lucene document is really a sub-document of a different identifier field. Borrowing an example from LUCENE-3097, you have doctors but each doctor may have multiple offices (addresses) where they practice, and so you index doctor X address as your lucene documents. In this case, your identifier field (that which counts for facets, and should be grouped for presentation) is doctorid. When you offer users search over this index, you'd likely want to 1) group by distance (ie, 0.1 miles, 0.2 miles, etc., as a function query), but 2) also group by doctorid, ie cascaded grouping. I suspect this would be easier to implement than it sounds: the per-group collector used by the 2nd pass grouping collector for key1's grouping just needs to be another grouping collector. Spookily, though, that collection would also have to be 2-pass, so it could get tricky since grouping is sort of recursing on itself. Once we have LUCENE-3112, though, that should enable efficient single-pass grouping by the identifier (doctorid).
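For intuition only, cascaded grouping of already-collected hits can be sketched with nested grouping over in-memory documents. The real work in Lucene is doing this inside the two-pass grouping collectors, which this sketch ignores; the field names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CascadedGroupingSketch {
    // One doctor-x-address "document" after denormalizing, per the issue's example.
    static class Doc {
        final String distanceBucket, doctorId;
        Doc(String distanceBucket, String doctorId) {
            this.distanceBucket = distanceBucket;
            this.doctorId = doctorId;
        }
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("0.1 miles", "dr-1"),
            new Doc("0.1 miles", "dr-1"),
            new Doc("0.1 miles", "dr-2"),
            new Doc("0.2 miles", "dr-3"));

        // Primary groups by key1 (distance bucket), then secondary groups by
        // key2 (doctorid) within each primary group.
        Map<String, Map<String, Long>> cascaded = docs.stream()
            .collect(Collectors.groupingBy(d -> d.distanceBucket, TreeMap::new,
                     Collectors.groupingBy(d -> d.doctorId, TreeMap::new,
                     Collectors.counting())));

        cascaded.forEach((dist, byDoctor) ->
            System.out.println(dist + " -> " + byDoctor));
    }
}
```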
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3069: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Java Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 FST based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta.
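For intuition, here is a tiny stand-in for a fully memory-resident term dictionary. A sorted map plays the role of the FST, and the per-term metadata fields are an assumption about what a custom fst.Output would need to carry; Lucene's real structure is far more compact.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class MemoryTermDictSketch {
    // Stand-in for what a custom fst.Output might carry per term; these exact
    // fields are an assumption, not Lucene's on-disk format.
    static final class TermMeta {
        final int docFreq;
        final long postingsPointer;
        TermMeta(int docFreq, long postingsPointer) {
            this.docFreq = docFreq;
            this.postingsPointer = postingsPointer;
        }
    }

    public static void main(String[] args) {
        // The whole term -> metadata mapping lives in memory (a TreeMap here,
        // an FST in the proposal), so term lookups need no on-disk scan.
        NavigableMap<String, TermMeta> dict = new TreeMap<>();
        dict.put("apache", new TermMeta(42, 0L));
        dict.put("lucene", new TermMeta(7, 1024L));

        TermMeta m = dict.get("lucene");
        System.out.println("docFreq=" + m.docFreq + " fp=" + m.postingsPointer);

        // Ordered seek (in the spirit of TermsEnum.seekCeil) is also answered
        // entirely in memory.
        System.out.println("ceil(b)=" + dict.ceilingKey("b"));
    }
}
```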
[jira] [Updated] (LUCENE-3013) I wish Lucene query explanations were easier to localise
[ https://issues.apache.org/jira/browse/LUCENE-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3013: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) I wish Lucene query explanations were easier to localise Key: LUCENE-3013 URL: https://issues.apache.org/jira/browse/LUCENE-3013 Project: Lucene - Java Issue Type: Wish Components: core/query/scoring Reporter: Trejkaz Labels: gsoc2012, lucene-gsoc-12 Often users ask us to provide a nice UI to explain why a document matched their query. Currently the strings output by Explanation are very advanced, and probably only understandable to those who have worked on Lucene. I took a shot at trying to make them friendlier, but it basically came down to parsing the strings it output and trying to figure out what kind of query was at each point (the inability to get to a Query from the Explanation is a small part of the problem here), formulating the result into readable English. In the end it seems a bit too hard. The solution to this could be done in at least two ways: 1. Add getLocalizedSummary() / getLocalizedDescription() method(s) and use resource bundles internally. Projects wishing to localise these could add their own resource bundles to the classpath and/or get them contributed to Lucene. 2. Add subclasses of Explanation with enough methods for callers to interrogate the individual details of the explanation instead of outputting it as a monolithic string. I do like the tree structure of explanations a lot (as it resembles the query tree), I just think there is work to be done splitting up the strings into usable fragments of information.
[jira] [Resolved] (LUCENE-2948) Make var gap terms index a partial prefix trie
[ https://issues.apache.org/jira/browse/LUCENE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2948. Resolution: Won't Fix I think BlockTree terms dict accomplished the same thing. Make var gap terms index a partial prefix trie -- Key: LUCENE-2948 URL: https://issues.apache.org/jira/browse/LUCENE-2948 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2948.patch, LUCENE-2948.patch, LUCENE-2948.patch, LUCENE-2948_automaton.patch, Results.png Var gap stores (in an FST) the indexed terms (every 32nd term, by default), minus their non-distinguishing suffixes. However, often times the resulting FST is close to a prefix trie in some portion of the terms space. By allowing some nodes of the FST to store all outgoing edges, including ones that do not lead to an indexed term, and by recording that this node is then authoritative as to what terms exist in the terms dict from that prefix, we can get some important benefits: * It becomes possible to know that a certain term prefix cannot exist in the terms index, which means we can save a disk seek in some cases (like PK lookup, docFreq, etc.) * We can query for the next possible prefix in the index, allowing some MTQs (eg FuzzyQuery) to save disk seeks. Basically, the terms index is able to answer questions that previously required seeking/scanning in the terms dict file.
[jira] [Updated] (LUCENE-2929) all postings enums must explicitly declare what they need up-front.
[ https://issues.apache.org/jira/browse/LUCENE-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2929: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) Still need requiresPayloads boolean. all postings enums must explicitly declare what they need up-front. --- Key: LUCENE-2929 URL: https://issues.apache.org/jira/browse/LUCENE-2929 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 Attachments: LUCENE-2929.patch, LUCENE-2929.patch, LUCENE-2929.patch Currently, the DocsEnum api assumes you *might* consume freqs at any time. Additionally the DocsAndPositionsEnum api assumes you *might* consume a payload at any time. High level things such as queries know what kinds of data they need from the index up-front, and the current APIs are limiting to codecs (other than Standard, which has these intertwined). So, we either need DocsAndFreqsEnum, DocsPositionsAndPayloadsEnum, or at least booleans in the methods that create these to specify whether you want freqs or payloads. We did this for freqs in the bulkpostings API, which is good, but these DocsEnum apis are also new in 4.0 and there's no reason to introduce non-performant APIs. Additionally, when/if we add payloads to the bulkpostings API, we should make sure we keep the same trend and require you to specify you want payloads or not up-front.
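A minimal sketch of the "declare up-front" idea, using a hypothetical flag set. The names FREQS/POSITIONS/PAYLOADS are illustrative, not Lucene's API; the point is that a codec told what the consumer needs can skip decoding (or even reading) the rest.

```java
import java.util.EnumSet;
import java.util.Set;

public class PostingsFlagsSketch {
    enum PostingsFlag { FREQS, POSITIONS, PAYLOADS }

    // A codec handed the wanted-data set up-front can plan its decode path.
    static String decodePlan(Set<PostingsFlag> wanted) {
        StringBuilder sb = new StringBuilder("decoding docs");
        if (wanted.contains(PostingsFlag.FREQS)) sb.append("+freqs");
        if (wanted.contains(PostingsFlag.POSITIONS)) sb.append("+positions");
        if (wanted.contains(PostingsFlag.PAYLOADS)) sb.append("+payloads");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A term query only needs freqs; a phrase query needs positions too.
        System.out.println(decodePlan(EnumSet.of(PostingsFlag.FREQS)));
        System.out.println(decodePlan(
            EnumSet.of(PostingsFlag.FREQS, PostingsFlag.POSITIONS)));
    }
}
```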
[jira] [Resolved] (LUCENE-2530) rename docsEnum.getBulkResult() to make its role clearer
[ https://issues.apache.org/jira/browse/LUCENE-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2530. Resolution: Won't Fix We removed bulk API in 4.0. rename docsEnum.getBulkResult() to make its role clearer Key: LUCENE-2530 URL: https://issues.apache.org/jira/browse/LUCENE-2530 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 4.0 Reporter: Andi Vajda Assignee: Michael McCandless Priority: Minor Fix For: 4.0 Before docsEnum.read() can be called a BulkResult instance must be allocated for it (it == the default implementation of that method). This is done by calling docsEnum.getBulkResult(). Failure to call this method before read() is called results in a NullPointerException. It is somewhat counterintuitive to get the results of an operation before calling said operation. Maybe this method should be renamed to something more definite-sounding like obtainBulkResult() or prepareBulkResult() ?
[jira] [Resolved] (LUCENE-2505) The system cannot find the file specified - _0.fdt
[ https://issues.apache.org/jira/browse/LUCENE-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2505. Resolution: Incomplete The system cannot find the file specified - _0.fdt -- Key: LUCENE-2505 URL: https://issues.apache.org/jira/browse/LUCENE-2505 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 2.4.1 Reporter: Tej Kiran Sharma Hi, I am using Lucene version 2.4.1 and while indexing my files I got the following exception. I set up the IndexWriter as follows: Directory lucDirectory = FSDirectory.getDirectory(_sIndexPath); lucDirectory.setLockFactory(new SimpleFSLockFactory(_sIndexPath)); lucWriter = new IndexWriter(lucDirectory, true, new KeywordAnalyzer(), true); lucWriter.setMergeFactor(10); lucWriter.setMaxMergeDocs(2147483647); lucWriter.setMaxBufferedDocs(1); lucWriter.setRAMBufferSizeMB(32); lucWriter.setUseCompoundFile(false); I am indexing and searching simultaneously and I am getting the following exception: the system cannot find the file specified ERROR Exception while checking size - C:\00scripts\Temp\TempIndex\20104261030775\_0.fdt (The system cannot find the file specified) Stacktrace: java.io.FileNotFoundException: C:\00scripts\Temp\TempIndex\20104261030775\_0.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(Unknown Source) at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.init(Unknown Source) at org.apache.lucene.store.FSDirectory$FSIndexInput.init(Unknown Source) at org.apache.lucene.store.FSDirectory.openInput(Unknown Source) at org.apache.lucene.index.FieldsReader.init(Unknown Source) at org.apache.lucene.index.SegmentReader.initialize(Unknown Source) at org.apache.lucene.index.SegmentReader.get(Unknown Source) at org.apache.lucene.index.SegmentReader.get(Unknown Source) at org.apache.lucene.index.DirectoryIndexReader$1.doBody(Unknown Source) at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(Unknown Source) at org.apache.lucene.index.DirectoryIndexReader.open(Unknown Source) at org.apache.lucene.index.IndexReader.open(Unknown Source) at org.apache.lucene.index.IndexReader.open(Unknown Source) at org.apache.lucene.search.IndexSearcher.init(Unknown Source) at com..main.apu.d(Unknown Source) at com..main.apu.a(Unknown Source) at com.main.arn.a(Unknown Source) at com.main.abh.b(Unknown Source) at com.main.abh.a(Unknown Source) at com..main.abh.f(Unknown Source) at com.main.eu.run(Unknown Source)
[jira] [Resolved] (LUCENE-2441) Create 3.x - 4.0 index migration tool
[ https://issues.apache.org/jira/browse/LUCENE-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2441. Resolution: Duplicate We already have IndexUpgrader now. Create 3.x - 4.0 index migration tool -- Key: LUCENE-2441 URL: https://issues.apache.org/jira/browse/LUCENE-2441 Project: Lucene - Java Issue Type: New Feature Components: core/index Reporter: Michael McCandless Fix For: 4.0 We need a tool to upgrade an index so that 4.0 can read it. I think the only change right now is the cutover to flex's standard codec format, but with LUCENE-2426 we also need to correct the term sort order to be true unicode code point order.
[jira] [Resolved] (LUCENE-2445) Perf improvements for the DocsEnum bulk read API
[ https://issues.apache.org/jira/browse/LUCENE-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2445. Resolution: Won't Fix We removed bulk API in 4.0. Perf improvements for the DocsEnum bulk read API Key: LUCENE-2445 URL: https://issues.apache.org/jira/browse/LUCENE-2445 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Michael McCandless Fix For: 4.0 I started to work on LUCENE-2443, to create a test showing the problems, but it turns out none of the core codecs (even sep/intblock) ever set a non-zero offset. So I set forth to fix sep to do so, but ran into some issues w/ the current bulk-read API that we should fix to make it higher performance: * Filtering of deleted docs should be the caller's job (saves an extra pass through the docs) * Probably docs should arrive as deltas and caller sums these up to get the actual docID * Whether to load freqs or not should be separately controllable * We may want to require that the int[] for docs and freqs are aligned, ie the offset into each is the same * Maybe we should separate out a BulkDocsEnum from DocsEnum. We can make it optional for codecs (ie, we can emulate BulkDocsEnum from the DocsEnum)
[jira] [Resolved] (LUCENE-2364) Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co.
[ https://issues.apache.org/jira/browse/LUCENE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2364. Resolution: Fixed Term now stores BytesRef internally... Add support for terms in BytesRef format to Term, TermQuery, TermRangeQuery & Co. - Key: LUCENE-2364 URL: https://issues.apache.org/jira/browse/LUCENE-2364 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Michael McCandless Fix For: 4.0 It would be good to directly allow BytesRefs in TermQuery and TermRangeQuery (as both queries convert the strings to BytesRef internally). For NumericRange support in Solr it will be necessary to support numerics as BytesRef in single-term queries. When this is added, don't forget to change TestNumericRangeQueryXX to use the BytesRef ctor of TRQ.
[jira] [Updated] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping
[ https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2357: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping Key: LUCENE-2357 URL: https://issues.apache.org/jira/browse/LUCENE-2357 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Priority: Minor Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 We allocate this int[] to remap docIDs due to compaction of deleted ones. This uses a lot of RAM for large segment merges, and can fail to allocate due to fragmentation on 32-bit JREs. Now that we have packed ints, a simple fix would be to use a packed int array... and maybe instead of storing abs docID in the mapping, we could store the number of del docs seen so far (so the remap would do a lookup then a subtract). This may add some CPU cost to merging but should bring down transient RAM usage quite a bit.
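The "store deletes seen so far, then lookup and subtract" remap can be sketched like this; a plain int[] stands in for the packed ints structure, since the per-entry values (delete counts, not absolute docIDs) are what keep the bit width small.

```java
public class DocIdRemapSketch {
    public static void main(String[] args) {
        // Deletion bitmap for one segment being merged.
        boolean[] deleted = {false, true, false, false, true, false};

        // Instead of an int[] of absolute new docIDs, store the number of
        // deleted docs seen before each position (packed ints in the real fix).
        int[] delsSeen = new int[deleted.length];
        int dels = 0;
        for (int i = 0; i < deleted.length; i++) {
            delsSeen[i] = dels;
            if (deleted[i]) dels++;
        }

        // Remap = old docID minus deletes before it (a lookup then a subtract).
        for (int old = 0; old < deleted.length; old++) {
            if (!deleted[old]) {
                System.out.println(old + " -> " + (old - delsSeen[old]));
            }
        }
    }
}
```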
[jira] [Resolved] (LUCENE-2334) IndexReader.close() should call IndexReader.decRef() unconditionally ??
[ https://issues.apache.org/jira/browse/LUCENE-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2334. Resolution: Won't Fix IndexReader.close() should call IndexReader.decRef() unconditionally ?? --- Key: LUCENE-2334 URL: https://issues.apache.org/jira/browse/LUCENE-2334 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.0.1 Reporter: Mike Hanafey Priority: Minor IndexReader.close() is defined:
{code}
/**
 * Closes files associated with this index.
 * Also saves any new deletions to disk.
 * No other methods should be called after this has been called.
 * @throws IOException if there is a low-level IO error
 */
public final synchronized void close() throws IOException {
  if (!closed) {
    decRef();
    closed = true;
  }
}
{code}
This means that if the refCount is bigger than one, close() does not actually close, but it is also true that calling close() again has no effect. Why does close() not simply call decRef() unconditionally? This way, if incRef() is called each time an instance of IndexReader is handed out, and close() is called by each recipient when they are done, the last one to call close() will actually close the index. As written it seems the API is very confusing -- the first close() does one thing, but the next close() does something different. At a minimum the JavaDoc should clarify the behavior.
[jira] [Resolved] (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity
[ https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2310. Resolution: Fixed Fix Version/s: 4.0 Reduce Fieldable, AbstractField and Field complexity Key: LUCENE-2310 URL: https://issues.apache.org/jira/browse/LUCENE-2310 Project: Lucene - Java Issue Type: Sub-task Components: core/index Reporter: Chris Male Fix For: 4.0 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-AbstractField.patch, LUCENE-2310-Deprecate-DocumentGetFields-core.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310-Deprecate-DocumentGetFields.patch, LUCENE-2310.patch In order to move field-type-like functionality into its own class, we really need to try to tackle the hierarchy of Fieldable, AbstractField and Field. Currently AbstractField depends on Field, and does not provide much more functionality than storing fields, most of which are being moved over to FieldType. Therefore it seems ideal to try to deprecate AbstractField (and possibly Fieldable), moving much of the functionality into Field and FieldType.
[jira] [Commented] (LUCENE-2338) Some tests catch Exceptions in separate threads and just print a stack trace - the test does not fail
[ https://issues.apache.org/jira/browse/LUCENE-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233498#comment-13233498 ] Uwe Schindler commented on LUCENE-2338: --- Were all tests already converted to not suppress exceptions in threads? This is why the issue is still open... Some tests catch Exceptions in separate threads and just print a stack trace - the test does not fail - Key: LUCENE-2338 URL: https://issues.apache.org/jira/browse/LUCENE-2338 Project: Lucene - Java Issue Type: Test Components: general/build Reporter: Uwe Schindler Fix For: 3.6, 4.0 Some tests catch Exceptions in separate threads and just print a stack trace - the test does not fail. The test should fail. Since LUCENE-2274, the LuceneTestCase(J4) class installs an UncaughtExceptionHandler, so this type of catching and solely printing a stack trace is a bad idea. The problem is that the run() method of threads is not allowed to throw checked Exceptions. Two possibilities: - Catch checked Exceptions in the run() method and wrap into RuntimeException or call Assert.fail() instead - Use Executors
[jira] [Resolved] (LUCENE-2276) Add IndexReader.document(int, Document, FieldSelector)
[ https://issues.apache.org/jira/browse/LUCENE-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2276. Resolution: Duplicate Fix Version/s: 4.0 The StoredFieldVisitor API (4.0) makes this possible... Add IndexReader.document(int, Document, FieldSelector) -- Key: LUCENE-2276 URL: https://issues.apache.org/jira/browse/LUCENE-2276 Project: Lucene - Java Issue Type: Wish Components: core/search Reporter: Tim Smith Fix For: 4.0 Attachments: LUCENE-2276+2539.patch, LUCENE-2276.patch The Document object passed in would be populated with the fields identified by the FieldSelector for the specified internal document id. This method would allow reuse of Document objects when retrieving stored fields from the index.
[jira] [Resolved] (LUCENE-2120) Possible file handle leak in near real-time reader
[ https://issues.apache.org/jira/browse/LUCENE-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2120. Resolution: Cannot Reproduce Possible file handle leak in near real-time reader -- Key: LUCENE-2120 URL: https://issues.apache.org/jira/browse/LUCENE-2120 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 3.1 Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Spinoff of LUCENE-1526: Jake/John hit file descriptor exhaustion when testing NRT. I've tried to repro this, stress testing NRT, saturating reopens, indexing, searching, but haven't found any issue. Let's try to get to the bottom of it, here...
[jira] [Updated] (LUCENE-2082) Performance improvement for merging posting lists
[ https://issues.apache.org/jira/browse/LUCENE-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2082: --- Labels: gsoc2012 lucene-gsoc-12 (was: ) Performance improvement for merging posting lists - Key: LUCENE-2082 URL: https://issues.apache.org/jira/browse/LUCENE-2082 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael Busch Priority: Minor Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 A while ago I had an idea about how to improve the merge performance for posting lists. This is currently by far the most expensive part of segment merging due to all the VInt de-/encoding. Not sure if an idea for improving this was already mentioned in the past? So the basic idea is to perform a raw copy of as much posting data as possible. The reason why this is difficult is that we have to remove deleted documents. But often the fraction of deleted docs in a segment is rather low (10%?), so it's likely that there are quite long consecutive sections without any deletions. To find these sections we could use the skip lists. Basically at any point during the merge we would find the skip entry before the next deleted doc. All entries to this point can be copied without de-/encoding of the VInts. Then for the section that has deleted docs we perform the normal way of merging to remove the deletes. Then we check again with the skip lists if we can raw copy the next section. To make this work there are a few different necessary changes: 1) Currently the multilevel skiplist reader/writer can only deal with fixed-size skips (16 on the lowest level). It would be an easy change to allow variable-size skips, but then the MultiLevelSkipListReader can't return numSkippedDocs anymore, which SegmentTermDocs needs - change 2) 2) Store the last docID in which a term occurred in the term dictionary. This would also be beneficial for other use cases.
By doing that the SegmentTermDocs#next(), #read() and #skipTo() know when the end of the posting list is reached. Currently they have to track the df, which is why after a skip it's important to take the numSkippedDocs into account. 3) Change the merging algorithm according to my description above. It's important to create a new skiplist entry at the beginning of every block that is copied in raw mode, because its next skip entry's values are deltas from the beginning of the block. Also the very first posting, and that one only, needs to be decoded/encoded to make sure that the payload length is explicitly written (i.e. must not depend on the previous length). Also such a skip entry has to be created at the beginning of each source segment's posting list. With change 2) we don't have to worry about the positions of the skip entries. And having a few extra skip entries in merged segments won't hurt much. If a segment has no deletions at all this will avoid any decoding/encoding of VInts (best case). I think it will also work great for segments with a rather low amount of deletions. We should probably then have a threshold: if the number of deletes exceeds this threshold we should fall back to old-style merging. I haven't implemented any of this, so there might be complications I haven't thought about. Please let me know if you can think of reasons why this wouldn't work or if you think more changes are necessary. I will probably not have time to work on this soon, but I wanted to open this issue to not forget about it :). Anyone should feel free to take this! Btw: I think the flex-indexing branch would be a great place to try this out as a new codec. This would also be good to figure out what APIs are needed to make merging fully flexible as well.
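The core of the proposal, finding consecutive runs of non-deleted docs that could be raw-copied, can be sketched like this. The skip-list alignment and the actual VInt block copying are omitted; this only shows how the deletion bitmap partitions a segment into raw-copy candidates.

```java
import java.util.ArrayList;
import java.util.List;

public class RawCopyRunsSketch {
    public static void main(String[] args) {
        // Deleted-doc bitmap for one segment; with ~10% deletes, long live
        // runs are common and could be block-copied without VInt de-/encoding.
        boolean[] deleted = new boolean[20];
        deleted[5] = deleted[13] = true;

        List<String> runs = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= deleted.length; i++) {
            boolean live = i < deleted.length && !deleted[i];
            if (live && start < 0) start = i;
            if (!live && start >= 0) {
                runs.add(start + ".." + (i - 1)); // raw-copy candidate
                start = -1;
            }
        }
        // The gaps between runs fall back to normal decode/encode merging.
        System.out.println(runs);
    }
}
```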
[jira] [Resolved] (LUCENE-1948) Deprecating InstantiatedIndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1948.

Resolution: Fixed

Deprecating InstantiatedIndexWriter
-----------------------------------

Key: LUCENE-1948
URL: https://issues.apache.org/jira/browse/LUCENE-1948
Project: Lucene - Java
Issue Type: Task
Components: modules/other
Affects Versions: 2.9
Reporter: Karl Wettin
Assignee: Karl Wettin
Fix For: 4.0
Attachments: LUCENE-1948.patch

http://markmail.org/message/j6ip266fpzuaibf7

I suppose that should have been suggested before 2.9 rather than after... There are at least three reasons why I want to do this:

1. The code is based on the behaviour of the Directory IndexWriter as of 2.3 and I have not touched it since then. If there are changes in the future, one will have to keep IIW in sync, something that's easy to forget.
2. There is no locking, which will cause concurrent modification exceptions when accessing the index via a searcher/reader while committing.
3. It uses the old token stream API, so it would have to be upgraded in case it should stay.

The java- and package-level docs have, since it was committed, suggested that one should treat II as if it were immutable due to the locklessness. My suggestion is that we make it immutable for real. Since II is meant for small corpora, there is very little time lost by using the constructor that builds the index from an IndexReader. I.e., rather than using InstantiatedIndexWriter, one would use a Directory and an IndexWriter and then pass an IndexReader to a new InstantiatedIndex. Any objections?
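The "immutable for real" pattern Karl proposes (build in one place, then snapshot into an unmodifiable read-only structure) can be sketched in plain Java. For Lucene this would correspond to `new InstantiatedIndex(reader)`; the class below is purely illustrative and not Lucene's.

```java
import java.util.*;

/**
 * Plain-Java sketch of the migration this issue proposes: instead of a
 * writable in-memory index, keep an immutable snapshot built from an
 * already-committed source. Names are illustrative only.
 */
public class Snapshot {
    // term -> docIDs, deep-copied and frozen at construction time
    private final Map<String, List<Integer>> postings;

    public Snapshot(Map<String, List<Integer>> source) {
        Map<String, List<Integer>> copy = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : source.entrySet()) {
            copy.put(e.getKey(), List.copyOf(e.getValue()));
        }
        this.postings = Collections.unmodifiableMap(copy);
    }

    /** Reads never race with writes, because writes are impossible. */
    public List<Integer> postings(String term) {
        return postings.getOrDefault(term, List.of());
    }
}
```

Because the snapshot shares no mutable state with its source, the concurrent-modification problem described above disappears by construction.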
[jira] [Resolved] (LUCENE-1922) exposing the ability to get the number of unique term count per field
[ https://issues.apache.org/jira/browse/LUCENE-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1922.

Resolution: Duplicate
Fix Version/s: 2.9

Fixed in LUCENE-1586.

exposing the ability to get the number of unique term count per field
---------------------------------------------------------------------

Key: LUCENE-1922
URL: https://issues.apache.org/jira/browse/LUCENE-1922
Project: Lucene - Java
Issue Type: New Feature
Components: core/index
Affects Versions: 4.0
Reporter: John Wang
Fix For: 4.0, 2.9

Add an API to get the number of unique terms for a given field name, e.g.: IndexReader.getUniqueTermCount(String field)

This issue has a dependency on LUCENE-1458
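The semantics of the requested getUniqueTermCount(String field) accessor can be illustrated with a toy in-memory model. This is not Lucene's implementation, just a sketch of the contract: count distinct terms per field, regardless of how often each term occurs.

```java
import java.util.*;

/** Toy model (not Lucene) of the proposed IndexReader.getUniqueTermCount(field). */
public class UniqueTermCount {
    // field name -> set of distinct terms seen in that field
    private final Map<String, Set<String>> terms = new HashMap<>();

    public void addTerm(String field, String term) {
        terms.computeIfAbsent(field, f -> new HashSet<>()).add(term);
    }

    /** What the proposed API would return for one field; 0 for unknown fields. */
    public long getUniqueTermCount(String field) {
        Set<String> s = terms.get(field);
        return s == null ? 0 : s.size();
    }
}
```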
[jira] [Updated] (LUCENE-1761) low level Field metadata is never removed from index
[ https://issues.apache.org/jira/browse/LUCENE-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1761:
---
Labels: gsoc2012 lucene-gsoc-12 (was: )

low level Field metadata is never removed from index
----------------------------------------------------

Key: LUCENE-1761
URL: https://issues.apache.org/jira/browse/LUCENE-1761
Project: Lucene - Java
Issue Type: Bug
Components: core/index
Affects Versions: 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 2.4.1
Reporter: Hoss Man
Priority: Minor
Labels: gsoc2012, lucene-gsoc-12
Attachments: LUCENE-1761.patch

With heterogeneous docs, or an index whose fields evolve over time, field names that are no longer used (i.e. all docs that ever referenced them have been deleted) still show up when you use IndexReader.getFieldNames. It seems logical that segment merging should only preserve metadata about fields that actually exist in the new segment, but even after deleting all documents from an index and optimizing, the old field names are still present.
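The behavior Hoss Man suggests for merging can be sketched in plain Java: recompute the field-name set from the surviving (non-deleted) documents only, so fields referenced exclusively by deleted docs drop out. This is an illustration of the intended semantics, not Lucene's merge code.

```java
import java.util.*;

/**
 * Sketch of the suggested fix: after deletes, keep only field names
 * that some live document still uses. Plain Java, not Lucene.
 */
public class LiveFieldNames {
    /**
     * docFields: per-document field names; deleted: per-document tombstones.
     * Returns the field names a merged segment should actually retain.
     */
    public static Set<String> liveFields(List<Set<String>> docFields, boolean[] deleted) {
        Set<String> live = new TreeSet<>();
        for (int doc = 0; doc < docFields.size(); doc++) {
            if (!deleted[doc]) {
                live.addAll(docFields.get(doc));
            }
        }
        return live; // fields referenced only by deleted docs are gone
    }
}
```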
[jira] [Resolved] (LUCENE-1750) Create a MergePolicy that limits the maximum size of its segments
[ https://issues.apache.org/jira/browse/LUCENE-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1750.

Resolution: Duplicate
Fix Version/s: 3.2

TieredMergePolicy does this...

Create a MergePolicy that limits the maximum size of its segments
-----------------------------------------------------------------

Key: LUCENE-1750
URL: https://issues.apache.org/jira/browse/LUCENE-1750
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Minor
Fix For: 4.0, 3.2
Attachments: LUCENE-1750.patch
Original Estimate: 48h
Remaining Estimate: 48h

Basically I'm trying to create largish 2-4GB shards using LogByteSizeMergePolicy; however, in the attached unit test I've found segments that exceed maxMergeMB. The goal is for segments to be merged up to 2GB, then all merging to that segment stops, and another 2GB segment is created. This helps when replicating in Solr, where if a single optimized 60GB segment is created, the machine stops working due to IO and CPU starvation.
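The requested behavior (merge segments up, but never let one merged segment exceed a size cap) can be sketched as a greedy planner in plain Java. This is a stand-in illustration of the policy, not Lucene code; in later Lucene versions, TieredMergePolicy's maximum merged segment size setting addresses the same need, per the resolution above.

```java
import java.util.*;

/**
 * Sketch: greedily group segments into merges so that no merged
 * segment exceeds maxMergedBytes. A group of one means "leave as-is".
 */
public class SizeCappedMerges {
    public static List<List<Long>> plan(List<Long> segmentBytes, long maxMergedBytes) {
        List<List<Long>> merges = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long size = 0;
        for (long s : segmentBytes) {
            // start a new merge rather than blow past the cap
            if (!current.isEmpty() && size + s > maxMergedBytes) {
                merges.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(s);
            size += s;
        }
        if (!current.isEmpty()) merges.add(current);
        return merges;
    }
}
```

With a 2GB cap, merging stops at each ~2GB boundary and a fresh segment is started, which is exactly the replication-friendly shape the description asks for.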
[jira] [Updated] (LUCENE-1252) Avoid using positions when not all required terms are present
[ https://issues.apache.org/jira/browse/LUCENE-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1252:
---
Labels: gsoc2012 lucene-gsoc-12 (was: )

Avoid using positions when not all required terms are present
-------------------------------------------------------------

Key: LUCENE-1252
URL: https://issues.apache.org/jira/browse/LUCENE-1252
Project: Lucene - Java
Issue Type: Wish
Components: core/search
Reporter: Paul Elschot
Priority: Minor
Labels: gsoc2012, lucene-gsoc-12

In the Scorers of queries with (lots of) Phrases and/or (nested) Spans, next() and skipTo() currently use position information even when other parts of the query cannot match because some required terms are not present. This could be avoided by adding some methods to Scorer that relax the postcondition of next() and skipTo() to something like "all required terms are present, but no position info was checked yet", and implementing these methods for Scorers that do conjunctions: BooleanScorer, PhraseScorer, and SpanScorer/NearSpans.
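The relaxed-postcondition idea amounts to two-phase matching: a cheap doc-level conjunction first, and the expensive positional check only on documents that survive it. The sketch below shows that shape in plain Java, with int arrays standing in for postings lists and the positional check stubbed as a predicate; it is not Lucene's scorer code.

```java
import java.util.*;
import java.util.function.IntPredicate;

/** Two-phase conjunction sketch: terms-present check first, positions second. */
public class TwoPhaseConjunction {
    /**
     * postings: one sorted docID array per required term.
     * positionsMatch: the expensive positional check (e.g. phrase proximity).
     * Returns docIDs present in ALL lists that also pass the positional check.
     */
    public static List<Integer> match(int[][] postings, IntPredicate positionsMatch) {
        // Phase 1: docs containing all required terms (set intersection).
        Set<Integer> candidates = new TreeSet<>();
        for (int d : postings[0]) candidates.add(d);
        for (int i = 1; i < postings.length; i++) {
            Set<Integer> s = new HashSet<>();
            for (int d : postings[i]) s.add(d);
            candidates.retainAll(s);
        }
        // Phase 2: position info is consulted only for surviving candidates.
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            if (positionsMatch.test(doc)) hits.add(doc);
        }
        return hits;
    }
}
```

Documents missing any required term never trigger the positional check, which is the saving this issue is after.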
[jira] [Updated] (LUCENE-1000) queryparsersyntax.html escaping section needs beefed up
[ https://issues.apache.org/jira/browse/LUCENE-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1000:
---
Labels: newdev (was: )

queryparsersyntax.html escaping section needs beefed up
-------------------------------------------------------

Key: LUCENE-1000
URL: https://issues.apache.org/jira/browse/LUCENE-1000
Project: Lucene - Java
Issue Type: Improvement
Components: general/website
Reporter: Hoss Man
Labels: newdev
Fix For: 4.0

The query syntax documentation is currently lacking several key pieces of info:
1) that unicode-style escapes are valid
2) that any character can be escaped with a backslash, not just special chars

...we should probably beef up the Escaping Special Characters section.
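For programmatic escaping, Lucene's query parser provides QueryParser.escape(String). The minimal stand-alone version below illustrates the backslash rule the docs should spell out; the exact special-character set in SPECIALS is an assumption for illustration, so consult the parser documentation for the authoritative list.

```java
/** Minimal backslash-escaping sketch for the classic query syntax. */
public class QueryEscape {
    // Assumed special-character set; the real list lives in the parser docs.
    static final String SPECIALS = "\\+-!():^[]\"{}~*?|&/";

    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\'); // any character may be preceded by a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```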
[jira] [Created] (SOLR-3260) Improve exception handling / logging for ScriptTransformer.init()
Improve exception handling / logging for ScriptTransformer.init()
-----------------------------------------------------------------

Key: SOLR-3260
URL: https://issues.apache.org/jira/browse/SOLR-3260
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 3.5, 4.0
Reporter: James Dyer
Assignee: James Dyer
Priority: Trivial
Fix For: 3.6, 4.0

This came up on the user-list. ScriptTransformer logs the same "need a >=1.6 jre" message for several problems, making debugging difficult for users.
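The improvement amounts to mapping each distinct failure mode during script-engine initialization to its own message instead of one catch-all. The helper below is a hypothetical sketch of that idea; the method name and message wording are not Solr's actual code.

```java
/**
 * Hypothetical sketch of per-cause diagnostics for script init failures,
 * instead of logging one generic "need a >=1.6 jre" line for everything.
 */
public class ScriptInitDiagnostics {
    public static String describeFailure(Throwable t, boolean engineFound) {
        if (!engineFound) {
            // e.g. javax.script.ScriptEngineManager returned no engine
            return "No ScriptEngine for the requested language; check the JRE and classpath";
        }
        if (t instanceof NoSuchMethodException) {
            return "Script loaded, but the named function was not found";
        }
        if (t != null) {
            return "Script evaluation failed: " + t.getMessage();
        }
        return "Script initialized";
    }
}
```

With distinct messages, a user can immediately tell a missing engine apart from a broken script or a misspelled function name.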