ImportError: cannot import name Library, during installing PyLucene
Hi everyone! I have spent five hours trying to fix this problem, but I can't. While installing PyLucene following http://lucene.apache.org/pylucene/install.html, I ran into an error like the following:

sanghee-m:jcc sanghee$ python setup.py build
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library

sanghee-m:jcc sanghee$ python setup.py build --debug
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library
sanghee-m:jcc sanghee$

I can't find Library either. How can I solve this problem? Can you let me know where I should check when I get this kind of error? I use setuptools 1.1.6 and pylucene-4.4.0-1. -- SangHee Kim http://goo.gl/LnpDX
Re: [VOTE] Release PyLucene 4.5.0-1
On Fri, 11 Oct 2013, Steve Rowe wrote: I really have no idea where to start looking to figure out what's happening - I'm not a big python user - any ideas? Would it be useful to package up my make'd directory and send it to you? I don't know yet. Do you know which version of setuptools you have installed ? I'm currently battling an issue with the third generation setuptools, v 1.1.6. If you don't have something like 0.6something or 0.7 installed, please try that (for lack of any better ideas, sorry). Andi.. Steve On Oct 10, 2013, at 7:34 PM, Andi Vajda va...@apache.org wrote: On Thu, 10 Oct 2013, Steve Rowe wrote: Meant to send to the mailing lists: Begin forwarded message: From: Steve Rowe sar...@gmail.com Subject: Re: [VOTE] Release PyLucene 4.5.0-1 Date: October 10, 2013 3:18:50 AM EDT To: Andi Vajda va...@apache.org Andi, I thought I'd run 'make' and 'sudo make install' in two steps, so I checked, and bash 'history' agreed: 586 vi Makefile 587 make 588 sudo make install I tried again, first rm -rf'ing the unpacked distribution, then unpacking, (skipping the jcc 'make' and 'sudo make install' this time), editing the Makefile, then running 'make', then 'sudo make install', and I got the same error - I suppose this is the most salient line: The problem is that I can't even reproduce the error. You're not the first one to report it but it usually goes away :-( Stuck. Andi.. 
- No local packages or download links found for lucene==4.5.0 - Steve On Oct 10, 2013, at 2:59 AM, Andi Vajda va...@apache.org wrote: Hi Steve, On Thu, 10 Oct 2013, Steve Rowe wrote: After make'ing and installing jcc (no setup.py changes required); uncommenting the first Mac OS X 10.6 section in Makefile (I have OS X 10.8.5, with stock Python 2.7.2 and Oracle Java 1.7.0_25); and finally make'ing pylucene: 'sudo make install' fails - here's the tail end of the output: - writing build/bdist.macosx-10.8-x86_64/egg/EGG-INFO/native_libs.txt creating dist creating 'dist/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg' and adding 'build/bdist.macosx-10.8-x86_64/egg' to it removing 'build/bdist.macosx-10.8-x86_64/egg' (and everything under it) Processing lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg creating /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg Extracting lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg to /Library/Python/2.7/site-packages Removing lucene 4.4.0 from easy-install.pth file Adding lucene 4.5.0 to easy-install.pth file Installed /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg Processing dependencies for lucene==4.5.0 Searching for lucene==4.5.0 Reading http://pypi.python.org/simple/lucene/ Couldn't find index page for 'lucene' (maybe misspelled?) Scanning index of all packages (this may take a while) Reading http://pypi.python.org/simple/ No local packages or download links found for lucene==4.5.0 error: Could not find suitable distribution for Requirement.parse('lucene==4.5.0') This error has been a problem for a while. You need to make, then make install, in two steps. Otherwise, when 'make install' in pylucene from clean, this error seems to happen. I don't know of a fix. Andi.. 
make: *** [install] Error 1 - I've included the entire 'sudo make install' output here: https://paste.apache.org/8gAF Steve On Oct 8, 2013, at 1:00 AM, Andi Vajda va...@apache.org wrote: The PyLucene 4.5.0-1 release tracking the recent release of Apache Lucene 4.5.0 is ready. A release candidate is available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_5/CHANGES PyLucene 4.5.0 is built with JCC 2.17 included in these release artifacts: http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_5_0/lucene/CHANGES.txt Please vote to release these artifacts as PyLucene 4.5.0-1. Thanks ! Andi.. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
Re: [VOTE] Release PyLucene 4.5.0-1
Andi, I really have no idea where to start looking to figure out what's happening - I'm not a big python user - any ideas? Would it be useful to package up my make'd directory and send it to you? Steve On Oct 10, 2013, at 7:34 PM, Andi Vajda va...@apache.org wrote: On Thu, 10 Oct 2013, Steve Rowe wrote: Meant to send to the mailing lists: Begin forwarded message: From: Steve Rowe sar...@gmail.com Subject: Re: [VOTE] Release PyLucene 4.5.0-1 Date: October 10, 2013 3:18:50 AM EDT To: Andi Vajda va...@apache.org Andi, I thought I'd run 'make' and 'sudo make install' in two steps, so I checked, and bash 'history' agreed: 586 vi Makefile 587 make 588 sudo make install I tried again, first rm -rf'ing the unpacked distribution, then unpacking, (skipping the jcc 'make' and 'sudo make install' this time), editing the Makefile, then running 'make', then 'sudo make install', and I got the same error - I suppose this is the most salient line: The problem is that I can't even reproduce the error. You're not the first one to report it but it usually goes away :-( Stuck. Andi.. 
- No local packages or download links found for lucene==4.5.0 - Steve On Oct 10, 2013, at 2:59 AM, Andi Vajda va...@apache.org wrote: Hi Steve, On Thu, 10 Oct 2013, Steve Rowe wrote: After make'ing and installing jcc (no setup.py changes required); uncommenting the first Mac OS X 10.6 section in Makefile (I have OS X 10.8.5, with stock Python 2.7.2 and Oracle Java 1.7.0_25); and finally make'ing pylucene: 'sudo make install' fails - here's the tail end of the output: - writing build/bdist.macosx-10.8-x86_64/egg/EGG-INFO/native_libs.txt creating dist creating 'dist/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg' and adding 'build/bdist.macosx-10.8-x86_64/egg' to it removing 'build/bdist.macosx-10.8-x86_64/egg' (and everything under it) Processing lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg creating /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg Extracting lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg to /Library/Python/2.7/site-packages Removing lucene 4.4.0 from easy-install.pth file Adding lucene 4.5.0 to easy-install.pth file Installed /Library/Python/2.7/site-packages/lucene-4.5.0-py2.7-macosx-10.8-x86_64.egg Processing dependencies for lucene==4.5.0 Searching for lucene==4.5.0 Reading http://pypi.python.org/simple/lucene/ Couldn't find index page for 'lucene' (maybe misspelled?) Scanning index of all packages (this may take a while) Reading http://pypi.python.org/simple/ No local packages or download links found for lucene==4.5.0 error: Could not find suitable distribution for Requirement.parse('lucene==4.5.0') This error has been a problem for a while. You need to make, then make install, in two steps. Otherwise, when 'make install' in pylucene from clean, this error seems to happen. I don't know of a fix. Andi.. 
make: *** [install] Error 1 - I've included the entire 'sudo make install' output here: https://paste.apache.org/8gAF Steve On Oct 8, 2013, at 1:00 AM, Andi Vajda va...@apache.org wrote: The PyLucene 4.5.0-1 release tracking the recent release of Apache Lucene 4.5.0 is ready. A release candidate is available from: http://people.apache.org/~vajda/staging_area/ A list of changes in this release can be seen at: http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_5/CHANGES PyLucene 4.5.0 is built with JCC 2.17 included in these release artifacts: http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES A list of Lucene Java changes can be seen at: http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_5_0/lucene/CHANGES.txt Please vote to release these artifacts as PyLucene 4.5.0-1. Thanks ! Andi.. ps: the KEYS file for PyLucene release signing is at: http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS http://people.apache.org/~vajda/staging_area/KEYS pps: here is my +1
Re: ImportError: cannot import name Library, during installing PyLucene
On Fri, 11 Oct 2013, SangHee Kim wrote: Hi everyone! I have spent five hours trying to fix this problem, but I can't. While installing PyLucene following http://lucene.apache.org/pylucene/install.html, I ran into an error like the following:

sanghee-m:jcc sanghee$ python setup.py build
found JAVAFRAMEWORKS = /System/Library/Frameworks/JavaVM.framework
Traceback (most recent call last):
  File "setup.py", line 398, in <module>
    main('--debug' in sys.argv)
  File "setup.py", line 306, in main
    from setuptools import Library
ImportError: cannot import name Library

Indeed, there are two problems with setuptools 1.1.6, apparently:
1. the Library class is only accessible via setuptools.extension
2. the logic in setuptools.command.build_ext patching darwin-specific options for building a shared library into _CONFIG_VARS is broken

I added code to JCC's setup.py to work around both issues. This is checked into rev 1531420 in pylucene's trunk. Please refresh your copy of JCC from pylucene's trunk (still called 2.17), rebuild and reinstall it, and try your pylucene 4.4.0 build again. Please let me know if this solves the problem for you as well. Andi..
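Andi's first fix amounts to an import fallback. A minimal sketch of that kind of compatibility shim follows; the actual code committed to JCC's setup.py (rev 1531420) may differ:

```python
# Sketch of the compatibility shim for setuptools 1.1.6 described
# above: newer setuptools no longer exposes Library at the top level,
# so fall back to setuptools.extension. Illustration only, not the
# exact code in JCC's setup.py.
try:
    from setuptools import Library
except ImportError:
    from setuptools.extension import Library
```

With this in place, `from setuptools import Library` failing on a newer setuptools no longer aborts the build.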
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792375#comment-13792375 ] ASF subversion and git services commented on LUCENE-5269: - Commit 1531202 from [~rcmuir] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531202 ] LUCENE-5269: Fix NGramTokenFilter length filtering TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5269. - Resolution: Fixed TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
[jira] [Commented] (LUCENE-5277) Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset
[ https://issues.apache.org/jira/browse/LUCENE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792396#comment-13792396 ] Uwe Schindler commented on LUCENE-5277: --- Is there any issue that will use the new ctor? As the current ctor is unused, why not simply remove it and leave adding the new one to an issue that really needs it? Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset --- Key: LUCENE-5277 URL: https://issues.apache.org/jira/browse/LUCENE-5277 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5277.patch FixedBitSet copy constructor is redundant the way it is now -- one can call FBS.clone() to achieve that (and indeed, no code in Lucene calls this ctor). I think it will be useful to add a numBits parameter to that method to allow growing/shrinking the new bitset, while copying all relevant bits from the passed one.
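The proposed copy-with-resize semantics can be sketched in a few lines; this is a toy stand-in (a Python list of booleans), not Lucene's packed-longs FixedBitSet implementation:

```python
# Toy sketch of the proposed FixedBitSet(source, numBits) semantics:
# copy all relevant bits from the source while growing (zero-filled)
# or shrinking (truncated) the new bitset.
def copy_bits(src_bits, num_bits):
    return [src_bits[i] if i < len(src_bits) else False
            for i in range(num_bits)]
```

Shrinking drops the high bits; growing pads with cleared bits, which matches "copying all relevant bits from the passed one".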
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792402#comment-13792402 ] Uwe Schindler commented on LUCENE-5269: --- This is so crazy! Why did we never hit this combination before? Thanks for fixing, although I see the CodePointLengthFilter not really as a bug fix, it is more a new feature! Maybe explicitly add this as a new feature to changes.txt? TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792424#comment-13792424 ] Robert Muir commented on LUCENE-5269: - I didn't want new features mixed with bugfixes, really :( But in my opinion this was the simplest way to solve the problem: to just add a filter like this and have it use that instead of LengthFilter. I think it would be weird to see new features in a 4.5.1? TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792429#comment-13792429 ] Robert Muir commented on LUCENE-5269: - {quote} This is so crazy! Why did we never hit this combination before? {quote} This combination is especially good at finding the bug, here's why:

{code}
Tokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, reader, 2, 94);
TokenStream stream = new ShingleFilter(tokenizer, 5);
stream = new NGramTokenFilter(TEST_VERSION_CURRENT, stream, 55, 83);
{code}

The edge-ngram has min=2 max=94; it's basically brute forcing every token size. Then the shingle filter makes tons of tokens with positionIncrement=0, so it makes it easy for the (previously buggy NGramTokenFilter with the wrong length filter) to misclassify tokens with its logic expecting codepoints, and emit an initial token with posinc=0:

{code}
if ((curPos + curGramSize) <= curCodePointCount) {
  ...
  posIncAtt.setPositionIncrement(curPosInc);
{code}

TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
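The root cause Robert describes, logic expecting codepoints applied to a length filter, comes down to two counts that disagree whenever a token contains supplementary characters. A small illustration, using Python as a stand-in for Java's String.length() (UTF-16 code units) versus codePointCount():

```python
# 'a' + an emoji outside the Basic Multilingual Plane + 'b'.
s = "a\U0001F600b"
codepoints = len(s)                             # Python counts codepoints: 3
utf16_units = len(s.encode("utf-16-le")) // 2   # Java's String.length(): 4
# A filter comparing a codepoint-based gram size against a UTF-16
# length misclassifies any token containing supplementary characters,
# which is what CodePointLengthFilter avoids.
```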
[jira] [Updated] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5338: Description: Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard with the hash range of the route key may also contain documents for other route keys whose hashes collide. was: Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. Then the sub-shard dedicated to documents for route key 'A!' can be scaled separately. Specifying the source shard is not required here because the route key is enough to figure it out. Split shards by a route key --- Key: SOLR-5338 URL: https://issues.apache.org/jira/browse/SOLR-5338 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard with the hash range of the route key may also contain documents for other route keys whose hashes collide.
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: SOLR-5310.patch Add a collection admin command to remove a replica -- Key: SOLR-5310 URL: https://issues.apache.org/jira/browse/SOLR-5310 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5310.patch Original Estimate: 72h Remaining Estimate: 72h The only way a replica can be removed is by unloading the core. There is no way to remove a replica that is down, so the clusterstate will have unreferenced nodes if a few nodes go down over time. We need a cluster admin command to clean that up, e.g.: /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 The system would first see if the replica is active. If yes, a core UNLOAD command is fired, which would take care of deleting the replica from the clusterstate as well. If the state is inactive, then the core or node may be down; in that case the entry is removed from cluster state.
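The DELETEREPLICA flow described above can be sketched as follows; the names and the dict-based cluster state are illustrative, not Solr's actual implementation:

```python
# Sketch of the proposed DELETEREPLICA flow: if the replica is active,
# fire a core UNLOAD first (which per the description also cleans up
# the cluster state); if it is down, just drop its entry so no
# unreferenced node lingers in the cluster state.
def delete_replica(cluster_state, collection, shard, replica, unload_core):
    replicas = cluster_state[collection][shard]
    if replicas[replica] == "active":
        unload_core(replica)  # live replica: UNLOAD the core
    # in either case the replica's entry ends up removed
    del replicas[replica]
```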
[jira] [Updated] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator
[ https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated LUCENE-5260: - Attachment: LUCENE-5260.patch Uploaded Patch: - changed the input to lookup.build to take TermFreqPayloadIterator instead of TermFreqIterator - made all suggesters compatible with TermFreqPayloadIterator (but error if payload is present but cannot be used) - nuked all implementations of TermFreq and made them work with TermFreqPayload instead (except for SortedTermFreqIteratorWrapper) - got rid of all the references to termFreqIter Still todo: - actually nuke TermFreqIterator - change the names of the implementations to reflect that they are implementations of TermFreqPayloadIter - add tests to ensure that all the implementations work with payload - support payloads in SortedTermFreqIteratorWrapper Make older Suggesters more accepting of TermFreqPayloadIterator --- Key: LUCENE-5260 URL: https://issues.apache.org/jira/browse/LUCENE-5260 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Areek Zillur Attachments: LUCENE-5260.patch As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would be nice to make the older suggesters accepting of TermFreqPayloadIterator and throw an exception if payload is found (if it cannot be used). This will also allow us to nuke most of the other interfaces for BytesRefIterator.
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792461#comment-13792461 ] Uwe Schindler commented on LUCENE-5269: --- bq. I didn't want new features mixed with bugfixes really I agree! But now we have the new feature, so I just asked to add this as a separate entry in CHANGES.txt under New features, just the new filter, nothing more. TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
[jira] [Commented] (SOLR-5290) Warming up using search logs.
[ https://issues.apache.org/jira/browse/SOLR-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792471#comment-13792471 ] Minoru Osuka commented on SOLR-5290: The patch includes test code. Warming up using search logs. - Key: SOLR-5290 URL: https://issues.apache.org/jira/browse/SOLR-5290 Project: Solr Issue Type: Wish Components: search Affects Versions: 4.4 Reporter: Minoru Osuka Priority: Minor Attachments: SOLR-5290.patch It is possible to warm up the cache automatically in the newSearcher event, but it is impossible to do so in the firstSearcher event because there is no old searcher. We describe queries in solrconfig.xml if we want them cached in the firstSearcher event, like this:

{code:xml}
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
    </lst>
  </arr>
</listener>
{code}

This setting is very static. I want dynamic queries in the firstSearcher event when restarting Solr, so I paid attention to past search logs. If there are past search logs, it should be possible to warm up the cache automatically in the firstSearcher event, like the autowarming of the cache in the newSearcher event. I have created QueryLogSenderListener, which extends QuerySenderListener. Sample definition in solrconfig.xml:
- directory : Specify the Solr log directory. (Required)
- regex : Describe the regular expression of the log. (Required)
- encoding : Specify the Solr log encoding. (Default : UTF-8)
- count : Specify the number of log entries to process. (Default : 100)
- paths : Specify the request handler names to process.
- exclude_params : Specify the request parameters to exclude.

{code:xml}
<!-- Warming up using search logs. -->
<listener event="firstSearcher" class="solr.QueryLogSenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
    </lst>
  </arr>
  <str name="directory">logs</str>
  <str name="encoding">UTF-8</str>
  <str name="regex"><![CDATA[^(?<level>[\w]+)\s+\-\s+(?<timestamp>[\d\-\s\.:]+);\s+(?<class>[\w\.\_\$]+);\s+\[(?<core>.+)\]\s+webapp=(?<webapp>.+)\s+path=(?<path>.+)\s+params=\{(?<params>.*)\}\s+hits=(?<hits>\d+)\s+status=(?<status>\d+)\s+QTime=(?<qtime>\d+).*]]></str>
  <arr name="paths">
    <str>/select</str>
  </arr>
  <int name="count">100</int>
  <arr name="exclude_params">
    <str>indent</str>
    <str>_</str>
  </arr>
</listener>
{code}

I'd like to propose this feature.
[jira] [Created] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes
dejie Chang created SOLR-5339: - Summary: solr-core-4.4's ip is not right when the os is centos 5.6 sometimes Key: SOLR-5339 URL: https://issues.apache.org/jira/browse/SOLR-5339 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.4 Environment: centos 5.6 Reporter: dejie Chang Priority: Critical when I install the solr-cloud on the centos5.6 . t
[jira] [Updated] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5338: Attachment: SOLR-5338.patch Changes: * Introduces two new methods in CompositeIdRouter {code} public List<Range> partitionRangeByKey(String key, Range range) {code} and {code} public Range routeKeyHashRange(String routeKey) {code} * The collection split action accepts a new parameter 'split.key' * The parent slice is found and its range is partitioned according to split.key * We re-use the logic introduced in SOLR-5300 to do the actual splitting. Split shards by a route key --- Key: SOLR-5338 URL: https://issues.apache.org/jira/browse/SOLR-5338 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Attachments: SOLR-5338.patch Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard with the hash range of the route key may also contain documents for other route keys whose hashes collide.
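The partitioning in the SOLR-5338 example can be sketched as follows; the function name echoes partitionRangeByKey, but this is an illustration with inclusive integer bounds, not Solr's CompositeIdRouter code:

```python
# Sketch of partitioning a shard's hash range around a route key's
# hash range: splitting shard range [0, 20] around route-key range
# [12, 15] yields [0, 11], [12, 15] and [16, 20], where the middle
# range becomes the sub-shard dedicated to the route key.
def partition_range_by_key(shard_lo, shard_hi, key_lo, key_hi):
    ranges = []
    if key_lo > shard_lo:
        ranges.append((shard_lo, key_lo - 1))  # range before the key
    ranges.append((key_lo, key_hi))            # the dedicated sub-shard
    if key_hi < shard_hi:
        ranges.append((key_hi + 1, shard_hi))  # range after the key
    return ranges
```

If the route key's range touches either end of the shard range, only two sub-ranges are produced.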
[jira] [Updated] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes
[ https://issues.apache.org/jira/browse/SOLR-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dejie Chang updated SOLR-5339: -- Description: When I install SolrCloud on CentOS 5.6, it is strange that sometimes the IP displayed on http://192.168.10.54:8081/solr/#/~cloud is not correct: it shows 202.106.199.36, but my actual IP is 192.168.10.54. On Windows it is correct. I found it is because of hostaddress = InetAddress.getLocalHost().getHostAddress(); in ZkController.java. Sometimes this method does not return the correct IP, so we should not trust it; I think on Linux we should not use this method. (was: when I install the solr-cloud on the centos5.6 . t) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes Key: SOLR-5339 URL: https://issues.apache.org/jira/browse/SOLR-5339 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.4 Environment: centos 5.6 Reporter: dejie Chang Priority: Critical When I install SolrCloud on CentOS 5.6, it is strange that sometimes the IP displayed on http://192.168.10.54:8081/solr/#/~cloud is not correct: it shows 202.106.199.36, but my actual IP is 192.168.10.54. On Windows it is correct. I found it is because of hostaddress = InetAddress.getLocalHost().getHostAddress(); in ZkController.java. Sometimes this method does not return the correct IP, so we should not trust it; I think on Linux we should not use this method.
[jira] [Commented] (SOLR-5320) Multi level compositeId router
[ https://issues.apache.org/jira/browse/SOLR-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792478#comment-13792478 ] Anshum Gupta commented on SOLR-5320: A 3-level composite id routing to begin with is what I think would be good. I'd use 8 bits each from the first 2 components of the key and 16 bits from the last component. Functionally, this should work along similar lines to the current 2-level composite id routing. Multi level compositeId router -- Key: SOLR-5320 URL: https://issues.apache.org/jira/browse/SOLR-5320 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Anshum Gupta Original Estimate: 336h Remaining Estimate: 336h This would enable multi level routing as compared to the 2 level routing available as of now. On the usage bit, here's an example: Document Id: myapp!dummyuser!doc myapp!dummyuser! can be used as the shard key for searching content for dummyuser. myapp! can be used for searching across all users of myapp. I am looking at either a 3 (or 4) level routing. The 32 bit hash would then comprise 8x4 components from each part (in case of 4 level).
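The 8 + 8 + 16 bit allocation Anshum proposes can be sketched as a toy hash composition. Solr's router actually uses MurmurHash3 and the exact composition may differ; md5 here is only a stand-in for illustration:

```python
import hashlib

# Toy sketch of 3-level composite-id hashing: 8 bits from each of the
# first two key components and 16 bits from the last, composed into a
# 32-bit hash so that documents sharing "app!" or "app!user!" prefixes
# land in nearby hash ranges.
def _h32(s):
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def composite_hash(doc_id):
    app, user, rest = doc_id.split("!", 2)  # e.g. "myapp!dummyuser!doc"
    return (
        ((_h32(app) >> 24) << 24)              # top 8 bits: app component
        | (((_h32(user) >> 24) & 0xFF) << 16)  # next 8 bits: user component
        | (_h32(rest) & 0xFFFF)                # low 16 bits: doc component
    )
```

With this scheme, a "myapp!" route key pins the top 8 bits and a "myapp!dummyuser!" route key pins the top 16, which is what makes per-app and per-user range queries possible.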
[jira] [Updated] (SOLR-5308) Split all documents of a route key into another collection
[ https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5308: Attachment: (was: SOLR-5308.patch) Split all documents of a route key into another collection -- Key: SOLR-5308 URL: https://issues.apache.org/jira/browse/SOLR-5308 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Enable SolrCloud users to split out a set of documents from a source collection into another collection. This will be useful in multi-tenant environments. This feature will make it possible to split a tenant out of a collection and put them into their own collection which can be scaled separately. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 407 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/407/ 1 tests failed. REGRESSION: org.apache.lucene.index.Test2BPostings.test Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at __randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0) at org.apache.lucene.store.BufferedIndexOutput.init(BufferedIndexOutput.java:50) at org.apache.lucene.store.FSDirectory$FSIndexOutput.init(FSDirectory.java:365) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280) at org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478) at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44) at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149) at org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:171) at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:80) at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4408) at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:535) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506) at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:378) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:470) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1523) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1193) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1174) at org.apache.lucene.index.Test2BPostings.test(Test2BPostings.java:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) Build Log: [...truncated 655 lines...] [junit4] Suite: org.apache.lucene.index.Test2BPostings [junit4] 2 NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory. 
[junit4] 2 NOTE: reproduce with: ant test -Dtestcase=Test2BPostings -Dtests.method=test -Dtests.seed=D8D3920C725BF71C -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt -Dtests.locale=en_IN -Dtests.timezone=America/Puerto_Rico -Dtests.file.encoding=US-ASCII [junit4] ERROR408s J0 | Test2BPostings.test [junit4] Throwable #1: java.lang.OutOfMemoryError: Java heap space [junit4]at __randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0) [junit4]at org.apache.lucene.store.BufferedIndexOutput.init(BufferedIndexOutput.java:50) [junit4]at org.apache.lucene.store.FSDirectory$FSIndexOutput.init(FSDirectory.java:365) [junit4]at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280) [junit4]at org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206) [junit4]at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478) [junit4]at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44) [junit4]at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149) [junit4]at
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: SOLR-5310-1.patch The testcases still fail occasionally Add a collection admin command to remove a replica -- Key: SOLR-5310 URL: https://issues.apache.org/jira/browse/SOLR-5310 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5310-1.patch, SOLR-5310.patch Original Estimate: 72h Remaining Estimate: 72h Currently, the only way a replica can be removed is by unloading the core. There is no way to remove a replica that is down, so the clusterstate will have unreferenced nodes if a few nodes go down over time. We need a cluster admin command to clean that up, e.g.: /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 The system would first see if the replica is active. If yes, a core UNLOAD command is fired, which would take care of deleting the replica from the clusterstate as well. If the state is inactive, then the core or node may be down; in that case the entry is removed from the cluster state directly.
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: (was: SOLR-5310-1.patch) Add a collection admin command to remove a replica -- Key: SOLR-5310 URL: https://issues.apache.org/jira/browse/SOLR-5310 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5310.patch, SOLR-5310.patch Original Estimate: 72h Remaining Estimate: 72h Currently, the only way a replica can be removed is by unloading the core. There is no way to remove a replica that is down, so the clusterstate will have unreferenced nodes if a few nodes go down over time. We need a cluster admin command to clean that up, e.g.: /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 The system would first see if the replica is active. If yes, a core UNLOAD command is fired, which would take care of deleting the replica from the clusterstate as well. If the state is inactive, then the core or node may be down; in that case the entry is removed from the cluster state directly.
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: SOLR-5310.patch Add a collection admin command to remove a replica -- Key: SOLR-5310 URL: https://issues.apache.org/jira/browse/SOLR-5310 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5310.patch, SOLR-5310.patch Original Estimate: 72h Remaining Estimate: 72h Currently, the only way a replica can be removed is by unloading the core. There is no way to remove a replica that is down, so the clusterstate will have unreferenced nodes if a few nodes go down over time. We need a cluster admin command to clean that up, e.g.: /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 The system would first see if the replica is active. If yes, a core UNLOAD command is fired, which would take care of deleting the replica from the clusterstate as well. If the state is inactive, then the core or node may be down; in that case the entry is removed from the cluster state directly.
[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ No tests ran. Build Log: [...truncated 61 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
ok maybe updating the JDK would be a good idea :) On Fri, Oct 11, 2013 at 2:46 PM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ No tests ran. Build Log: [...truncated 61 lines...]
RE: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
Hihi, FYI: I have a compilation unit here (non-Lucene) that also segfaults on JDK 7.0u25 if you don't do ant clean before. If there are already existing class files and only modified ones are recompiled, it always segfaults. Reproducible, but I have no idea what causes this. :-) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: simon.willna...@gmail.com [mailto:simon.willna...@gmail.com] On Behalf Of Simon Willnauer Sent: Friday, October 11, 2013 2:50 PM Cc: dev@lucene.apache.org Subject: Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure! ok maybe updating the JDK would be a good idea :) On Fri, Oct 11, 2013 at 2:46 PM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ No tests ran. Build Log: [...truncated 61 lines...]
[jira] [Commented] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792600#comment-13792600 ] Shalin Shekhar Mangar commented on SOLR-5338: - [~ysee...@gmail.com] - Would you mind reviewing the new CompositeIdRouter methods? Split shards by a route key --- Key: SOLR-5338 URL: https://issues.apache.org/jira/browse/SOLR-5338 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Attachments: SOLR-5338.patch Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard. Example: Assume that collection1, shard1 has hash range [0, 20]. Also that route key 'A!' has hash range [12,15]. Then invoking: {code} /admin/collections?action=SPLIT&collection=collection1&split.key=A! {code} should produce three sub-shards with hash ranges [0,11], [12,15] and [16,20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard with the hash range of the route key may also contain documents for other route keys whose hashes collide.
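The range arithmetic in that example is straightforward to sketch: carve the route key's hash range out of the parent shard's range, keeping any remainder on either side as extra sub-ranges. The method below illustrates the idea; it is not the actual CompositeIdRouter code, and it assumes the key range is fully contained in the parent range (the unsupported spanning case from the description).

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitSketch {
    /**
     * Splits the parent range [pStart, pEnd] around a contained route-key
     * range [kStart, kEnd], returning up to three inclusive sub-ranges.
     */
    public static List<int[]> splitByKeyRange(int pStart, int pEnd, int kStart, int kEnd) {
        List<int[]> ranges = new ArrayList<>();
        if (kStart > pStart) ranges.add(new int[] { pStart, kStart - 1 });
        ranges.add(new int[] { kStart, kEnd }); // dedicated sub-shard for the route key
        if (kEnd < pEnd) ranges.add(new int[] { kEnd + 1, pEnd });
        return ranges;
    }
}
```

For the example in the description, `splitByKeyRange(0, 20, 12, 15)` yields the three ranges [0,11], [12,15] and [16,20]; if the key range touches either end of the parent range, only two sub-ranges are produced.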
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Attachment: LUCENE-5252_4x.patch Fixed a bug regarding ignoreCase in the attached patch. add NGramSynonymTokenizer - Key: LUCENE-5252 URL: https://issues.apache.org/jira/browse/LUCENE-5252 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Koji Sekiguchi Priority: Minor Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, LUCENE-5252_4x.patch I'd like to propose that we have another n-gram tokenizer which can process synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. minGramSize = maxGramSize. Today, I think we have the following problems when using SynonymFilter with NGramTokenizer. For purpose of illustration, we have a synonym setting ABC, DEFG w/ expand=true and N = 2 (2-gram). # There is no consensus (I think :-) on how we assign offsets to generated synonym tokens DE, EF and FG when expanding source tokens AB and BC. # If the query pattern looks like ABCY, it cannot be matched even if there is a document …ABCY… in the index when autoGeneratePhraseQueries is set to true, because there is no CY token (but GY is there) in the index. NGramSynonymTokenizer can solve these problems by providing the following methods. * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't tokenize registered words. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |ABC|AB/DE/BC/EF/FG|ABC/DEFG| * Immediately before and after the registered words, NGramSynonymTokenizer generates *extra* tokens w/ posInc=0. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| In the above sample, Z and 1 are the extra tokens. 
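The token streams in those tables can be simulated in a few lines. The sketch below is a standalone illustration of the proposed behavior — it is not the patch's implementation, and it ignores offsets and position increments (in the real tokenizer the extra tokens and synonym expansions would carry posInc=0):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NGramSynonymSketch {
    /**
     * Simulates the proposed stream: registered words are emitted whole
     * (followed by their synonyms), plain segments are n-grammed, and
     * short "extra" tokens are emitted at synonym boundaries.
     */
    public static List<String> tokenize(String text, Map<String, List<String>> synGroups, int n) {
        List<String> out = new ArrayList<>();
        int i = 0;
        boolean afterSyn = false;
        while (i < text.length()) {
            String matched = null;
            for (String w : synGroups.keySet()) {
                if (text.startsWith(w, i)) { matched = w; break; }
            }
            if (matched != null) {
                out.addAll(synGroups.get(matched)); // word plus its synonyms
                i += matched.length();
                afterSyn = true;
            } else {
                int j = i; // scan plain segment up to the next registered word
                outer:
                while (j < text.length()) {
                    for (String w : synGroups.keySet()) {
                        if (text.startsWith(w, j)) break outer;
                    }
                    j++;
                }
                String seg = text.substring(i, j);
                if (afterSyn && n > 1 && !seg.isEmpty()) // extra token after a synonym
                    out.add(seg.substring(0, Math.min(n - 1, seg.length())));
                for (int k = 0; k + n <= seg.length(); k++) // ordinary n-grams
                    out.add(seg.substring(k, k + n));
                if (j < text.length() && n > 1 && !seg.isEmpty()) // extra token before a synonym
                    out.add(seg.substring(Math.max(0, seg.length() - (n - 1))));
                i = j;
                afterSyn = false;
            }
        }
        return out;
    }
}
```

With a group {ABC, DEFG} and n = 2, this reproduces the table rows: "ABC" becomes ABC/DEFG, and "XYZABC123" becomes XY/YZ/Z/ABC/DEFG/1/12/23.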
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792661#comment-13792661 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531313 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1531313 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792663#comment-13792663 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531315 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531315 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator
[ https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792662#comment-13792662 ] Michael McCandless commented on LUCENE-5260: Thanks Areek, patch looks great! I like the hasPayloads() up-front introspection. In UnsortedTermFreqIteratorWrapper.payload(), why do we set currentOrd as a side effect? Shouldn't next() already do that? Maybe, we should instead assert currentOrd == ords[curPos]? Also, can we break that sneaky currentOrd assignment in next into its own line before? Make older Suggesters more accepting of TermFreqPayloadIterator --- Key: LUCENE-5260 URL: https://issues.apache.org/jira/browse/LUCENE-5260 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Areek Zillur Attachments: LUCENE-5260.patch As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would be nice to make the older suggesters accepting of TermFreqPayloadIterator and throw an exception if payload is found (if it cannot be used). This will also allow us to nuke most of the other interfaces for BytesRefIterator. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792671#comment-13792671 ] Mark Miller commented on SOLR-5325: --- Add some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792671#comment-13792671 ] Mark Miller edited comment on SOLR-5325 at 10/11/13 2:50 PM: - Added some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix. was (Author: markrmil...@gmail.com): Add some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nik Everett updated LUCENE-5274: Attachment: LUCENE-5274-4.patch Reworked to remove the dependency on the query parser and most of the analyzer dependency, and to fix errors with phrases. It'll need to lose the rest of the analyzer dependency and have more test cases, in addition to any other concerns raised in the review. Teach fast FastVectorHighlighter to highlight child fields with parent fields --- Key: LUCENE-5274 URL: https://issues.apache.org/jira/browse/LUCENE-5274 Project: Lucene - Core Issue Type: Improvement Components: core/other Reporter: Nik Everett Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-5274-4.patch, LUCENE-5274.patch I've been messing around with the FastVectorHighlighter and it looks like I can teach it to highlight matches on child fields. Like this query: foo:scissors foo_exact:running would highlight foo like this: <em>running</em> with <em>scissors</em> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS. This would make queries that perform weighted matches against different analyzers much more convenient to highlight. I have working code and test cases but they are hacked into Elasticsearch. I'd love to Lucene-ify them if you'll take them.
[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792679#comment-13792679 ] Mark Miller commented on SOLR-5199: --- Hey Jessica - if we can confirm this is the same issue as SOLR-5325, we can close this as a duplicate. Restarting zookeeper makes the overseer stop processing queue events Key: SOLR-5199 URL: https://issues.apache.org/jira/browse/SOLR-5199 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.4 Reporter: Jessica Cheng Assignee: Mark Miller Labels: overseer, zookeeper Fix For: 4.5.1, 4.6, 5.0 Attachments: 5199-log Taking the external zookeeper down (I'm just testing, so I only have one external zookeeper instance running) and then bringing it back up seems to have caused the overseer to stop processing queue events. I tried to issue the delete collection command (curl 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and each time it just timed out. Looking at the zookeeper data, I see ... /overseer collection-queue-work qn-02 qn-04 qn-06 ... and the qn-xxx are not being processed. Attached please find the log from the overseer (according to /overseer_elect/leader).
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792684#comment-13792684 ] Mark Miller commented on SOLR-5325: --- I'm still kind of surprised this would happen - we should be retrying on connection loss up to expiration - at which point we would no longer be the leader. Perhaps the retry window is a little short or something, and perhaps that is part of why it is more difficult for me to reproduce in a test. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).
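The retry-until-expiration pattern being described can be sketched generically: keep retrying the operation on failure until a deadline tied to the session expiration would be exceeded, then rethrow. This is an illustration of the pattern, not Solr's actual ZooKeeper retry code:

```java
import java.util.function.Supplier;

public class RetryUntilDeadline {
    /**
     * Retries op on failure, pausing between attempts, until another
     * pause would push past deadlineMillis (epoch time); then rethrows.
     */
    public static <T> T retry(Supplier<T> op, long deadlineMillis, long pauseMillis)
            throws InterruptedException {
        while (true) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // Out of time: give up and surface the last failure.
                if (System.currentTimeMillis() + pauseMillis > deadlineMillis) throw e;
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

The "retry padding" discussed in the commits corresponds to how far past the nominal session timeout the deadline is placed: too little padding and a transient connection loss (e.g. a slow ZooKeeper leader election) exhausts the retries even though the session would have survived.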
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792688#comment-13792688 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531323 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1531323 ] SOLR-5325: raise retry padding a bit zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792689#comment-13792689 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531324 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531324 ] SOLR-5325: raise retry padding a bit zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5308) Split all documents of a route key into another collection
[ https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792692#comment-13792692 ] Shalin Shekhar Mangar commented on SOLR-5308: - For splitting a single source shard into a single target collection/shard by a route key such as: {code} /admin/collections?action=migrate&collection=collection1&split.key=A!&shard=shardX&target.collection=collection2&target.shard=shardY {code} A rough strategy could be to: # Create a new core X on the source # Create a new core Y on the target # Ask the target core to buffer updates # Start forwarding updates for the route key received by the source shard to the target collection # Split the source shard into the new core X # Ask Y to replicate fully from X # Core Admin merge Y into the target core # Ask the target core to replay the buffered updates Split all documents of a route key into another collection -- Key: SOLR-5308 URL: https://issues.apache.org/jira/browse/SOLR-5308 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.6, 5.0 Enable SolrCloud users to split out a set of documents from a source collection into another collection. This will be useful in multi-tenant environments. This feature will make it possible to split a tenant out of a collection and put them into their own collection which can be scaled separately.
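Steps 3 and 8 of that strategy (buffer updates on the target, then replay them after the copy) are what keep the migration lossless while documents continue to arrive. A minimal model of that buffering behavior — purely illustrative, not Solr's actual UpdateLog — looks like:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BufferingCoreSketch {
    private final List<String> applied = new ArrayList<>();
    private final Deque<String> buffer = new ArrayDeque<>();
    private boolean buffering = false;

    /** Step 3: hold incoming updates instead of applying them. */
    public void startBuffering() { buffering = true; }

    /** An incoming update: held back while buffering, applied otherwise. */
    public void update(String doc) {
        if (buffering) buffer.add(doc); else applied.add(doc);
    }

    /** Step 8: apply everything buffered during the migration, in order. */
    public void replayBuffered() {
        while (!buffer.isEmpty()) applied.add(buffer.poll());
        buffering = false;
    }

    public List<String> applied() { return applied; }
}
```

Because replay preserves arrival order, updates that raced with the bulk copy are applied exactly once and after it, which is what makes the merge in step 7 safe.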
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792695#comment-13792695 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531327 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531327 ] SOLR-5325: raise retry padding a bit zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792694#comment-13792694 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531325 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531325 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands. zk connection loss causes overseer leader loss -- Key: SOLR-5325 URL: https://issues.apache.org/jira/browse/SOLR-5325 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5 Reporter: Christine Poerschke Assignee: Mark Miller Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch The problem we saw was that when the solr overseer leader experienced temporary zk connectivity problems it stopped processing overseer queue events. This first happened when quorum within the external zk ensemble was lost due to too many zookeepers being stopped (similar to SOLR-5199). The second time it happened when there was a sufficient number of zookeepers but they were holding zookeeper leadership elections and thus refused connections (the elections were taking several seconds, we were using the default zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4824) Fuzzy / Faceting results are changed after ingestion of documents past a certain number
[ https://issues.apache.org/jira/browse/SOLR-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792742#comment-13792742 ] Lakshmi Venkataswamy commented on SOLR-4824: I have tested 4.5.0 version and the same behavior has been observed. So we are staying with 3.6 in production for now. Fuzzy / Faceting results are changed after ingestion of documents past a certain number Key: SOLR-4824 URL: https://issues.apache.org/jira/browse/SOLR-4824 Project: Solr Issue Type: Bug Affects Versions: 4.2, 4.3 Environment: Ubuntu 12.04 LTS 12.04.2 jre1.7.0_17 jboss-as-7.1.1.Final Reporter: Lakshmi Venkataswamy In upgrading from SOLR 3.6 to 4.2/4.3 and comparing results on fuzzy queries, I found that after a certain number of documents were ingested the fuzzy query had drastically lower number of results. We have approximately 18,000 documents per day and after ingesting approximately 40 days of documents, the next incremental day of documents results in a lower number of results of a fuzzy search. 
The query http://10.100.1.xx:8080/solr/corex/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort= produces the following result before the threshold is crossed:

<response>
  <lst name="responseHeader">
    <int name="status">0</int><int name="QTime">2349</int>
    <lst name="params">
      <str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
      <str name="q">cc:worde~1</str><str name="facet.field">date</str>
    </lst>
  </lst>
  <result name="response" numFound="362803" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="date">
        <int name="2012-12-31">2866</int><int name="2013-01-01">11372</int><int name="2013-01-02">11514</int><int name="2013-01-03">12015</int>
        <int name="2013-01-04">11746</int><int name="2013-01-05">10853</int><int name="2013-01-06">11053</int><int name="2013-01-07">11815</int>
        <int name="2013-01-08">11427</int><int name="2013-01-09">11475</int><int name="2013-01-10">11461</int><int name="2013-01-11">12058</int>
        <int name="2013-01-12">11335</int><int name="2013-01-13">12039</int><int name="2013-01-14">12064</int><int name="2013-01-15">12234</int>
        <int name="2013-01-16">12545</int><int name="2013-01-17">11766</int><int name="2013-01-18">12197</int><int name="2013-01-19">11414</int>
        <int name="2013-01-20">11633</int><int name="2013-01-21">12863</int><int name="2013-01-22">12378</int><int name="2013-01-23">11947</int>
        <int name="2013-01-24">11822</int><int name="2013-01-25">11882</int><int name="2013-01-26">10474</int><int name="2013-01-27">11051</int>
        <int name="2013-01-28">11776</int><int name="2013-01-29">11957</int><int name="2013-01-30">11260</int><int name="2013-01-31">8511</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>

Once the 40-days-of-documents threshold is crossed, the results drop as shown below for the same query:

<response>
  <lst name="responseHeader">
    <int name="status">0</int><int name="QTime">2</int>
    <lst name="params">
      <str name="facet">on</str><str name="fl">date</str><str name="facet.sort"/>
      <str name="q">cc:worde~1</str><str name="facet.field">date</str>
    </lst>
  </lst>
  <result name="response" numFound="1338" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="date">
        <int name="2012-12-31">0</int><int name="2013-01-01">41</int><int name="2013-01-02">21</int><int name="2013-01-03">24</int>
        <int name="2013-01-04">19</int><int name="2013-01-05">9</int><int name="2013-01-06">11</int><int name="2013-01-07">17</int>
        <int name="2013-01-08">14</int><int name="2013-01-09">24</int><int name="2013-01-10">43</int><int name="2013-01-11">14</int>
        <int name="2013-01-12">52</int><int name="2013-01-13">57</int><int name="2013-01-14">25</int><int name="2013-01-15">17</int>
        <int name="2013-01-16">34</int><int name="2013-01-17">11</int><int name="2013-01-18">16</int><int name="2013-01-19">121</int>
        <int name="2013-01-20">33</int><int name="2013-01-21">26</int><int name="2013-01-22">59</int><int name="2013-01-23">27</int>
        <int name="2013-01-24">10</int><int name="2013-01-25">9</int><int name="2013-01-26">6</int><int name="2013-01-27">16</int>
        <int name="2013-01-28">11</int><int name="2013-01-29">15</int><int name="2013-01-30">21</int><int name="2013-01-31">109</int>
        <int name="2013-02-01">11</int><int name="2013-02-02">7</int><int name="2013-02-03">10</int><int name="2013-02-04">8</int>
        <int name="2013-02-05">13</int><int name="2013-02-06">75</int><int name="2013-02-07">77</int><int name="2013-02-08">31</int>
        <int name="2013-02-09">35</int><int name="2013-02-10">22</int><int name="2013-02-11">18</int><int name="2013-02-12">11</int>
        <int name="2013-02-13">68</int><int name="2013-02-14">40</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>
</response>

I have also tested this with different months of data and have seen the same issue around the number of documents. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
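For reference, `cc:worde~1` asks for terms within Levenshtein edit distance 1 of `worde`. Which terms qualify can be checked with a small sketch (illustrative of the matching semantics only; it says nothing about why the hit counts changed between versions, and the candidate terms are made up):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Terms within edit distance 1 of "worde" would match cc:worde~1
candidates = ["worde", "words", "word", "wordes", "world", "warden"]
matches = [t for t in candidates if levenshtein("worde", t) <= 1]
# matches → ["worde", "words", "word", "wordes"]
```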
[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792778#comment-13792778 ] Jessica Cheng commented on SOLR-5199: - Sorry, I only saw this once and I didn't have time to investigate, so I don't know what the cause is. SOLR-5325 definitely sounds similar so I'll close this issue now. Thanks! Restarting zookeeper makes the overseer stop processing queue events Key: SOLR-5199 URL: https://issues.apache.org/jira/browse/SOLR-5199 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.4 Reporter: Jessica Cheng Assignee: Mark Miller Labels: overseer, zookeeper Fix For: 4.5.1, 4.6, 5.0 Attachments: 5199-log Taking the external zookeeper down (I'm just testing, so I only have one external zookeeper instance running) and then bringing it back up seems to have caused the overseer to stop processing queue event. I tried to issue the delete collection command (curl 'http://localhost:7574/solr/admin/collections?action=DELETEname=c1') and each time it just timed out. Looking at the zookeeper data, I see ... /overseer collection-queue-work qn-02 qn-04 qn-06 ... and the qn-xxx are not being processed. Attached please find the log from the overseer (according to /overseer_elect/leader). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng closed SOLR-5199. --- Resolution: Duplicate Restarting zookeeper makes the overseer stop processing queue events Key: SOLR-5199 URL: https://issues.apache.org/jira/browse/SOLR-5199 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.4 Reporter: Jessica Cheng Assignee: Mark Miller Labels: overseer, zookeeper Fix For: 4.5.1, 4.6, 5.0 Attachments: 5199-log Taking the external zookeeper down (I'm just testing, so I only have one external zookeeper instance running) and then bringing it back up seems to have caused the overseer to stop processing queue event. I tried to issue the delete collection command (curl 'http://localhost:7574/solr/admin/collections?action=DELETEname=c1') and each time it just timed out. Looking at the zookeeper data, I see ... /overseer collection-queue-work qn-02 qn-04 qn-06 ... and the qn-xxx are not being processed. Attached please find the log from the overseer (according to /overseer_elect/leader). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5273) Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions
[ https://issues.apache.org/jira/browse/LUCENE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792800#comment-13792800 ] ASF subversion and git services commented on LUCENE-5273: - Commit 1531354 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1531354 ] LUCENE-5273: Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions. Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions - Key: LUCENE-5273 URL: https://issues.apache.org/jira/browse/LUCENE-5273 Project: Lucene - Core Issue Type: Bug Components: general/build Reporter: Steve Rowe Assignee: Steve Rowe Fix For: 4.6 Attachments: LUCENE-5273.patch As mentioned in various issues (e.g. LUCENE-3655, LUCENE-3885, SOLR-4766), we release multiple versions of the same artifact: binary Maven artifacts are not identical to the ones in the Lucene and Solr binary distributions, and the Lucene jars in the Solr binary distribution, including within the war, are not identical to the ones in the Lucene binary distribution. This is bad. It's (probably always?) not horribly bad, since the differences all appear to be caused by the build re-creating manifests and re-building jars and the Solr war from their constituents at various points in the release build process; as a result, manifest timestamp attributes, as well as archive metadata (at least constituent timestamps, maybe other things?), differ each time a jar is rebuilt. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792824#comment-13792824 ] Mark Miller commented on SOLR-5323: --- I also think this was a mistake - I don't know that we need another solr.home type thing to address it though. The root of the issue is that the clustering is not really lazy loading clustering - and the current policy is to lazy load the contrib modules - and that is because of the component. I think Erik is on to the right path with lazy SearchComponents. I think that if the only request handlers that refer to a search component are lazy, they should probably also init lazily. I have not looked into how hard that is to do, but it seems like the correct fix to bring clustering in line with the other contribs. I also think the whole enabled flag we had is no good. Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792834#comment-13792834 ] Dawid Weiss commented on SOLR-5323: --- I can revert to lazy-loading, not a problem. But this isn't solving the relative paths issue at all. Like I mentioned there were several times when I had to pass an example preconfigured solr configuration to somebody -- this always required that person to put the content of the example under a specific directory in Solr distribution, otherwise things wouldn't work because of relative paths. It was a pain to explain why this step is needed and to enforce... I ended up just copying the required JARs into the example. This seems wrong somehow -- if it's a solr distribution then there should be a way to reference contribs in a way that allows people to have their stuff in any folder hierarchy? What do you think? Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792845#comment-13792845 ] ASF subversion and git services commented on LUCENE-5269: - Commit 1531368 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531368 ] LUCENE-5269: satisfy the policeman TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792846#comment-13792846 ] ASF subversion and git services commented on LUCENE-5269: - Commit 1531369 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531369 ] LUCENE-5269: satisfy the policeman TestRandomChains failure Key: LUCENE-5269 URL: https://issues.apache.org/jira/browse/LUCENE-5269 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 4.5.1, 4.6, 5.0 Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792850#comment-13792850 ] Mark Miller commented on SOLR-5323: --- I just think anything with the relative paths is a separate issue. You can use any hierarchy - you just have to change those paths. I'm all for that being improved somehow, but the issue here seems to be: Solr contrib modules are lazy loaded so that if you don't use them, you can delete any of them from the dist package layout and things still work. Or you can not delete them and if you try and use them, things work. Clustering now violates that. It's not really clusterings fault, it seems to more be a limitation of the search component. Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
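The lazy-loading policy described above — don't touch a contrib class until the first request that needs it — can be sketched generically. A Python illustration (Solr's actual lazy request-handler machinery is Java and differs in detail; the class and method names here are made up):

```python
class LazyComponent:
    """Defer creating the real component until it is first requested."""

    def __init__(self, loader):
        self._loader = loader      # callable that loads/instantiates the class
        self._instance = None

    def get(self):
        # The expensive part (class lookup, jar resolution, ...) happens only
        # here, on first use, so missing jars don't break startup.
        if self._instance is None:
            self._instance = self._loader()
        return self._instance
```

A component wrapped this way lets the server start even when the contrib jars are deleted; loading fails only if a request actually reaches the component.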
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792859#comment-13792859 ] Dawid Weiss commented on SOLR-5323: --- Ok, I will revert the changes from SOLR-4708. Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792884#comment-13792884 ] ASF subversion and git services commented on LUCENE-5275: - Commit 1531376 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531376 ] LUCENE-5275: Change AttributeSource.toString to display the current state of attributes Fix AttributeSource.toString() -- Key: LUCENE-5275 URL: https://issues.apache.org/jira/browse/LUCENE-5275 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5275.patch, LUCENE-5275.patch Its currently just Object.toString, e.g.: org.apache.lucene.analysis.en.PorterStemFilter@8a32165c But I think we should make it more useful, to end users trying to see what their chain is doing, and to make SOPs easier when debugging: {code} EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT); try (TokenStream ts = analyzer.tokenStream(body, Its 2013, let's fix this already!)) { ts.reset(); while (ts.incrementToken()) { System.out.println(ts.toString()); } ts.end(); } {code} Proposed output: {noformat} PorterStemFilter@8a32165c term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=ALPHANUM,keyword=false PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=NUM,keyword=false PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=ALPHANUM,keyword=false PorterStemFilter@45cbde1b term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=ALPHANUM,keyword=false PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=ALPHANUM,keyword=false {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-5323: -- Attachment: SOLR-5323.patch Patch reverting (portions) of SOLR-4708. Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.6, 5.0 Attachments: SOLR-5323.patch my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-5323: -- Fix Version/s: 4.5.1 Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5323.patch my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792895#comment-13792895 ] ASF subversion and git services commented on SOLR-4708: --- Commit 1531377 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1531377 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) Enable ClusteringComponent by default - Key: SOLR-4708 URL: https://issues.apache.org/jira/browse/SOLR-4708 Project: Solr Issue Type: Task Reporter: Erik Hatcher Assignee: Dawid Weiss Priority: Minor Fix For: 4.5, 5.0 Attachments: SOLR-4708.patch, SOLR-4708.patch In the past, the ClusteringComponent used to rely on 3rd party JARs not available from a Solr distro. This is no longer the case, but the /browse UI and other references still had the clustering component disabled in the example with an awkward system property way to enable it. Let's remove all of that unnecessary stuff and just enable it as it works out of the box now. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792894#comment-13792894 ] ASF subversion and git services commented on SOLR-5323: --- Commit 1531377 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1531377 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5323.patch my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792897#comment-13792897 ] ASF subversion and git services commented on SOLR-5323: --- Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531378 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) Solr requires -Dsolr.clustering.enabled=false when pointing at example config - Key: SOLR-5323 URL: https://issues.apache.org/jira/browse/SOLR-5323 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.5 Environment: vanilla mac Reporter: John Berryman Assignee: Dawid Weiss Fix For: 4.5.1, 4.6, 5.0 Attachments: SOLR-5323.patch my typical use of Solr is something like this: {code} cd SOLR_HOME/example cp -r solr /myProjectDir/solr_home java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar {code} But in solr 4.5.0 this fails to start successfully. I get an error: {code} org.apache.solr.common.SolrException: Error loading class 'solr.clustering.ClusteringComponent' {code} The reason is because solr.clustering.enabled defaults to true now. I don't know why this might be the case. you can get around it with {code} java -jar -Dsolr.solr.home=/myProjectDir/solr_home -Dsolr.clustering.enabled=false start.jar {code} SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5274:
Attachment: (was: LUCENE-5274.patch)

Teach fast FastVectorHighlighter to highlight child fields with parent fields

Key: LUCENE-5274
URL: https://issues.apache.org/jira/browse/LUCENE-5274
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Nik Everett
Assignee: Adrien Grand
Priority: Minor

I've been messing around with the FastVectorHighlighter and it looks like I can teach it to highlight matches on child fields. Like this query: foo:scissors foo_exact:running would highlight foo like this: <em>running</em> with <em>scissors</em>. Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS. This would make queries that perform weighted matches against different analyzers much more convenient to highlight. I have working code and test cases but they are hacked into Elasticsearch. I'd love to Lucene-ify them if you'll take them.
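The <em> markup in the example above is the FastVectorHighlighter's default pre/post tag pair. As a deliberately simplistic illustration of the intended output (this is not the highlighter's real multi-field, offset-driven logic), wrapping matched terms looks like:

```java
import java.util.Set;
import java.util.StringJoiner;

public class EmHighlight {
    /**
     * Wrap whitespace-separated tokens that match a query term in <em>
     * tags. A toy stand-in for what FastVectorHighlighter produces from
     * term-vector positions and offsets.
     */
    static String highlight(String text, Set<String> queryTerms) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : text.split(" ")) {
            out.add(queryTerms.contains(token) ? "<em>" + token + "</em>" : token);
        }
        return out.toString();
    }
}
```

For example, highlight("running with scissors", Set.of("running", "scissors")) yields "<em>running</em> with <em>scissors</em>", matching the snippet in the issue description.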
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792898#comment-13792898 ]

ASF subversion and git services commented on SOLR-4708:

Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531378 ]
SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss)

Enable ClusteringComponent by default

Key: SOLR-4708
URL: https://issues.apache.org/jira/browse/SOLR-4708
Project: Solr
Issue Type: Task
Reporter: Erik Hatcher
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.5, 5.0
Attachments: SOLR-4708.patch, SOLR-4708.patch

In the past, the ClusteringComponent relied on 3rd-party JARs not available from a Solr distro. This is no longer the case, but the /browse UI and other references still had the clustering component disabled in the example, with an awkward system-property way to enable it. Let's remove all of that unnecessary stuff and just enable it, as it works out of the box now.
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5274:
Attachment: (was: LUCENE-5274-4.patch)
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5274:
Attachment: LUCENE-5274.patch

New version of the patch. This one works a lot better with phrases and even works on fields that have the same source but different tokenizers. It still makes highlighting depend on the analysis module to pick up PerFieldAnalyzerWrapper. I think all the new code this adds to FieldPhraseList deserves a unit test of its own, but I'm not in the frame of mind to write one at the moment, so I'll have to come back to it later.
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792903#comment-13792903 ]

ASF subversion and git services commented on SOLR-4708:

Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531380 ]
SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss)
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792902#comment-13792902 ]

ASF subversion and git services commented on SOLR-5323:

Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531380 ]
SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss)
[jira] [Resolved] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved SOLR-5323.
Resolution: Fixed

Applied to branch_4x, lucene_solr_4_5 and trunk.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792907#comment-13792907 ]

Robert Muir commented on LUCENE-5274:

Why would a highlighter improvement require MockTokenizer changes?
[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792912#comment-13792912 ]

ASF subversion and git services commented on LUCENE-5275:

Commit 1531381 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531381 ]
LUCENE-5275: Change AttributeSource.toString to display the current state of attributes

Fix AttributeSource.toString()

Key: LUCENE-5275
URL: https://issues.apache.org/jira/browse/LUCENE-5275
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5275.patch, LUCENE-5275.patch

It's currently just Object.toString, e.g.: org.apache.lucene.analysis.en.PorterStemFilter@8a32165c
But I think we should make it more useful, to end users trying to see what their chain is doing, and to make SOPs easier when debugging:
{code}
EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this already!")) {
  ts.reset();
  while (ts.incrementToken()) {
    System.out.println(ts.toString());
  }
  ts.end();
}
{code}
Proposed output:
{noformat}
PorterStemFilter@8a32165c term=it,bytes=[69 74],startOffset=0,endOffset=3,positionIncrement=1,type=ALPHANUM,keyword=false
PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 33],startOffset=4,endOffset=8,positionIncrement=1,type=NUM,keyword=false
PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 74],startOffset=10,endOffset=15,positionIncrement=1,type=ALPHANUM,keyword=false
PorterStemFilter@45cbde1b term=fix,bytes=[66 69 78],startOffset=16,endOffset=19,positionIncrement=1,type=ALPHANUM,keyword=false
PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 69],startOffset=25,endOffset=32,positionIncrement=2,type=ALPHANUM,keyword=false
{noformat}
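The proposed output prints each attribute's current state, including the term bytes rendered as hex, e.g. bytes=[69 74] for "it". A small sketch of just that formatting (assuming UTF-8 term bytes; this is not the reflection-based code the patch actually uses):

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class TermBytesFormat {
    /** Format raw term bytes the way the proposed toString() shows them:
     *  lowercase hex digits, space-separated, in square brackets. */
    static String hex(byte[] bytes) {
        StringJoiner out = new StringJoiner(" ", "[", "]");
        for (byte b : bytes) {
            out.add(String.format("%02x", b));
        }
        return out.toString();
    }

    /** Convenience overload: UTF-8 encode a term and format its bytes. */
    static String hexOf(String term) {
        return hex(term.getBytes(StandardCharsets.UTF_8));
    }
}
```

hexOf("it") yields "[69 74]" and hexOf("2013") yields "[32 30 31 33]", matching the sample lines above.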
[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792911#comment-13792911 ]

Jessica Cheng commented on SOLR-4816:

I think the latest patch:
{code}
-if (request instanceof IsUpdateRequest && updatesToLeaders) {
+if (request instanceof IsUpdateRequest) {
{code}
removed the effect of the updatesToLeaders variable. Looking at http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_5/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java?view=markup it's not used anywhere to make a decision anymore.

Add document routing to CloudSolrServer

Key: SOLR-4816
URL: https://issues.apache.org/jira/browse/SOLR-4816
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Joel Bernstein
Assignee: Mark Miller
Priority: Minor
Fix For: 4.5, 5.0
Attachments: RequestTask-removal.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch

This issue adds the following enhancements to CloudSolrServer's update logic:
1) Document routing: Updates are routed directly to the correct shard leader, eliminating document routing at the server.
2) Optional parallel update execution: Updates for each shard are executed in a separate thread so parallel indexing can occur across the cluster.
These enhancements should allow for near-linear scalability of indexing throughput.
Usage:
{code}
CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
cloudClient.setParallelUpdates(true);
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", 0);
doc1.addField("a_t", "hello1");
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", 2);
doc2.addField("a_t", "hello2");
UpdateRequest request = new UpdateRequest();
request.add(doc1);
request.add(doc2);
request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
NamedList response = cloudClient.request(request);
// Returns a backwards-compatible condensed response.
// To get a more detailed response, down-cast to RouteResponse:
CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse) response;
{code}
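Client-side document routing means the client picks the target shard itself, deterministically, from the document id. As a rough stand-in for the real router (SolrCloud's compositeId router hashes the id with MurmurHash3 and matches it against each shard's hash range; the modulo below is only an illustration, and the class name is made up):

```java
public class IdRouter {
    /**
     * Map a document id to one of numShards shards deterministically.
     * Illustrative only: real SolrCloud routing uses MurmurHash3 over
     * the collection's hash-range ring, not hashCode() % numShards.
     */
    static int shardFor(String docId, int numShards) {
        // floorMod keeps the result non-negative even for negative hashCodes
        return Math.floorMod(docId.hashCode(), numShards);
    }
}
```

Because every client computes the same shard for the same id, updates can go straight to that shard's leader instead of being forwarded a second time inside the cluster, which is where the near-linear indexing scalability comes from.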
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792913#comment-13792913 ]

Nik Everett commented on LUCENE-5274:

Hey, forgot to mention that. MockTokenizer seems to throw away the character after the end of each token even if that character is the valid start of the next token. This comes up because I wanted to tokenize strings in a simplistic way to test that the highlighter can handle different tokenizers, and it just wasn't working right. So I fixed MockTokenizer, but I did it in a pretty brutal way. I'm happy to move the change to another bug and improve it, but testing the highlighter change without it is a bit painful.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792921#comment-13792921 ]

Robert Muir commented on LUCENE-5274:

If you suspect there is a bug in MockTokenizer, please open a separate issue for that. MockTokenizer is used by, like, thousands of tests :)
[jira] [Resolved] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-5275.
Resolution: Fixed
Fix Version/s: 5.0, 4.6
[jira] [Resolved] (SOLR-4073) Overseer will miss operations in some cases for OverseerCollectionProcessor
[ https://issues.apache.org/jira/browse/SOLR-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved SOLR-4073.
Resolution: Duplicate
Fix Version/s: (was: 4.6)

Overseer will miss operations in some cases for OverseerCollectionProcessor

Key: SOLR-4073
URL: https://issues.apache.org/jira/browse/SOLR-4073
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2
Environment: Solr cloud
Reporter: Raintung Li
Assignee: Mark Miller
Attachments: patch-4073
Original Estimate: 168h
Remaining Estimate: 168h

One overseer disconnects from ZooKeeper, but its overseer thread is still handling request (A) from the DistributedQueue. Example: the old overseer thread reconnects to ZooKeeper and tries to remove the top request with workQueue.remove(). Meanwhile, another server takes over the overseer role because the old overseer disconnected. The new overseer thread handles request (A) again, removes request (A) from the queue, and then tries to get the next top request (B, not yet fetched). At this point the old overseer reconnects to ZooKeeper and removes the top request from the queue. The top request is now B, so it is removed by the old overseer server. The new overseer server never processes request B because it was deleted by the old overseer, so request B's operations are lost. At minimum, distributedQueue.peek() should return the request's ID so that workQueue.remove(ID) removes that specific request, not whatever is at the top.
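The suggested fix is for peek() to hand back an ID so the consumer later removes exactly the entry it processed, not whatever happens to be at the head. A minimal in-memory sketch of that contract (names are illustrative; this is not ZooKeeper's DistributedQueue API):

```java
import java.util.LinkedHashMap;

public class IdWorkQueue {
    // insertion-ordered map: the head of the queue is the first entry
    private final LinkedHashMap<String, String> items = new LinkedHashMap<>();

    void offer(String id, String payload) {
        items.put(id, payload);
    }

    /** Return the id of the head entry, or null if the queue is empty. */
    String peekId() {
        return items.isEmpty() ? null : items.keySet().iterator().next();
    }

    /**
     * Remove exactly the entry we peeked, by id. Safe even if another
     * consumer already removed it: this returns false instead of
     * deleting an unrelated head entry, which is the race described
     * in the issue above.
     */
    boolean removeById(String id) {
        return items.remove(id) != null;
    }
}
```

With a plain remove() that always drops the head, the stale overseer's late removal deletes request B; with removeById, its second removal of request A is a harmless no-op.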
[jira] [Updated] (LUCENE-5265) Make BlockPackedWriter constructor take an acceptable overhead ratio
[ https://issues.apache.org/jira/browse/LUCENE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-5265:
Attachment: LUCENE-5265.patch

Here is a patch.

Make BlockPackedWriter constructor take an acceptable overhead ratio

Key: LUCENE-5265
URL: https://issues.apache.org/jira/browse/LUCENE-5265
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Priority: Minor
Attachments: LUCENE-5265.patch

Follow-up of http://search-lucene.com/m/SjmSW1CZYuZ1
MemoryDocValuesFormat takes an acceptable overhead ratio but it is only used when doing table compression. It should be used for all compression methods, especially DELTA_COMPRESSED, whose encoding is based on BlockPackedWriter.
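An acceptable overhead ratio trades space for decode speed: a packed writer may round the exact bits-per-value up to a faster, byte-aligned width as long as the wasted space stays within the ratio. A sketch of that decision (illustrative only; it mirrors the idea behind PackedInts' format selection, not Lucene's exact code):

```java
public class OverheadRatio {
    /** Minimum bits needed to store any value in [0, maxValue]. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /**
     * Round the exact width up to a byte-aligned one (8/16/32/64) when
     * the extra space stays within acceptableOverheadRatio; otherwise
     * keep the compact width.
     */
    static int chooseBitsPerValue(int exactBits, float acceptableOverheadRatio) {
        float maxBits = exactBits * (1f + acceptableOverheadRatio);
        for (int aligned : new int[] {8, 16, 32, 64}) {
            if (aligned >= exactBits && aligned <= maxBits) {
                return aligned; // faster to decode, still within the allowed overhead
            }
        }
        return exactBits;
    }
}
```

With deltas up to 1000, bitsRequired gives 10; a ratio of 0 keeps the compact 10 bits, while a generous ratio (say 0.7) rounds up to 16 so values decode with plain byte-aligned reads.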
[jira] [Commented] (LUCENE-5266) Optimization of the direct PackedInts readers
[ https://issues.apache.org/jira/browse/LUCENE-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792948#comment-13792948 ]

Adrien Grand commented on LUCENE-5266:

bq. The only caveat is the encoding would need to ensure there is always an extra 2 bytes at the end.

There are some places (codecs) where I encode many short sequences consecutively, so I care about not wasting extra bytes, but if this proves to help performance, I think it shouldn't be too hard to add the ability to have extra bytes at the end of the stream (I'm thinking about adding a new PackedInts.Format to the enum, but there might be other options).

Optimization of the direct PackedInts readers

Key: LUCENE-5266
URL: https://issues.apache.org/jira/browse/LUCENE-5266
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Attachments: LUCENE-5266.patch, LUCENE-5266.patch

Given that the initial focus for PackedInts readers was more on in-memory readers (for storing stuff like the mapping from old to new doc IDs at merging time), I never spent time trying to optimize the direct readers, although it could be beneficial now that they are used for disk-based doc values.
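A direct PackedInts reader decodes the i-th value straight from the underlying bytes at a computed bit offset. A bit-at-a-time reference sketch of that decoding (real readers pull whole bytes or longs per bits-per-value case for speed, which is exactly why a couple of guaranteed trailing bytes, as discussed above, would simplify them):

```java
public class PackedRead {
    /**
     * Decode value #index from big-endian bit-packed storage with a
     * fixed number of bits per value. Reference implementation: one
     * bit per iteration, simple but slow compared to word-at-a-time
     * reads.
     */
    static long get(byte[] blocks, int index, int bitsPerValue) {
        long bitPos = (long) index * bitsPerValue;
        long value = 0;
        for (int i = 0; i < bitsPerValue; i++) {
            long p = bitPos + i;
            int bit = (blocks[(int) (p >>> 3)] >>> (7 - (int) (p & 7))) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }
}
```

Packing 5 and 3 at 4 bits per value yields the single byte 0x53; get(blocks, 0, 4) recovers 5 and get(blocks, 1, 4) recovers 3.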
[jira] [Created] (SOLR-5340) Add support for named snapshots
Mike Schrag created SOLR-5340:

Summary: Add support for named snapshots
Key: SOLR-5340
URL: https://issues.apache.org/jira/browse/SOLR-5340
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Affects Versions: 4.5
Reporter: Mike Schrag

It would be really nice if Solr supported named snapshots. Right now, if you snapshot a SolrCloud cluster, every node potentially records a slightly different timestamp. Correlating those back together to effectively restore the entire cluster to a consistent snapshot is pretty tedious.
[jira] [Created] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
Nik Everett created LUCENE-5278:

Summary: MockTokenizer throws away the character right after a token even if it is a valid start to a new token
Key: LUCENE-5278
URL: https://issues.apache.org/jira/browse/LUCENE-5278
Project: Lucene - Core
Issue Type: Bug
Reporter: Nik Everett
Priority: Trivial

MockTokenizer throws away the character right after a token even if it is a valid start to a new token. You won't see this unless you build a tokenizer that can recognize every character, like with new RegExp(".") or new RegExp("..."). Changing this behaviour seems to break a number of tests.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792974#comment-13792974 ]

Nik Everett commented on LUCENE-5274:

Filed LUCENE-5278.
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nik Everett updated LUCENE-5278:
Attachment: LUCENE-5278.patch

This patch fixes the behaviour from my perspective but breaks a bunch of other tests.
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792993#comment-13792993 ]

Robert Muir commented on LUCENE-5278:

I think I understand what you want: it makes sense. The only reason it's the way it is today is because this thing historically came from CharTokenizer (see the isTokenChar?). But it would be better if you could e.g. make a pattern like ([A-Z][a-z]+) and for it to actually break FooBar into Foo, Bar rather than throwing out Bar altogether. I'll dig into this!
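The difference between the current and the desired behaviour can be shown with plain java.util.regex (a sketch only; MockTokenizer itself is automaton-driven, not built on Matcher): resuming the scan at the character right after a match recovers Bar, while skipping one extra character, the analogue of the reported bug, loses it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    /** Desired behaviour: resume scanning at the char right after each token. */
    static List<String> tokenize(String input, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Analogue of the reported bug: the char after each token is discarded. */
    static List<String> tokenizeDroppingBoundary(String input, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos <= input.length() && m.find(pos)) {
            tokens.add(m.group());
            pos = m.end() + 1; // skips the character right after the token
        }
        return tokens;
    }
}
```

With the pattern [A-Z][a-z]+, tokenize("FooBar", ...) gives [Foo, Bar] while tokenizeDroppingBoundary gives only [Foo], because the 'B' that starts the second token was consumed as a boundary.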
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793000#comment-13793000 ] Robert Muir commented on LUCENE-5274: - Thanks Nik: I can help with that one! Another question: about the MergedIterator :) I can see the possible use case here, but I think it deserves some discussion first (versus just making it public). This thing has limitations (it's currently only used by IndexWriter for buffered deletes; it's basically like a MultiTerms over an Iterator). For example, each iterator it consumes should not have duplicate values according to its compareTo(): it's not clear to me that WeightedPhraseInfo behaves this way:
* what if you have a synonym of dog sitting on top of cat with the same boost factor... it's a duplicate according to that compareTo, but the text is different.
* what if the synonym is just dog with posinc=0 stacked on top of itself (which is totally valid to do)...
Perhaps highlighting can make use of it, but it's unclear to me that it's really following the contract. Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where it's inconsistent with equals()... I think this is a little dangerous. If it turns out we can reuse it: great! But I think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior, in a way that won't hurt performance, and so on?
Teach fast FastVectorHighlighter to highlight child fields with parent fields --- Key: LUCENE-5274 URL: https://issues.apache.org/jira/browse/LUCENE-5274 Project: Lucene - Core Issue Type: Improvement Components: core/other Reporter: Nik Everett Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-5274.patch I've been messing around with the FastVectorHighlighter and it looks like I can teach it to highlight matches on child fields. Like this query: foo:scissors foo_exact:running would highlight foo like this: <em>running</em> with <em>scissors</em> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS. This would make queries that perform weighted matches against different analyzers much more convenient to highlight. I have working code and test cases but they are hacked into Elasticsearch. I'd love to Lucene-ify them if you'll take them.
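The sorted-union behavior Robert attributes to MergedIterator can be sketched with a priority queue (hypothetical class and method names, not the actual Lucene code). Equal values coming from different sources are emitted only once, which is why a duplicate within a single source would violate the contract being discussed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a sorted-union merge over pre-sorted inputs.
// Equal values from different sources are emitted once, mirroring the
// MultiTerms-style union behavior described in the thread.
class SortedUnionSketch {
    static List<Integer> union(List<List<Integer>> sortedInputs) {
        // each queue entry is { value, sourceIndex, positionInSource }
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[0]));
        for (int i = 0; i < sortedInputs.size(); i++) {
            if (!sortedInputs.get(i).isEmpty()) {
                pq.add(new int[] { sortedInputs.get(i).get(0), i, 0 });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            // skip duplicates of the last emitted value (the union step)
            if (out.isEmpty() || out.get(out.size() - 1) != top[0]) {
                out.add(top[0]);
            }
            int next = top[2] + 1;
            List<Integer> src = sortedInputs.get(top[1]);
            if (next < src.size()) {
                pq.add(new int[] { src.get(next), top[1], next });
            }
        }
        return out;
    }
}
```

For example, merging [1, 3, 5] and [3, 4] yields [1, 3, 4, 5]: the second 3 is swallowed by the union, which is exactly the behavior that would drop a stacked synonym.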
[jira] [Assigned] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-5278: --- Assignee: Robert Muir
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793018#comment-13793018 ] Nik Everett commented on LUCENE-5274: -
{quote} I can see the possible use case here, but I think it deserves some discussion first (versus just making it public). {quote}
Sure! I'm more used to Guava's tools so I think I was lulled into a false sense of recognition. No chance of updating to a modern version of Guava? :)
{quote} This thing has limitations (its currently only used by indexwriter for buffereddeletes, its basically like a MultiTerms over an Iterator). For example each iterator it consumes should not have duplicate values according to its compareTo(): its not clear to me this WeightedPhraseInfo behaves this way {quote}
Yikes! I didn't catch that, but now that you point it out it is right there in the docs and I should have. WeightedPhraseInfo doesn't behave that way and
{quote} Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where its inconsistent with equals()... I think this is a little dangerous. {quote}
I agree on the inconsistent-with-equals point. I can either fix that or use a Comparator for sorting both WeightedPhraseInfo and Toffs. That'd require a MergeSorter that can take one but
{quote} If it turns out we can reuse it: great! But i think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior in a way that won't hurt performance and so on? {quote}
Makes sense to me.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793029#comment-13793029 ] Robert Muir commented on LUCENE-5274: -
{quote} Sure! I'm more used to Guava's tools so I think I was lulled into a false sense of recognition. No chance of updating to a modern version of Guava? {quote}
There is no Lucene dependency on Guava. I don't think we should introduce one, and it wouldn't solve the issues I mentioned anyway (e.g. Comparable inconsistent with equals and such). It would only add 2.1MB of bloated, unnecessary syntactic sugar (sorry, that's just my opinion on it; I think it's useless). We should keep our third-party dependencies minimal and necessary so that any app using Lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them onto compatible versions.
[jira] [Updated] (SOLR-5027) Field Collapsing PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5027: - Attachment: SOLR-5027.patch Field Collapsing PostFilter --- Key: SOLR-5027 URL: https://issues.apache.org/jira/browse/SOLR-5027 Project: Solr Issue Type: New Feature Components: search Affects Versions: 5.0 Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.6, 5.0 Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch This ticket introduces the *CollapsingQParserPlugin* The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. This is a high performance alternative to standard Solr field collapsing (with *ngroups*) when the number of distinct groups in the result set is high. For example, in one performance test, a search with 10 million full results and 1 million collapsed groups: Standard grouping with ngroups: 17 seconds. CollapsingQParserPlugin: 300 milliseconds. Sample syntax:
Collapse based on the highest scoring document:
{code} fq={!collapse field=field_name} {code}
Collapse based on the min value of a numeric field:
{code} fq={!collapse field=field_name min=field_name} {code}
Collapse based on the max value of a numeric field:
{code} fq={!collapse field=field_name max=field_name} {code}
Collapse with a null policy:
{code} fq={!collapse field=field_name nullPolicy=null_policy} {code}
There are three null policies:
ignore: removes docs with a null value in the collapse field (default).
expand: treats each doc with a null value in the collapse field as a separate group.
collapse: collapses all docs with a null value into a single group using either highest score, or min/max.
The CollapsingQParserPlugin also fully supports the QueryElevationComponent. *Note:* The July 16 patch also includes an ExpandComponent that expands the collapsed groups for the current search result page. This functionality will be moved to its own ticket.
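For illustration, the three null policies above can be sketched in plain Java (hypothetical code, not the actual CollapsingQParserPlugin): keep the top-scoring doc per group, handling null groups according to the policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of collapse-by-highest-score with the three null
// policies described above: "ignore", "expand", and "collapse".
class CollapseSketch {
    record Doc(String group, double score) {}

    static List<Doc> collapse(List<Doc> docs, String nullPolicy) {
        Map<String, Doc> best = new LinkedHashMap<>();
        List<Doc> result = new ArrayList<>();
        for (Doc d : docs) {
            String key = d.group();
            if (key == null) {
                switch (nullPolicy) {
                    case "ignore":
                        continue;               // drop docs with a null group
                    case "expand":
                        result.add(d);          // each null doc is its own group
                        continue;
                    case "collapse":
                        key = "\0null";         // all nulls form one group
                }
            }
            // keep the highest-scoring doc per group
            best.merge(key, d, (a, b) -> a.score() >= b.score() ? a : b);
        }
        result.addAll(best.values());
        return result;
    }
}
```

With docs {a:1.0, a:2.0, null:3.0}, "ignore" returns one doc (a:2.0), while "expand" and "collapse" each return two.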
[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793035#comment-13793035 ] Joel Bernstein commented on SOLR-5027: -- Patch that passes precommit for trunk
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793037#comment-13793037 ] Bill Bell commented on LUCENE-5212: --- It appears this happens on 7u40 64-bit too. See https://bugs.openjdk.java.net/browse/JDK-8024830 Am I reading this wrong? Start failing around hs24-b21:
[junit4] # SIGSEGV (0xb) at pc=0xfd7ff91d9f7d, pid=23810, tid=343
[junit4] #
[junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b54)
[junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b21 mixed mode solaris-amd64 )
[junit4] # Problematic frame:
[junit4] # J org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields;
[junit4] #
Note, first 7u40 build b01 has hs24-b24. Next, I will try to find the changeset. java 7u40 causes sigsegv and corrupt term vectors - Key: LUCENE-5212 URL: https://issues.apache.org/jira/browse/LUCENE-5212 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: crashFaster2.0.patch, crashFaster.patch, hs_err_pid32714.log, jenkins.txt
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793038#comment-13793038 ] Nik Everett commented on LUCENE-5274: -
{quote} There is no lucene dependency on guava. I don't think we should introduce one, and it wouldnt solve the issues i mentioned anyway (e.g. comparable inconsistent with equals and stuff). It would only add 2.1MB of bloated unnecessary syntactic sugar (sorry, thats just my opinion on it, i think its useless). We should keep our third party dependencies minimal and necessary so that any app using lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them to compatible versions. {quote}
I figured that was the reasoning and I don't intend to argue with it. In this case it would provide a method to merge sorted iterators just like MergedIterator, only without the caveats around duplication, but I'm happy to work around it. Guava certainly wouldn't fix my forgetting equals and hashCode.
[jira] [Created] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches
Michael McCandless created LUCENE-5279: -- Summary: Don't use recursion in DisjunctionSumScorer.countMatches Key: LUCENE-5279 URL: https://issues.apache.org/jira/browse/LUCENE-5279 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I noticed the TODO in there, to not use recursion, so I fixed it to just use a private queue ... -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches
[ https://issues.apache.org/jira/browse/LUCENE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5279: --- Attachment: LUCENE-5279.patch Patch. However, it seems to be slower, testing on full Wikipedia en:
{noformat}
Report after iter 10:
Task              QPS base  StdDev   QPS comp  StdDev    Pct diff
OrHighLow            14.44  (7.7%)     12.48  (4.7%)  -13.6% ( -24% -  -1%)
OrHighHigh            5.56  (6.2%)      4.86  (4.4%)  -12.6% ( -21% -  -2%)
OrHighMed            18.62  (6.7%)     16.29  (4.4%)  -12.5% ( -22% -  -1%)
AndHighLow          398.09  (1.6%)    390.34  (2.3%)   -1.9% (  -5% -   1%)
OrNotHighLow        374.60  (1.7%)    369.61  (1.7%)   -1.3% (  -4% -   2%)
Fuzzy1               67.10  (2.1%)     66.41  (2.2%)   -1.0% (  -5% -   3%)
OrNotHighMed         51.68  (1.7%)     51.37  (1.5%)   -0.6% (  -3% -   2%)
Fuzzy2               46.73  (2.8%)     46.45  (2.6%)   -0.6% (  -5% -   4%)
OrHighNotLow         20.05  (3.5%)     19.96  (5.0%)   -0.5% (  -8% -   8%)
OrHighNotMed         27.15  (3.2%)     27.05  (4.8%)   -0.3% (  -8% -   7%)
OrNotHighHigh         7.72  (3.2%)      7.70  (4.7%)   -0.3% (  -7% -   7%)
OrHighNotHigh         9.81  (3.0%)      9.79  (4.5%)   -0.1% (  -7% -   7%)
LowSloppyPhrase      43.83  (1.9%)     43.89  (2.1%)    0.2% (  -3% -   4%)
IntNRQ                3.49  (4.5%)      3.50  (4.1%)    0.2% (  -8% -   9%)
Prefix3              70.74  (2.7%)     71.01  (2.4%)    0.4% (  -4% -   5%)
HighTerm             65.33  (3.0%)     65.62 (13.5%)    0.4% ( -15% -  17%)
MedSloppyPhrase       3.47  (3.5%)      3.49  (4.7%)    0.6% (  -7% -   9%)
LowPhrase            13.06  (1.5%)     13.14  (2.0%)    0.6% (  -2% -   4%)
Wildcard             16.71  (2.9%)     16.82  (2.2%)    0.7% (  -4% -   5%)
MedTerm             100.90  (2.5%)    101.71 (10.4%)    0.8% ( -11% -  14%)
LowTerm             311.85  (1.4%)    314.53  (6.4%)    0.9% (  -6% -   8%)
HighSpanNear          8.06  (5.1%)      8.13  (5.9%)    0.9% (  -9% -  12%)
Respell              48.00  (2.3%)     48.45  (2.8%)    0.9% (  -4% -   6%)
HighSloppyPhrase      3.40  (4.1%)      3.43  (6.6%)    1.0% (  -9% -  12%)
AndHighMed           34.14  (1.6%)     34.52  (1.7%)    1.1% (  -2% -   4%)
AndHighHigh          28.15  (1.7%)     28.48  (1.7%)    1.2% (  -2% -   4%)
MedSpanNear          30.62  (2.8%)     31.07  (3.2%)    1.5% (  -4% -   7%)
LowSpanNear          10.30  (2.6%)     10.48  (2.9%)    1.7% (  -3% -   7%)
MedPhrase           195.60  (5.1%)    201.44  (6.6%)    3.0% (  -8% -  15%)
HighPhrase            4.17  (5.6%)      4.34  (6.9%)    4.0% (  -8% -  17%)
{noformat}
So ...
I don't plan on pursuing it any further, but wanted to open the issue in case anybody wants to try ...
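The recursion-to-queue rewrite discussed in this issue can be sketched generically (hypothetical tree type, not the actual DisjunctionSumScorer): an explicit stack produces the same count without deep call stacks.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: counting matching leaves/nodes of a binary tree,
// first recursively, then with an explicit stack (the LUCENE-5279 idea).
class CountMatchesSketch {
    record Node(Node left, Node right, boolean matches) {}

    static int countRecursive(Node n) {
        if (n == null) return 0;
        return (n.matches() ? 1 : 0) + countRecursive(n.left()) + countRecursive(n.right());
    }

    static int countIterative(Node root) {
        if (root == null) return 0;
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.matches()) count++;
            // children go on the explicit stack instead of the call stack
            if (n.left() != null) stack.push(n.left());
            if (n.right() != null) stack.push(n.right());
        }
        return count;
    }
}
```

Both traversals visit every node exactly once, so the two counts always agree; as the benchmark above suggests, removing recursion is not automatically faster.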
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5278: Attachment: LUCENE-5278.patch Nice patch Nik! I think this is ready: I tweaked variable names and rearranged stuff (e.g. I use -1 instead of Integer so we aren't boxing, and a few other things). I also added some unit tests. The main issues why tests were failing with your original patch:
* reset() needed to clear the buffer variables.
* the state machine needed a particular extra check when emitting a token: e.g. if you make a regex of .., but you send it abcde, the tokens should be ab, cd, but not e. So when we end on a partial match, we have to check that we are in an accept state.
* term-limit-exceeded is a special case (versus the last character being in a reject state)
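The accept-state corner case in the second bullet can be sketched with a tiny two-state DFA for the pattern .. (hypothetical code, not the actual MockTokenizer): ending the input on a partial match must not emit a token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the accept-state check: a 2-state DFA for the
// pattern "..". State 0 (an even number of buffered chars) is the accept
// state; ending input in state 1 means a partial match that is dropped.
class AcceptStateSketch {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        int state = 0; // 0 = accept, 1 = partial match
        for (char c : input.toCharArray()) {
            buf.append(c);
            state = 1 - state;
            if (state == 0) {            // reached an accept state: emit
                tokens.add(buf.toString());
                buf.setLength(0);
            }
        }
        // end of input while in a non-accept state: drop the partial token
        return tokens;
    }
}
```

So tokenize("abcde") yields ab and cd, and the trailing e, which never reaches the accept state, is discarded.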
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5278: Attachment: LUCENE-5278.patch Added a few more tests to TestMockAnalyzer so all these crazy corner cases are caught there rather than while debugging other tests :)
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight child fields with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793204#comment-13793204 ] Robert Muir commented on LUCENE-5274: - Yeah, I guess for me it's not a caveat at all, but a feature :) We need to iterate a sorted union for stuff in the index like terms and fields, so they appear as if they exist only once. The Guava one isn't doing a union operation but just simply maintaining compareTo() order...
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793205#comment-13793205 ] ASF subversion and git services commented on LUCENE-5278: - Commit 1531479 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531479 ] LUCENE-5278: remove CharTokenizer brain-damage from MockTokenizer so it works better with custom regular expressions
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793206#comment-13793206 ] Robert Muir commented on LUCENE-5278: - I committed this to trunk: I did a lot of testing locally, but I want to let Jenkins have its way with it for a few hours before backporting to branch_4x.
help in getting sort to work on an indexed binary field
Hi, We added a custom field type to allow an indexed binary field type that supports search (exact match), prefix search, and sort as an unsigned-bytes lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator accomplishes what we want, and even though the name of the comparator mentions UTF8, it doesn't actually assume so and just does byte-level operations, so it's good. However, when we do this across different nodes, we run into an issue in QueryComponent.doFieldSortValues:
// Must do the same conversion when sorting by a
// String field in Lucene, which returns the terms
// data as BytesRef:
if (val instanceof BytesRef) {
  UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
  field.setStringValue(spare.toString());
  val = ft.toObject(field);
}
UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually UTF8. I did a hack where I specified our own field comparator to be ByteBuffer based to get around that instanceof check, but then the field value gets transformed into BYTEARR in JavaBinCodec, and when it's unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, which decides to give me comparatorNatural in the else of the TODO for CUSTOM, which barfs because byte[] is not Comparable... Any advice is appreciated! Thanks, Jessica
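A minimal sketch of the unsigned-bytes lexicographical compare Jessica describes (hypothetical helper, not the Solr code): Java bytes are signed, so each byte must be masked with 0xFF before comparing, otherwise values >= 0x80 sort before 0x7F.

```java
// Hypothetical helper: lexicographic compare of byte arrays as unsigned
// bytes, the ordering that UTF8SortedAsUnicodeComparator-style byte-level
// comparison provides.
class UnsignedBytesSketch {
    static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            // mask to treat each byte as unsigned (0..255)
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length; // on a common prefix, shorter sorts first
    }
}
```

Note that 0x80 compares greater than 0x7F here, whereas a naive signed compare of the raw bytes would invert that order.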
[jira] [Commented] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values
[ https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793213#comment-13793213 ] Yonik Seeley commented on SOLR-5330: So I instrumented the faceting code like so:
{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset + " val.length=" + val.length + " new.offset=" + seg.tempBR.offset + " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}
And it detects tons of sharing (the returned BytesRef still pointing to the same byte[]) of course... but the thing is, it never generates an invalid result. Calling next() on the term enum never changes the bytes that were previously pointed to... it simply points to a different part of the same byte array. I can never detect a case where the original bytes are changed, thus invalidating the shallow copy. PerSegmentSingleValuedFaceting overwrites facet values -- Key: SOLR-5330 URL: https://issues.apache.org/jira/browse/SOLR-5330 Project: Solr Issue Type: Bug Affects Versions: 4.2.1 Reporter: Michael Froh Assignee: Yonik Seeley Attachments: solr-5330.patch I recently tried enabling facet.method=fcs for one of my indexes and found a significant performance improvement (with a large index, many facet values, and near-realtime updates). Unfortunately, the results were also wrong. Specifically, some facet values were being partially overwritten by other facet values. (That is, if I expected facet values like abcdef and 123, I would get a value like 123def.) Debugging through the code, it looks like the problem was in PerSegmentSingleValuedFaceting, specifically in the getFacetCounts method, when BytesRef val is shallow-copied from the temporary per-segment BytesRef. The byte array assigned to val is shared with the byte array for seg.tempBR, and is overwritten a few lines down by the call to seg.tenum.next().
I managed to fix it locally by replacing the shallow copy with a deep copy. While I encountered this problem on Solr 4.2.1, I see that the code is identical in 4.5. Unless the behavior of TermsEnum.next() has changed, I believe this bug still exists.
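The corruption described above (abcdef becoming 123def) can be reproduced with a minimal stand-in for BytesRef (hypothetical class, not the Lucene one): a shallow copy shares the underlying byte[] and sees later writes, while a deep copy does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for BytesRef: a (bytes, offset, length) view.
class BytesViewSketch {
    byte[] bytes;
    int offset, length;

    BytesViewSketch(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // shallow copy: shares the underlying array, so later writes leak in
    BytesViewSketch shallowCopy() {
        return new BytesViewSketch(bytes, offset, length);
    }

    // deep copy: snapshots the referenced bytes
    BytesViewSketch deepCopy() {
        byte[] copy = Arrays.copyOfRange(bytes, offset, offset + length);
        return new BytesViewSketch(copy, 0, length);
    }

    String asString() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```

If the shared buffer is overwritten after the shallow copy is taken (as seg.tenum.next() could do to seg.tempBR's array), the shallow view changes underneath you while the deep copy keeps the original value.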
[jira] [Comment Edited] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values
[ https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793213#comment-13793213 ]

Yonik Seeley edited comment on SOLR-5330 at 10/12/13 2:30 AM:

So I instrumented the faceting code like so:

{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset
      + " val.length=" + val.length
      + " new.offset=" + seg.tempBR.offset
      + " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}

And it detects tons of sharing (the returned BytesRef still pointing to the same byte[]), of course... but the thing is, it never generates an invalid result. Calling next() on the term enum never changes the bytes that were previously pointed to; it simply points to a different part of the same byte array. I can never detect a case where the original bytes are changed, thus invalidating the shallow copy.

Example output:

{code}
##SHARING DETECTED: val.offset=1 val.length=4 new.offset=6 new.length=4
{code}
PerSegmentSingleValuedFaceting overwrites facet values -- Key: SOLR-5330 URL: https://issues.apache.org/jira/browse/SOLR-5330 Project: Solr Issue Type: Bug Affects Versions: 4.2.1 Reporter: Michael Froh Assignee: Yonik Seeley Attachments: solr-5330.patch
[jira] [Commented] (LUCENE-5277) Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset
[ https://issues.apache.org/jira/browse/LUCENE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793217#comment-13793217 ]

Shai Erera commented on LUCENE-5277:

I thought of that... it started in LUCENE-5248, where I want to keep a growable bitset alongside the docs/values arrays to mark whether a document has an updated value or not (following Rob's idea). When I implemented that using OpenBitSet, I discovered the bug and opened LUCENE-5272. As I worked on fixing the bug, I realized OBS has other issues as well and thought that perhaps I could use FixedBitSet, only growing it by copying its array. This is doable even without the ctor, since I can call getBits() and do it like this:

{code}
FixedBitSet newBits = new FixedBitSet(17); // new capacity
System.arraycopy(oldBits.getBits(), 0, newBits.getBits(), 0, oldBits.getBits().length);
{code}

I then noticed there is already a ctor in FixedBitSet which copies another FBS, so I thought just to improve it. It seems more intuitive to do that than let users figure out they can grow a FixedBitSet like above.

Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset -- Key: LUCENE-5277 URL: https://issues.apache.org/jira/browse/LUCENE-5277 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5277.patch

FixedBitSet's copy constructor is redundant the way it is now -- one can call FBS.clone() to achieve the same (and indeed, no code in Lucene calls this ctor). I think it will be useful to add a numBits parameter to that constructor to allow growing/shrinking the new bitset while copying all relevant bits from the passed one.
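The grow-by-arraycopy idiom, and the proposed numBits-taking copy constructor, can be sketched with a hypothetical minimal bitset. MiniBitSet below is illustrative only, not the real FixedBitSet:

```java
// Hypothetical minimal bitset backed by a long[], one word per 64 bits,
// sketching the proposed copy-constructor-with-numBits from this issue.
class MiniBitSet {
    final long[] bits;

    MiniBitSet(int numBits) {
        bits = new long[(numBits + 63) >>> 6]; // round up to whole 64-bit words
    }

    // The proposed ctor: copy `other`, but size the new set to numBits.
    MiniBitSet(MiniBitSet other, int numBits) {
        this(numBits);
        System.arraycopy(other.bits, 0, bits, 0, Math.min(other.bits.length, bits.length));
    }

    void set(int i)    { bits[i >> 6] |= 1L << (i & 63); }
    boolean get(int i) { return (bits[i >> 6] & (1L << (i & 63))) != 0; }

    public static void main(String[] args) {
        MiniBitSet old = new MiniBitSet(10);
        old.set(3);
        old.set(9);
        // Grow from 10 to 17 bits; all previously set bits survive the copy.
        MiniBitSet grown = new MiniBitSet(old, 17);
        System.out.println(grown.get(3) + " " + grown.get(9) + " " + grown.get(4));
    }
}
```

Note that when shrinking, a real implementation would also need to clear any bits beyond the new numBits in the last copied word; this sketch omits that detail.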
[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates
[ https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793221#comment-13793221 ]

Shai Erera commented on LUCENE-5248:

bq. Do we have test coverage of updating with null (deleting the update from the document)?

We have TestNDVUpdates.testUnsetValue and testUnsetAllValues, though we don't have a test which unsets a value while a document is merging. We have tests that cover updating a value (not unsetting) while it is merging. I guess I can modify them to unset as well, but will then need to improve the test to use docsWithField. I'll look into it.

bq. So if there are two terms in a row with the same field (which does not exist) won't we hit NPE?

Good catch! You're right, I had another {{if (termsEnum == null) continue}} but I removed it since I thought the above if takes care of that. I added a unit test which reproduces it, and the fix. Will commit on LUCENE-5189.

Improve the data structure used in ReaderAndLiveDocs to hold the updates -- Key: LUCENE-5248 URL: https://issues.apache.org/jira/browse/LUCENE-5248 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch

Currently ReaderAndLiveDocs holds the updates in two structures:

+Map<String,Map<Integer,Long>>+ holds a mapping from each field to all docs that were updated and their values. This structure is updated when applyDeletes is called, and needs to satisfy several requirements:
# Un-ordered writes: if a field f is updated by two terms, termA and termB, in that order, and termA affects doc=100 and termB doc=2, then the updates are applied in that order, meaning we cannot rely on updates coming in order.
# Same document may be updated multiple times, either by the same term (e.g. several calls to IW.updateNDV) or by different terms. Last update wins.
# Sequential read: when writing the updates to the Directory (fieldsConsumer), we iterate on the docs in order and for each one check if it's updated; if not, we pull its value from the current DV.
# A single update may affect several million documents, and therefore needs to be efficient w.r.t. memory consumption.

+Map<Integer,Map<String,Long>>+ holds a mapping from a document to all the fields in which it was updated and the updated value for each field. This is used by IW.commitMergedDeletes to apply the updates that came in while the segment was merging. The requirements this structure needs to satisfy are:
# Access in doc order: this is how commitMergedDeletes works.
# One-pass: we visit a document once (currently), so if we can, it's better to know all the fields in which it was updated.

The updates are applied to the merged ReaderAndLiveDocs (where they are stored in the first structure mentioned above). Comments with proposals will follow next.
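A toy model of the two structures above and the "last update wins" behavior under un-ordered writes. Names here are illustrative only; the real ReaderAndLiveDocs uses more memory-efficient representations than nested HashMaps:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two update-holding structures described in the issue.
class UpdatesSketch {
    // field -> (docID -> value): filled by applyDeletes, read sequentially per field.
    static final Map<String, Map<Integer, Long>> byField = new HashMap<>();
    // docID -> (field -> value): read in doc order by commitMergedDeletes.
    static final Map<Integer, Map<String, Long>> byDoc = new HashMap<>();

    static void update(String field, int doc, long value) {
        byField.computeIfAbsent(field, f -> new HashMap<>()).put(doc, value);
        byDoc.computeIfAbsent(doc, d -> new HashMap<>()).put(field, value);
    }

    public static void main(String[] args) {
        // Un-ordered writes: termA affects doc=100, then termB affects doc=2.
        update("price", 100, 5L);
        update("price", 2, 3L);
        // Same doc updated again: last update wins, the Map.put overwrites.
        update("price", 100, 7L);

        System.out.println(byField.get("price").get(100)); // 7
        System.out.println(byDoc.get(100).get("price"));   // 7
    }
}
```

Because both views are plain maps keyed on (field, doc) in opposite nesting orders, applying updates in arrival order automatically satisfies the last-update-wins requirement without any sorting.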
[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates
[ https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793224#comment-13793224 ]

Shai Erera commented on LUCENE-5248:

bq. I added a unit test which reproduces it, and the fix. Will commit on LUCENE-5189.

Sorry, it's a bug introduced in this patch, so I'll fix it here.

Improve the data structure used in ReaderAndLiveDocs to hold the updates -- Key: LUCENE-5248 URL: https://issues.apache.org/jira/browse/LUCENE-5248 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch