PyLucene 4.8.0 - 'make install' with 'root=' missing a couple of files
Hello folks,

I'm working on packaging PyLucene as a Slackware package by placing a setup.cfg in the source directory and redirecting the installation root to /tmp/pylucene_installdir. I noticed that a couple of files are missing when doing this alternate-root install compared to the regular install.

## setup.cfg ##
[easy_install]
[build]
[install]
root = /tmp/pylucene_installdir
compile = False
force = True
single-version-externally-managed = True
##

I noticed that two files are missing when I do this root= install compared to the regular install to /usr/lib…/. Are these two files below not necessary when packaging PyLucene for distribution?

Missing: native_libs.txt
-- Contains:
lucene/_lucene.so

Missing: _lucene.py
-- Contains:
def __bootstrap__():
    global __bootstrap__, __loader__, __file__
    import sys, pkg_resources, imp
    __file__ = pkg_resources.resource_filename(__name__, '_lucene.so')
    __loader__ = None; del __bootstrap__, __loader__
    imp.load_dynamic(__name__, __file__)
__bootstrap__()

Thanks in advance!

Regards,
—Ed
Re: PyLucene 4.8.0 - 'make install' with 'root=' missing a couple of files
On Tue, 27 May 2014, Eduard Rozenberg wrote:

> Hello folks, I'm working on packaging PyLucene as a Slackware package by
> using a setup.cfg in the source directory and redirecting the installation
> root to /tmp/pylucene_installdir. I noticed that a couple of files are
> missing when doing this alternate-root install compared to the regular
> install.
>
> ## setup.cfg ##
> [easy_install]
> [build]
> [install]
> root = /tmp/pylucene_installdir
> compile = False
> force = True
> single-version-externally-managed = True
> ##
>
> I noticed that two files are missing when I do this root= install compared
> to the regular install to /usr/lib…/. Are these two files below not
> necessary when packaging PyLucene for distribution?
>
> Missing: native_libs.txt

I don't know what this file, native_libs.txt, is for. Maybe a setuptools artifact?

> -- Contains:
> lucene/_lucene.so
>
> Missing: _lucene.py

Yes, that one you need. Did you try running the PyLucene tests without it?

Andi..

> -- Contains:
> def __bootstrap__():
>     global __bootstrap__, __loader__, __file__
>     import sys, pkg_resources, imp
>     __file__ = pkg_resources.resource_filename(__name__, '_lucene.so')
>     __loader__ = None; del __bootstrap__, __loader__
>     imp.load_dynamic(__name__, __file__)
> __bootstrap__()
>
> Thanks in advance! Regards, —Ed
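[Editor's note] The `_lucene.py` stub quoted above exists only to locate the compiled `_lucene.so` inside the installed package and load it under the package's name. As a rough sketch of what that bootstrap does, here is the same idea written against the modern importlib API instead of the deprecated imp module; `load_native`, `module_dir`, and `so_name` are illustrative names, not part of the PyLucene build:

```python
# Sketch of what the generated _lucene.py bootstrap accomplishes:
# find the compiled extension next to the package and import it.
import importlib.machinery
import importlib.util
import os

def load_native(name, module_dir, so_name="_lucene.so"):
    """Load a compiled extension module from module_dir under `name`."""
    path = os.path.join(module_dir, so_name)
    loader = importlib.machinery.ExtensionFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    # module_from_spec/exec_module raise if the .so is missing or bad.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

This is why the file matters for a binary package: without the stub (or an equivalent loader), `import lucene` has nothing that pulls the shared object into the process.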
Getting term vectors/computing cosine similarity
*tl;dr*: a next() method is defined for the Java class TVTermsEnum in Lucene 4.8.1, but there appears to be no next() method available on an object that looks like an instance of the Python class TVTermsEnum in PyLucene 4.8.1.

I have a set of documents that I would like to cluster. These documents share a vocabulary of only about 3,000 unique terms, but there are about 15,000,000 documents. One way I thought of doing this would be to index the documents using PyLucene (Python is the preferred programming language at work), obtain term vectors for the documents using PyLucene API functions, and calculate cosine similarities between pairs of term vectors in order to determine which documents are close to each other.

I found some sample Java code on the web that various people have posted showing ways to do this with older versions of Lucene. I downloaded PyLucene 4.8.1 and compared its API functions with the ones used in the code samples, and saw that this is an area of Lucene that has changed quite a bit. I can send an email to the lucene-user mailing list to ask what would be a good way of doing this with version 4.8.1, but my question for this list has to do with some Java API functions that do not appear to be exposed in Python, unless I have to go about accessing them in a different way.

If I obtain the term vector for the field cat_ids in a document with id doc_id_1:

    doc_1_tfv = reader.getTermVector(doc_id_1, cat_ids)

then doc_1_tfv is displayed as this object:

    Terms: org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTerms@32c46396

In some of the sample code I looked at, the terms in doc_1_tfv could be obtained with doc_1_tfv.getTerms(), but it looks like getTerms() is no longer a member function of Terms or its subclasses. In another code sample, an iterator for the term vector is obtained via:

    tfv_iter = doc_1_tfv.iterator(None)

and then the terms are obtained one by one with calls to tfv_iter.next().

This is where I get stuck. tfv_iter has this value:

    TermsEnum: org.apache.lucene.codecs.compressing.CompressingTermVectorsReader$TVTermsEnum@1cca2369

There is a next() function defined for the TVTermsEnum class, but this object doesn't list next() as one of its member functions, and an exception is raised if it is called. It looks like the object only supports the member functions defined for the TermsEnum class, and next() is not one of them. Is this the case, or is there a way to have it support all of the TVTermsEnum member functions, including next()? TVTermsEnum is a private class in CompressingTermVectorsReader.java.

So I am wondering if there is a way to obtain term vectors this way and I am just not treating doc_1_tfv and tfv_iter in the right way, or if there is a different, better way to get term vectors for documents in a PyLucene index, or if this isn't something that Lucene should be used for.

Thank you very much for any help you can provide.

Mike
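[Editor's note] Whatever form the TermsEnum iteration ends up taking, the final step the poster describes is independent of Lucene: once each document's term frequencies are extracted into a plain mapping, cosine similarity is a small computation. A minimal sketch, using illustrative {term: frequency} dicts as stand-ins for real term vectors:

```python
# Cosine similarity between two sparse term-frequency vectors,
# represented as {term: frequency} dicts.
import math

def cosine_similarity(tf_a, tf_b):
    """Return the cosine of the angle between two tf dicts (0.0..1.0)."""
    dot = sum(freq * tf_b.get(term, 0) for term, freq in tf_a.items())
    norm_a = math.sqrt(sum(f * f for f in tf_a.values()))
    norm_b = math.sqrt(sum(f * f for f in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)
```

With only ~3,000 unique terms, the dicts stay small even though the document count is large, so the per-pair cost is modest; the harder problem is avoiding all 15M² pairs, which is a clustering question rather than a Lucene one.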
[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009465#comment-14009465 ]

Jan Høydahl commented on SOLR-6113:
---

It is actually by design. If you e.g. disallow all fielded search with {{uf=-*}}, then eDismax will not interpret the x:y as field and value, but as a valid query for the literal text x y. Some languages use : as a separator, e.g. Swedish writes {{FN:s}} (meaning UN's). The same approach is taken when some field names are disallowed. But I see that it can be confusing for people who intend to search a field; it would be better if Solr could give feedback that fielded search is not allowed and that it fell back to literal matching.

Edismax doesn't parse well the query uf (User Fields)
-
Key: SOLR-6113
URL: https://issues.apache.org/jira/browse/SOLR-6113
Project: Solr
Issue Type: Bug
Components: query parsers
Reporter: Liram Vardi

It seems that the Edismax User Fields feature does not behave as expected. For instance, assuming the following query:

_q=id:b* user:Anna Collins&defType=edismax&uf=* -user&rows=0_

The parsed query (taken from the query debug info) is:

_+((id:b* (text:user) (text:anna collins))~1)_

I expect that because user was filtered out in uf (User Fields), the parsed query should not contain the user search part. In other words, the parsed query should look simply like this:

_+id:b*_

This issue is affected by the patch on issue SOLR-2649: when changing the default OP of Edismax to AND, the query results change.

--
This message was sent by Atlassian JIRA (v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
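[Editor's note] The behavior Jan describes, a disallowed field is not dropped but re-interpreted as literal text, can be modeled in a few lines. This is a deliberately tiny simplification for illustration, not Solr's actual eDismax parser; `parse_clause` and `allowed_fields` are made-up names:

```python
# Toy model of eDismax's uf (User Fields) fallback: a "field:value"
# clause whose field is not allowed becomes literal text, it is NOT
# removed from the query.
def parse_clause(clause, allowed_fields):
    """Return ("fielded", (field, value)) or ("literal", text)."""
    if ":" in clause:
        field, value = clause.split(":", 1)
        if field in allowed_fields:
            return ("fielded", (field, value))
        # Disallowed field: fall back to matching the text literally,
        # which is why "user:Anna" still produces (text:user) (text:anna).
        return ("literal", clause.replace(":", " "))
    return ("literal", clause)
```

Under this model, with `uf=* -user`, the clause `user:Anna` becomes the literal terms `user Anna`, matching the parsed query the reporter observed rather than the `+id:b*` he expected.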
[jira] [Updated] (LUCENE-5674) A new token filter: SubSequence
[ https://issues.apache.org/jira/browse/LUCENE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitzan Shaked updated LUCENE-5674:
--
Attachment: (was: subseqfilter.patch)

A new token filter: SubSequence
---
Key: LUCENE-5674
URL: https://issues.apache.org/jira/browse/LUCENE-5674
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Nitzan Shaked
Priority: Minor
Original Estimate: 24h
Remaining Estimate: 24h

A new configurable token filter which, given a token, breaks it into sub-parts and outputs consecutive sub-sequences of those sub-parts. Useful, for example, during indexing to generate variations on domain names, so that www.google.com can be found by searching for google.com or www.google.com.

Parameters:

sepRegexp: A regular expression used to split incoming tokens into sub-parts.
glue: A string used to concatenate sub-parts together when creating sub-sequences.
minLen: Minimum length (in sub-parts) of output sub-sequences.
maxLen: Maximum length (in sub-parts) of output sub-sequences (0 for unlimited; negative numbers for token length in sub-parts minus the specified length).
anchor: Anchor.START to output only prefixes, Anchor.END to output only suffixes, or Anchor.NONE to output any sub-sequence.
withOriginal: Whether to also output the original token.

EDIT: now includes tests for filter and for factory.
[jira] [Updated] (LUCENE-5674) A new token filter: SubSequence
[ https://issues.apache.org/jira/browse/LUCENE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitzan Shaked updated LUCENE-5674:
--
Attachment: subseqfilter.patch

Updated patch; contains the new format header for the one place that used the old format header.

A new token filter: SubSequence
---
Key: LUCENE-5674
URL: https://issues.apache.org/jira/browse/LUCENE-5674
Project: Lucene - Core
Issue Type: Improvement
Components: core/other
Reporter: Nitzan Shaked
Priority: Minor
Attachments: subseqfilter.patch
Original Estimate: 24h
Remaining Estimate: 24h

A new configurable token filter which, given a token, breaks it into sub-parts and outputs consecutive sub-sequences of those sub-parts. Useful, for example, during indexing to generate variations on domain names, so that www.google.com can be found by searching for google.com or www.google.com.

Parameters:

sepRegexp: A regular expression used to split incoming tokens into sub-parts.
glue: A string used to concatenate sub-parts together when creating sub-sequences.
minLen: Minimum length (in sub-parts) of output sub-sequences.
maxLen: Maximum length (in sub-parts) of output sub-sequences (0 for unlimited; negative numbers for token length in sub-parts minus the specified length).
anchor: Anchor.START to output only prefixes, Anchor.END to output only suffixes, or Anchor.NONE to output any sub-sequence.
withOriginal: Whether to also output the original token.

EDIT: now includes tests for filter and for factory.
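[Editor's note] The core of the filter described in LUCENE-5674 is easy to sketch outside Lucene. The following is an illustrative Python rendition of the sepRegexp/glue/minLen/maxLen behavior from the issue description (anchor and withOriginal omitted); it is not the actual Java TokenFilter:

```python
# Emit consecutive sub-sequences of a token's sub-parts, as described
# in LUCENE-5674: split on sep_regexp, re-join runs with glue.
import re

def subsequences(token, sep_regexp=r"\.", glue=".", min_len=1, max_len=0):
    parts = re.split(sep_regexp, token)
    if max_len <= 0:
        # 0 means unlimited; a negative value means len(parts) minus it.
        max_len = len(parts) + max_len
    out = []
    for length in range(min_len, max_len + 1):
        for start in range(len(parts) - length + 1):
            out.append(glue.join(parts[start:start + length]))
    return out
```

For example, `subsequences("www.google.com", min_len=2)` yields `www.google`, `google.com`, and `www.google.com`, which is exactly the domain-name use case from the description: a search for google.com now matches a document indexed as www.google.com.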
[jira] [Updated] (SOLR-6077) Create 5 minute tutorial
[ https://issues.apache.org/jira/browse/SOLR-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taka updated SOLR-6077:
---
Attachment: 5minTutorial-v01.markdown

Here's my first draft (v0.1) for the 5 minute tutorial (in markdown). Please post any feedback. For a future release of Solr, I would like to update or add better examples. Should I file this separately in JIRA?

Create 5 minute tutorial
Key: SOLR-6077
URL: https://issues.apache.org/jira/browse/SOLR-6077
Project: Solr
Issue Type: Sub-task
Reporter: Grant Ingersoll
Attachments: 5minTutorial-v01.markdown

Per the new site design for Solr, we'd like to have a "5 minutes to Solr" tutorial that covers users getting their data in and querying it.
[jira] [Commented] (SOLR-6078) Create First Day Documentation
[ https://issues.apache.org/jira/browse/SOLR-6078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009508#comment-14009508 ]

Taka commented on SOLR-6078:

I've been trying to figure out the contents for the first-day tutorial. It should include more than the current tutorial:

https://lucene.apache.org/solr/4_8_1/tutorial.html

But these wiki contents are more for the first week:

http://wiki.apache.org/solr/
https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

I'm trying to identify an appropriate list of topics for the first-day doc. For example, should SolrCloud be included in this doc? Do you know good data sets (UCI ML Repo, Wikipedia, etc.) for this tutorial? I would appreciate any suggestions/ideas!

Create First Day Documentation
--
Key: SOLR-6078
URL: https://issues.apache.org/jira/browse/SOLR-6078
Project: Solr
Issue Type: Sub-task
Reporter: Grant Ingersoll

As one progresses from getting started with Solr, it is important to show how their work will develop from simple acts with basic data sets to more complex ones. This tutorial should highlight what a user is likely to need to know in their first day with Solr.
[jira] [Commented] (SOLR-6077) Create 5 minute tutorial
[ https://issues.apache.org/jira/browse/SOLR-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009510#comment-14009510 ]

Taka commented on SOLR-6077:

I will include instructions for Windows later. v0.1 is only for Unix-based systems.

Create 5 minute tutorial
Key: SOLR-6077
URL: https://issues.apache.org/jira/browse/SOLR-6077
Project: Solr
Issue Type: Sub-task
Reporter: Grant Ingersoll
Attachments: 5minTutorial-v01.markdown

Per the new site design for Solr, we'd like to have a "5 minutes to Solr" tutorial that covers users getting their data in and querying it.
Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 559 - Still Failing
java.lang.ArrayStoreException: unknown
    at __randomizedtesting.SeedInfo.seed([97ABA0C8320EBCCB:E5E785C7836E0AB8]:0)
    at org.apache.lucene.util.RamUsageEstimator$IdentityHashSet.add(RamUsageEstimator.java:674)

Oh boy... what the hell is this? :)

D.
[jira] [Commented] (SOLR-6077) Create 5 minute tutorial
[ https://issues.apache.org/jira/browse/SOLR-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009539#comment-14009539 ]

Varun Thacker commented on SOLR-6077:
-

I think we should edit point 1 to say "Apache Solr runs on Java 7 or greater" instead of "Java 1.7 u55 or higher". I don't think we need to point out that u55 is recommended for someone just wanting to try out Solr in 5 minutes.

Regarding point 4 - maybe we could spin this into another issue, but currently the CSV example has 10 documents, the JSON example has 4 documents, and the XML example has 32 documents. We should at least have the same documents in all the examples, and perhaps slightly more documents in each?

Create 5 minute tutorial
Key: SOLR-6077
URL: https://issues.apache.org/jira/browse/SOLR-6077
Project: Solr
Issue Type: Sub-task
Reporter: Grant Ingersoll
Attachments: 5minTutorial-v01.markdown

Per the new site design for Solr, we'd like to have a "5 minutes to Solr" tutorial that covers users getting their data in and querying it.
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009545#comment-14009545 ]

Michael McCandless commented on LUCENE-5675:

Thanks Steve!

ID postings format
Key: LUCENE-5675
URL: https://issues.apache.org/jira/browse/LUCENE-5675
Project: Lucene - Core
Issue Type: New Feature
Affects Versions: 4.9, 5.0
Reporter: Robert Muir
Fix For: 4.9, 5.0
Attachments: LUCENE-5675.patch

Today the primary-key lookup in Lucene is not that great for systems like Solr and Elasticsearch that have versioning in front of IndexWriter. To some extent BlockTree can sometimes help avoid seeks by telling you the term does not exist for a segment, but this technique (based on the FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory.

I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer that there is no term T with version V in that segment very efficiently. Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq, etc.; this stuff is all implicit.

As far as API, I think for users to provide IDs with versions to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. And a consumer of the codec can just cast the Terms to a subclass that exposes the FST to do this version check efficiently.
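[Editor's note] The "max version per subtree" idea from the issue can be sketched with a plain prefix map instead of an FST. This is a toy illustration of the pruning logic only, with invented names (`VersionedIdIndex`, `maybe_contains`), not Lucene code:

```python
# For each prefix of every indexed ID, remember the maximum version
# seen below that prefix. A lookup for id T at version V can answer
# "definitely not here" without touching postings whenever some
# prefix of T has max version < V (or is absent entirely).
class VersionedIdIndex:
    def __init__(self):
        self.max_version = {}   # prefix -> max version under that prefix
        self.ids = {}           # id -> latest version

    def add(self, term, version):
        self.ids[term] = max(version, self.ids.get(term, version))
        for i in range(len(term) + 1):
            p = term[:i]
            self.max_version[p] = max(version, self.max_version.get(p, version))

    def maybe_contains(self, term, version):
        """False means: no entry for term with version >= `version`."""
        for i in range(len(term) + 1):
            p = term[:i]
            if p not in self.max_version or self.max_version[p] < version:
                return False
        return True
```

The win the issue is after: for a versioned primary-key update, most segments can reject the lookup at the root or at a shallow prefix, skipping the seek entirely.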
[jira] [Created] (LUCENE-5708) Remove IndexWriterConfig.clone
Michael McCandless created LUCENE-5708:
--
Summary: Remove IndexWriterConfig.clone
Key: LUCENE-5708
URL: https://issues.apache.org/jira/browse/LUCENE-5708
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.9, 5.0

We originally added this clone to allow a single IWC to be re-used against more than one IndexWriter, but I think this is a mis-feature: it adds complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), and I think it's buggy today. I think we should just disallow sharing: you must make a new IWC for a new IndexWriter.
[jira] [Updated] (LUCENE-5708) Remove IndexWriterConfig.clone
[ https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5708:
---
Attachment: LUCENE-5708.patch

Initial patch; tests seem to pass. IWC already detects if it's illegally re-used across more than one IW.

Remove IndexWriterConfig.clone
--
Key: LUCENE-5708
URL: https://issues.apache.org/jira/browse/LUCENE-5708
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.9, 5.0
Attachments: LUCENE-5708.patch

We originally added this clone to allow a single IWC to be re-used against more than one IndexWriter, but I think this is a mis-feature: it adds complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), and I think it's buggy today. I think we should just disallow sharing: you must make a new IWC for a new IndexWriter.
[jira] [Updated] (LUCENE-5708) Remove IndexWriterConfig.clone
[ https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-5708:
---
Attachment: LUCENE-5708.patch

Woops, wrong patch ... this one should work.

Remove IndexWriterConfig.clone
--
Key: LUCENE-5708
URL: https://issues.apache.org/jira/browse/LUCENE-5708
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.9, 5.0
Attachments: LUCENE-5708.patch, LUCENE-5708.patch

We originally added this clone to allow a single IWC to be re-used against more than one IndexWriter, but I think this is a mis-feature: it adds complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), and I think it's buggy today. I think we should just disallow sharing: you must make a new IWC for a new IndexWriter.
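[Editor's note] The "IWC already detects if it's illegally re-used" behavior mentioned above amounts to a use-once config object. As a toy illustration of that pattern (in Python, with invented names; Lucene's real IndexWriterConfig is Java):

```python
# A config object that refuses to be attached to more than one writer,
# instead of supporting clone()-based sharing across writers.
class WriterConfig:
    def __init__(self):
        self._owner = None

    def attach(self, writer):
        if self._owner is not None and self._owner is not writer:
            raise ValueError(
                "config is already in use by another writer; create a new one")
        self._owner = writer
        return self
```

This matches the issue's resolution: rather than keeping clone() correct across merge policies, schedulers, and thread pools, simply require a fresh config per writer and fail fast on reuse.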
Re: [JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 559 - Still Failing
I think it's the Chuck Norris object: and he can do this even without G1GC

On Tue, May 27, 2014 at 11:09 AM, Dawid Weiss <dawid.we...@cs.put.poznan.pl> wrote:

> java.lang.ArrayStoreException: unknown
>     at __randomizedtesting.SeedInfo.seed([97ABA0C8320EBCCB:E5E785C7836E0AB8]:0)
>     at org.apache.lucene.util.RamUsageEstimator$IdentityHashSet.add(RamUsageEstimator.java:674)
>
> Oh boy... what the hell is this? :)
>
> D.
[jira] [Updated] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-5973:
-
Attachment: SOLR-5973.patch

A new patch. Solves issues that were found while working on SOLR-6088.

Pluggable Ranking Collectors and Merge Strategies
-
Key: SOLR-5973
URL: https://issues.apache.org/jira/browse/SOLR-5973
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
Fix For: 4.9
Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch

This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending the RankQuery class and implementing its interface, you can specify a custom ranking collector (TopDocsCollector) and distributed merge strategy for a Solr query.

Sample syntax:

{code}
q={!customRank subquery=*:* param1=a param2=b}&wt=json&indent=true
{code}

In the sample above the param:

{code}q={!customRank subquery=*:* param1=a param2=b}{code}

points to a QParserPlugin that returns a Query that extends RankQuery. The RankQuery defines the custom ranking and merge strategy for its subquery.

The RankQuery impl will have to do several things:

1) Implement the RankQuery interface.
2) Wrap the subquery and proxy all calls to the Query interface to the subquery. Using local-params syntax, the subquery can be any valid Solr query. The custom QParserPlugin is responsible for parsing the subquery.
3) Implement hashCode() and equals() so the queryResultCache works properly with the subquery and custom ranking algorithm.
[jira] [Updated] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-5973:
-
Description:

This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending the RankQuery class and implementing its interface, you can specify a custom ranking collector (TopDocsCollector) and distributed merge strategy for a Solr query.

Sample syntax:

{code}
q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
{code}

In the sample above the param:

{code}rq={!customRank param1=a param2=b}{code}

points to a QParserPlugin that returns a Query that extends RankQuery. The RankQuery defines the custom ranking and merge strategy for the main query.

The RankQuery impl will have to do several things:

1) Implement the getTopDocsCollector() method to return a custom top docs ranking collector.
2) Implement the wrap() method. The QueryComponent calls the wrap() method to wrap the RankQuery around the main query. This design allows the RankQuery to manage Query caching issues and implement custom Query explanations if needed.
3) Implement hashCode() and equals() so the queryResultCache works properly with the main query and custom ranking algorithm.
4) Optionally implement a custom MergeStrategy to handle the merging of distributed results from the shards.

was:

This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending the RankQuery class and implementing its interface, you can specify a custom ranking collector (TopDocsCollector) and distributed merge strategy for a Solr query.

Sample syntax:

{code}
q={!customRank subquery=*:* param1=a param2=b}&wt=json&indent=true
{code}

In the sample above the param:

{code}q={!customRank subquery=*:* param1=a param2=b}{code}

points to a QParserPlugin that returns a Query that extends RankQuery. The RankQuery defines the custom ranking and merge strategy for its subquery.

The RankQuery impl will have to do several things:

1) Implement the RankQuery interface.
2) Wrap the subquery and proxy all calls to the Query interface to the subquery. Using local-params syntax, the subquery can be any valid Solr query. The custom QParserPlugin is responsible for parsing the subquery.
3) Implement hashCode() and equals() so the queryResultCache works properly with the subquery and custom ranking algorithm.

Pluggable Ranking Collectors and Merge Strategies
-
Key: SOLR-5973
URL: https://issues.apache.org/jira/browse/SOLR-5973
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
Fix For: 4.9
Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch
[jira] [Updated] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-5973:
-
Description:

This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending the RankQuery class and implementing its interface, you can specify a custom ranking collector (TopDocsCollector) and distributed merge strategy for a Solr query.

Sample syntax:

{code}
q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
{code}

In the sample above the new rq (rank query) param:

{code}rq={!customRank param1=a param2=b}{code}

points to a QParserPlugin that returns a Query that extends RankQuery. The RankQuery defines the custom ranking and merge strategy for the main query.

The RankQuery impl will have to do several things:

1) Implement the getTopDocsCollector() method to return a custom top docs ranking collector.
2) Implement the wrap() method. The QueryComponent calls the wrap() method to wrap the RankQuery around the main query. This design allows the RankQuery to manage Query caching issues and implement custom Query explanations if needed.
3) Implement hashCode() and equals() so the queryResultCache works properly with the main query and custom ranking algorithm.
4) Optionally implement a custom MergeStrategy to handle the merging of distributed results from the shards.

was:

This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending the RankQuery class and implementing its interface, you can specify a custom ranking collector (TopDocsCollector) and distributed merge strategy for a Solr query.

Sample syntax:

{code}
q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
{code}

In the sample above the param:

{code}rq={!customRank param1=a param2=b}{code}

points to a QParserPlugin that returns a Query that extends RankQuery. The RankQuery defines the custom ranking and merge strategy for the main query.

The RankQuery impl will have to do several things:

1) Implement the getTopDocsCollector() method to return a custom top docs ranking collector.
2) Implement the wrap() method. The QueryComponent calls the wrap() method to wrap the RankQuery around the main query. This design allows the RankQuery to manage Query caching issues and implement custom Query explanations if needed.
3) Implement hashCode() and equals() so the queryResultCache works properly with the main query and custom ranking algorithm.
4) Optionally implement a custom MergeStrategy to handle the merging of distributed results from the shards.

Pluggable Ranking Collectors and Merge Strategies
-
Key: SOLR-5973
URL: https://issues.apache.org/jira/browse/SOLR-5973
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Joel Bernstein
Assignee: Joel Bernstein
Priority: Minor
Fix For: 4.9
Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch
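[Editor's note] The four-step contract in the SOLR-5973 description can be modeled compactly. The sketch below is a toy Python rendition of those steps for illustration; Solr's real RankQuery is a Java class, and the names here simply mirror the description:

```python
# Toy model of the RankQuery contract: wrap the main query, supply a
# custom top-docs collector, and define equality for result caching.
class RankQuery:
    def __init__(self, param1, param2):
        self.param1, self.param2 = param1, param2
        self.main_query = None

    def wrap(self, main_query):
        # Step 2: the query component wraps the rank query around the
        # main query, centralizing caching and explanation handling.
        self.main_query = main_query
        return self

    def get_top_docs_collector(self, size):
        # Step 1: a custom ranking collector, sketched as a closure
        # that keeps the `size` highest-scoring (score, doc) pairs.
        docs = []
        def collect(doc, score):
            docs.append((score, doc))
            docs.sort(reverse=True)
            del docs[size:]
        return collect, docs

    # Step 3: equality/hash over the wrapped query plus ranking params,
    # so the queryResultCache keys on both.
    def __eq__(self, other):
        return (isinstance(other, RankQuery)
                and (self.param1, self.param2, self.main_query)
                == (other.param1, other.param2, other.main_query))

    def __hash__(self):
        return hash((self.param1, self.param2, self.main_query))
```

Step 4 (a custom MergeStrategy for combining shard results) is omitted here; in a distributed setup it would merge several such collectors' outputs into one ranked list.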
[jira] [Commented] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009648#comment-14009648 ] ASF subversion and git services commented on SOLR-5973: --- Commit 1597775 from [~joel.bernstein] in branch 'dev/trunk' [ https://svn.apache.org/r1597775 ] SOLR-5973: Pluggable Ranking Collectors and Merge Strategies

Pluggable Ranking Collectors and Merge Strategies - Key: SOLR-5973 URL: https://issues.apache.org/jira/browse/SOLR-5973 Project: Solr Issue Type: New Feature Components: search Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.9

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009681#comment-14009681 ] Eyal Zaidman commented on SOLR-6113: I'm a little confused by that behavior, assuming I understand the technical details. The real-world scenario uf is trying to address is disallowing or restricting search in some fields. For example, if I wanted to implement a permissions scheme, I could tell it -restrictedField and it would not search there. By treating that search as a literal (presumably because we can't detect whether the user meant a fielded search or a Swedish term that exactly matches a Solr field name), we're preferring the less common, rather esoteric (IMO) scenario. Adding to that Liram's comment about the relation to SOLR-2649, the default operator behavior could make this even worse: instead of OR you get AND behavior, and all searches fail due to forcing a non-existent literal match. Do you think it would make sense to add functionality that removes that part of the search query instead of escaping it? We could of course add a flag for preserving the old behavior in case someone finds it useful. Could you point us in the right direction if so? We'd be happy to attempt a patch.

Edismax doesn't parse well the query uf (User Fields) - Key: SOLR-6113 URL: https://issues.apache.org/jira/browse/SOLR-6113 Project: Solr Issue Type: Bug Components: query parsers Reporter: Liram Vardi It seems that the Edismax User Fields feature does not behave as expected. For instance, assume the following query: _q=id:b* user:Anna Collins&defType=edismax&uf=* -user&rows=0_ The parsed query (taken from query debug info) is: _+((id:b* (text:user) (text:anna collins))~1)_ I expect that because user was filtered out in uf (User Fields), the parsed query should not contain the user search part.
In other words, the parsed query should simply look like this: _+id:b*_ This issue is affected by the patch on issue SOLR-2649: when changing the default OP of Edismax to AND, the query results change. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009697#comment-14009697 ] ASF subversion and git services commented on SOLR-5973: --- Commit 1597796 from [~joel.bernstein] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1597796 ] SOLR-5973: Pluggable Ranking Collectors and Merge Strategies

Pluggable Ranking Collectors and Merge Strategies - Key: SOLR-5973 URL: https://issues.apache.org/jira/browse/SOLR-5973 Project: Solr Issue Type: New Feature Components: search Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 4.9

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone
[ https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009722#comment-14009722 ] Shai Erera commented on LUCENE-5708: I think the way you fixed some tests that used clone is incorrect. You should at least call {{newIndexWriterConfig(random)}} with the same random and seed, so the exact same IWC is created each time. At least, that's what these tests now rely on, even if they don't break. Otherwise, they just create a random IWC each time they open a writer, which is not the intention, I believe.

Remove IndexWriterConfig.clone -- Key: LUCENE-5708 URL: https://issues.apache.org/jira/browse/LUCENE-5708 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0 Attachments: LUCENE-5708.patch, LUCENE-5708.patch We originally added this clone to allow a single IWC to be re-used against more than one IndexWriter, but I think this is a mis-feature: it adds complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), and I think it's buggy today. I think we should just disallow sharing: you must make a new IWC for a new IndexWriter. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009733#comment-14009733 ] Jack Krupansky commented on SOLR-6113: -- Better doc for the intended behavior would help, at least a little. At least we could point people to a clear description of what actually happens.

Edismax doesn't parse well the query uf (User Fields) - Key: SOLR-6113 URL: https://issues.apache.org/jira/browse/SOLR-6113 Project: Solr Issue Type: Bug Components: query parsers Reporter: Liram Vardi

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone
[ https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009824#comment-14009824 ] Michael McCandless commented on LUCENE-5708: Hmm, which tests rely on using the same IWC? I thought I was improving the tests by switching up the config...

Remove IndexWriterConfig.clone -- Key: LUCENE-5708 URL: https://issues.apache.org/jira/browse/LUCENE-5708 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone
[ https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009930#comment-14009930 ] Adrien Grand commented on LUCENE-5708: -- I don't know about these tests that expect the same config, but I'm +1 in general to remove all that cloning. It seems to me that we should be able to make some fields final now that we don't have a clone method anymore (e.g. MergePolicy.writer)?

Remove IndexWriterConfig.clone -- Key: LUCENE-5708 URL: https://issues.apache.org/jira/browse/LUCENE-5708 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
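The direction discussed in this thread - drop clone() and forbid sharing a config between writers - can be sketched in plain Java. This is a hedged illustration with made-up class names, not Lucene's actual IndexWriterConfig code: the writer claims its config at construction time and rejects reuse, which is also what makes it possible to keep consumer fields final, since nothing is ever copied.

```java
// Hypothetical sketch (not Lucene's API) of the "one config per writer" rule:
// instead of cloning a shared config, the writer claims the instance and a
// second writer constructed with the same instance is rejected.
public class OneConfigPerWriter {
    static final class WriterConfig {
        private boolean inUse = false;
        final int ramBufferMB; // example setting, illustrative only
        WriterConfig(int ramBufferMB) { this.ramBufferMB = ramBufferMB; }

        // Called by the writer's constructor; throws instead of cloning.
        synchronized void claim() {
            if (inUse) {
                throw new IllegalStateException(
                    "config already used by another writer; create a new one");
            }
            inUse = true;
        }
    }

    static final class Writer {
        final WriterConfig config; // can be final: no clone/copy step needed
        Writer(WriterConfig config) {
            config.claim(); // enforce single ownership
            this.config = config;
        }
    }

    public static void main(String[] args) {
        WriterConfig cfg = new WriterConfig(64);
        new Writer(cfg); // first writer: fine
        try {
            new Writer(cfg); // reuse of the same config: rejected
        } catch (IllegalStateException expected) {
            System.out.println("reuse rejected: " + expected.getMessage());
        }
    }
}
```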
[jira] [Assigned] (SOLR-6091) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations
[ https://issues.apache.org/jira/browse/SOLR-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-6091: Assignee: Noble Paul (was: Shalin Shekhar Mangar) Race condition in prioritizeOverseerNodes can trigger extra QUIT operations --- Key: SOLR-6091 URL: https://issues.apache.org/jira/browse/SOLR-6091 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7, 4.8 Reporter: Shalin Shekhar Mangar Assignee: Noble Paul Fix For: 4.9, 5.0 Attachments: SOLR-6091.patch When using the overseer roles feature, there is a possibility of more than one thread executing the prioritizeOverseerNodes method and extra QUIT commands being inserted into the overseer queue. At a minimum, the prioritizeOverseerNodes should be synchronized to avoid a race condition. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
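The race described in SOLR-6091 is a classic check-then-act on a shared queue: two threads can both observe "no QUIT queued yet" and both enqueue one. Synchronizing the method makes the check and the enqueue atomic. A self-contained Java sketch - the names are hypothetical, not Solr's actual Overseer code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hedged illustration of the fix direction: the "is QUIT already queued?"
// check plus the enqueue must happen atomically, so the method is
// synchronized. Without it, concurrent callers could each add a QUIT.
public class QuitOnceDemo {
    private final Queue<String> overseerQueue = new ConcurrentLinkedQueue<>();

    // Only one thread at a time may run the check-then-act sequence.
    synchronized void prioritizeOverseerNodes() {
        if (!overseerQueue.contains("QUIT")) {
            overseerQueue.add("QUIT");
        }
    }

    int quitCount() {
        return (int) overseerQueue.stream().filter("QUIT"::equals).count();
    }

    public static void main(String[] args) throws InterruptedException {
        QuitOnceDemo demo = new QuitOnceDemo();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(demo::prioritizeOverseerNodes);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("QUIT operations queued: " + demo.quitCount()); // always 1
    }
}
```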
[jira] [Assigned] (SOLR-6095) SolrCloud cluster can end up without an overseer
[ https://issues.apache.org/jira/browse/SOLR-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-6095: Assignee: Noble Paul SolrCloud cluster can end up without an overseer Key: SOLR-6095 URL: https://issues.apache.org/jira/browse/SOLR-6095 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.8 Reporter: Shalin Shekhar Mangar Assignee: Noble Paul Fix For: 4.9, 5.0 We have a large cluster running on EC2 which occasionally ends up without an overseer after a rolling restart. We always restart our overseer nodes last; otherwise we end up with a large number of shards that can't recover properly. This cluster is running a custom branch forked from 4.8 and has SOLR-5473, SOLR-5495 and SOLR-5468 applied. We have a large number of small collections (120 collections, each with approx 5M docs) on 16 Solr nodes. We are also using the overseer roles feature to designate two specified nodes as overseers. However, I think the problem that we're seeing is not specific to the overseer roles feature.
As soon as the overseer was shutdown, we saw the following on the node which was next in line to become the overseer: {code} 2014-05-20 09:55:39,261 [main-EventThread] INFO solr.cloud.ElectionContext - I am going to be the leader ec2-xx.compute-1.amazonaws.com:8987_solr 2014-05-20 09:55:39,265 [main-EventThread] WARN solr.cloud.LeaderElector - org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /overseer_elect/leader at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:432) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:429) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:386) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:373) at org.apache.solr.cloud.OverseerElectionContext.runLeaderProcess(ElectionContext.java:551) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:142) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:110) at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55) at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:303) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) {code} When the overseer leader node is gracefully shutdown, we get the following in the logs: {code} 2014-05-20 09:55:39,254 [Thread-63] ERROR solr.cloud.Overseer - Exception in Overseer main queue loop org.apache.solr.common.SolrException: Could not load collection from ZK:sm12 at 
org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:778) at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:553) at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:246) at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:237) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:503) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040) at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:226) at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:223) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73) at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:223) at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:767) ... 4 more 2014-05-20 09:55:39,254 [Thread-63] INFO solr.cloud.Overseer - Overseer Loop exiting : ec2-xx.compute-1.amazonaws.com:8986_solr 2014-05-20 09:55:39,256 [main-EventThread] WARN common.cloud.ZkStateReader - ZooKeeper watch triggered, but Solr cannot talk to ZK 2014-05-20 09:55:39,259 [ShutdownMonitor] INFO server.handler.ContextHandler - stopped
[jira] [Commented] (LUCENE-5708) Remove IndexWriterConfig.clone
[ https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009988#comment-14009988 ] Shai Erera commented on LUCENE-5708: bq. Hmm which tests rely on using the same IWC? Hmm ... I don't remember. All I remember is that while I worked on preventing sharing IWC between writers (LUCENE-4876), there were a bunch of tests that reused the IWC. I fixed them by simply cloning it, but I admit I didn't check if initializing a new IWC each time serves their purpose. I just assume that if so many tests did that, there ought to be a reason beyond just convenience, but I could be wrong. What I'm worried about is that by not cloning, Jenkins will trip (which is good!), or worse - that those tests will stop asserting what they asserted before. So I just wanted to point that out. If we're ready to take the risk, I'm fine with it, because eventually we're discussing tests here .. there's nothing functionally missing from an app's perspective.

Remove IndexWriterConfig.clone -- Key: LUCENE-5708 URL: https://issues.apache.org/jira/browse/LUCENE-5708 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5708) Remove IndexWriterConfig.clone
[ https://issues.apache.org/jira/browse/LUCENE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009988#comment-14009988 ] Shai Erera edited comment on LUCENE-5708 at 5/27/14 5:56 PM: - bq. Hmm which tests rely on using the same IWC? Hmm ... I don't remember. All I remember is that while I worked on preventing sharing IWC between writers (LUCENE-4876), there were a bunch of tests that reused the IWC. I fixed them by simply cloning it, but I admit I didn't check if initializing a new IWC each time serves their purpose. I just assume that if so many tests did that, there ought to be a reason beyond just convenience, but I could be wrong. What I'm worried is that by not cloning Jenkins will trip (which is good!), or worse - that those tests will stop asserting what they asserted before. So I just wanted to point that out. If we're ready to take the risk, I'm fine with it, because eventually we're discussing tests here .. there's nothing functionally impossible from an app's perspective. was (Author: shaie): bq. Hmm which tests rely on using the same IWC? Hmm ... I don't remember. All I remember is that while I worked on preventing sharing IWC between writers (LUCENE-4876), there were a bunch of tests that reused the IWC. I fixed them by simply cloning it, but I admit I didn't check if initializing a new IWC each time serves their purpose. I just assume that if so many tests did that, there ought to be a reason beyond just convenience, but I could be wrong. What I'm worried is that by not cloning Jenkins will trip (which is good!), or worse - that those tests will stop asserting what they asserted before. So I just wanted to point that out. If we're ready to take the risk, I'm fine with it, because eventually we're discussing tests here .. there's nothing functionally missing from an app's perspective.
Remove IndexWriterConfig.clone -- Key: LUCENE-5708 URL: https://issues.apache.org/jira/browse/LUCENE-5708 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.9, 5.0

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6114) cron for elgg_solr reindex
Jagdish Bairagi created SOLR-6114: - Summary: cron for elgg_solr reindex Key: SOLR-6114 URL: https://issues.apache.org/jira/browse/SOLR-6114 Project: Solr Issue Type: Bug Components: clients - php Affects Versions: 4.8.1 Environment: Elgg + elgg_solr plugin, Nginx on CentOS Reporter: Jagdish Bairagi Fix For: 4.8.1

Hi, we have set up Solr on our server and added the elgg_solr plugin to our Elgg site. The plugin works fine with the locally installed Solr, and we are indexing manually, but I need a cron job for indexing via the elgg_solr plugin. I followed the doc below but can't see which cron job needs to run: https://github.com/arckinteractive/elgg_solr

Installation: Install to the Elgg mod directory as 'elgg_solr'. Enable the plugin on the Admin page and move it to a position below the search plugin. Configure Solr with the schema.xml included in the root directory of this plugin. Enter and save the connection information on the plugin settings page. Trigger a reindex from the plugin settings page. Ensure hourly cron is configured and active.

So please help me; I am looking for the cron entry to set up for auto-reindexing. What should the cron be? Thanks, Jagdish

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-6114) cron for elgg_solr reindex
[ https://issues.apache.org/jira/browse/SOLR-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey closed SOLR-6114. -- Resolution: Invalid Fix Version/s: (was: 4.8.1) The elgg software package is not part of Solr. This is a bug/feature tracker for Solr and other Apache projects. Nobody here will have the required knowledge to help you. You'll need to send a request to a support resource for Elgg or their Solr plugin. It looks like you might be able to do this at the following location, after creating an account on their website: http://community.elgg.org/groups/all?filter=support Another possibility would be to open an issue against the Solr plugin project asking them to include the required crontab entry in their README, which you could do at the following URL. You probably need a GitHub account to do this: https://github.com/arckinteractive/elgg_solr/issues

cron for elgg_solr reindex -- Key: SOLR-6114 URL: https://issues.apache.org/jira/browse/SOLR-6114 Project: Solr Issue Type: Bug Components: clients - php Affects Versions: 4.8.1 Environment: Elgg + elgg_solr plugin, Nginx on CentOS Reporter: Jagdish Bairagi

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6062) Phrase queries are created for each field supplied through edismax's pf, pf2 and pf3 parameters (rather than being combined in a single dismax query)
[ https://issues.apache.org/jira/browse/SOLR-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010368#comment-14010368 ] Michael Dodsworth commented on SOLR-6062: - adding [~jdyer] [~janhoy], as you were involved in https://issues.apache.org/jira/browse/SOLR-2058

Phrase queries are created for each field supplied through edismax's pf, pf2 and pf3 parameters (rather than being combined in a single dismax query) - Key: SOLR-6062 URL: https://issues.apache.org/jira/browse/SOLR-6062 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.0 Reporter: Michael Dodsworth Priority: Minor Attachments: combined-phrased-dismax.patch

https://issues.apache.org/jira/browse/SOLR-2058 subtly changed how phrase queries, created through the pf, pf2 and pf3 parameters, are merged into the main user query. For the query 'term1 term2' with pf2:[field1, field2, field3] we now get (omitting the non-phrase query section for clarity):

{code:java}
main query
DisjunctionMaxQuery((field1:"term1 term2"^1.0)~0.1)
DisjunctionMaxQuery((field2:"term1 term2"^1.0)~0.1)
DisjunctionMaxQuery((field3:"term1 term2"^1.0)~0.1)
{code}

Prior to this change, we had:

{code:java}
main query
DisjunctionMaxQuery((field1:"term1 term2"^1.0 | field2:"term1 term2"^1.0 | field3:"term1 term2"^1.0)~0.1)
{code}

The upshot is that if the phrase query "term1 term2" appears in multiple fields, it will get a significant boost over the previous implementation.

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
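The boost difference described in SOLR-6062 can be shown with simple arithmetic: with one DisjunctionMaxQuery per field, each per-field phrase score is its own dismax result and they are all summed into the main query, whereas a single DisjunctionMaxQuery over all fields keeps only the best field's score (ignoring the 0.1 tie-break here). A hedged Java sketch with made-up scores:

```java
// Hedged arithmetic sketch of the scoring change - the scores are invented;
// the point is sum-of-per-field-dismaxes vs. max-over-all-fields.
public class PhraseBoostDemo {
    // After SOLR-2058: one single-clause DisjunctionMaxQuery per field,
    // whose scores are summed into the enclosing BooleanQuery.
    static double perFieldDismax(double[] fieldScores) {
        double sum = 0;
        for (double s : fieldScores) sum += s; // each dismax contributes its own score
        return sum;
    }

    // Before: one DisjunctionMaxQuery over all fields, so only the best
    // field's phrase score survives (tie-break factor omitted for simplicity).
    static double singleDismax(double[] fieldScores) {
        double max = 0;
        for (double s : fieldScores) max = Math.max(max, s);
        return max;
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.5, 1.0}; // phrase matches in field1..field3
        System.out.println("new (summed): " + perFieldDismax(scores)); // 4.5
        System.out.println("old (max):    " + singleDismax(scores));   // 2.0
    }
}
```

A document matching the phrase in all three fields thus scores far higher under the new structure than under the old one, which is exactly the boost shift the reporter observed.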
[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010443#comment-14010443 ] Jan Høydahl commented on SOLR-6113: --- Suggestions for improvement welcome. In my opinion, simply discarding the whole term is also very confusing. Imagine {{uf=-title&q=title:java language}} - as it is now, you will require the three words title, java, language. If we remove the first term we'll match all docs with language, clearly not the intention. A perhaps better solution could be to search literally for the string title:java instead of breaking it into two. Alternatively, disregard the field part but let the value part stay and be subject to qf or df. However, that could be confusing too, if the user gets no feedback whatsoever that his query term java was actually not restricted to title only.

Edismax doesn't parse well the query uf (User Fields) - Key: SOLR-6113 URL: https://issues.apache.org/jira/browse/SOLR-6113 Project: Solr Issue Type: Bug Components: query parsers Reporter: Liram Vardi Labels: edismax

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-6113: -- Issue Type: Improvement (was: Bug)

Edismax doesn't parse well the query uf (User Fields) - Key: SOLR-6113 URL: https://issues.apache.org/jira/browse/SOLR-6113 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Liram Vardi Labels: edismax

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-6113: -- Labels: edismax (was: )

Edismax doesn't parse well the query uf (User Fields) - Key: SOLR-6113 URL: https://issues.apache.org/jira/browse/SOLR-6113 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Liram Vardi Priority: Minor Labels: edismax

-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-6113:
------------------------------
    Priority: Minor  (was: Major)
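For anyone reproducing SOLR-6113, the query under discussion is issued as a single URL with `&`-separated parameters. A hedged sketch in Java of how that request URL could be assembled (the host, port, and collection name are hypothetical placeholders, not part of the report):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EdismaxQueryUrl {
    // Builds the SOLR-6113 repro URL. Parameter values are URL-encoded;
    // the "uf" value "* -user" should exclude the "user" field from
    // user-field expansion.
    static String buildUrl() throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "id:b* user:Anna Collins");
        params.put("defType", "edismax");
        params.put("uf", "* -user");
        params.put("rows", "0");
        params.put("debugQuery", "true"); // to see the parsed query
        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            qs.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        // Hypothetical local Solr instance and collection name:
        return "http://localhost:8983/solr/collection1/select?" + qs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildUrl());
    }
}
```

With debugQuery=true the "parsedquery" entry in the debug section shows whether the user:... clause survived the uf filtering.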
Schemaless mode and solrconfig.xml
How do all the components that define a "text" field for various purposes play nice in schemaless mode? The df entry is commented out of the /select handler, which at least lets us get started. But there are a variety of places where "text" is specified: the /browse handler, spell checking, etc.

What ideas exist or are already under way for making it easier to make Solr run OOB in schemaless mode with these bells and whistles? Or will it just be required that people edit solrconfig.xml to reflect fields that actually get created via schemaless? Or is the best solution just to take everything out of the solrconfig.xml file that appears in the example-schemaless configuration except, say, the select handler? Or pre-define a "text" field in the schemaless schema.xml file to handle these?

I'm not even sure we _could_ make things like the spell checker play nice with schemaless. For the other handlers we can specify df= on the URL once we know there's a field available, but that's tricky for spell checking... And I think /browse is just totally out of the question since it's coded up for the default schema. Pull that too?

Thoughts?
Erick
Re: Schemaless mode and solrconfig.xml
Long term, it seems like the solution is to have a SolrConfig API that is powerful enough to define things like df at runtime. Definitely worth discussing what we should do until then, though.

> Or pre-define a "text" field in the schemaless schema.xml file to handle these?

I'd lean towards not defining any dfs in the schemaless solrconfigs over this solution. I'd imagine if we define "text" in the schemaless schema.xml, a user would do the following:
- index some data
- query, get no results, and be confused

Alternatively, if we don't define any dfs, the pattern would be:
- index some data
- query and get an error about the field not being specified or df not being defined

From there the user would at least have a starting point and could decide to specify the df in the query, specify the field, or edit solrconfig.xml and restart the service. That seems like it would result in less unexpected behavior and fewer user questions. Note, this means we should probably get rid of the df for /query in the schemaless example.

> I'm not even sure we _could_ make things like the spell checker play nice with schemaless. For the other handlers we can specify df= on the URL once we know there's a field available, but that's tricky for spell checking... And I think /browse is just totally out of the question since it's coded up for the default schema. Pull that too?

+1 on pulling both of those.
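For concreteness, "edit solrconfig.xml" above amounts to adding a df default to a handler once the field exists. A minimal sketch, assuming a "text" field has actually been created in the schema (the handler defaults shown are illustrative, not the shipped example config):

```xml
<!-- Hypothetical /select handler entry: adding "df" only makes sense
     once the "text" field actually exists in the (schemaless) schema. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="rows">10</str>
    <str name="df">text</str>
  </lst>
</requestHandler>
```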
[jira] [Commented] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010695#comment-14010695 ]

ASF subversion and git services commented on SOLR-5973:
--------------------------------------------------------

Commit 1597921 from [~joel.bernstein] in branch 'dev/trunk'
[ https://svn.apache.org/r1597921 ]

SOLR-5973: Pluggable Ranking Collectors and Merge Strategies

> Pluggable Ranking Collectors and Merge Strategies
> -------------------------------------------------
>
>                 Key: SOLR-5973
>                 URL: https://issues.apache.org/jira/browse/SOLR-5973
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Minor
>             Fix For: 4.9
>
>         Attachments: SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch, SOLR-5973.patch
>
> This ticket introduces a new RankQuery and MergeStrategy to Solr. By extending the RankQuery class and implementing its interface, you can specify a custom ranking collector (TopDocsCollector) and distributed merge strategy for a Solr query.
> Sample syntax:
> {code}
> q=hello&rq={!customRank param1=a param2=b}&wt=json&indent=true
> {code}
> In the sample above the new rq (rank query) param:
> {code}rq={!customRank param1=a param2=b}{code}
> points to a QParserPlugin that returns a Query that extends RankQuery. The RankQuery defines the custom ranking and merge strategy for the main query.
> The RankQuery impl will have to do several things:
> 1) Implement the getTopDocsCollector() method to return a custom top docs ranking collector.
> 2) Implement the wrap() method. The QueryComponent calls the wrap() method to wrap the RankQuery around the main query. This design allows the RankQuery to manage Query caching issues and implement custom Query explanations if needed.
> 3) Implement hashCode() and equals() so the queryResultCache works properly with the main query and the custom ranking algorithm.
> 4) Optionally implement a custom MergeStrategy to handle the merging of distributed results from the shards.
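Point 3 above is easy to get wrong: the queryResultCache keys on the query object, so the cache key must reflect both the wrapped main query and the custom ranking parameters. A minimal sketch of the equals()/hashCode() discipline, using a hypothetical stand-in class in plain Java rather than the actual Solr RankQuery API:

```java
import java.util.Objects;

// Hypothetical stand-in for a RankQuery subclass's identity. If two
// instances differ in any ranking param but compare equal, different
// rankings of the same q would collide in the queryResultCache.
public class CustomRankParams {
    private final String mainQuery; // stand-in for the wrapped main Query
    private final String param1;
    private final String param2;

    public CustomRankParams(String mainQuery, String param1, String param2) {
        this.mainQuery = mainQuery;
        this.param1 = param1;
        this.param2 = param2;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CustomRankParams)) return false;
        CustomRankParams other = (CustomRankParams) o;
        // Every field that affects ranking must participate here.
        return mainQuery.equals(other.mainQuery)
            && param1.equals(other.param1)
            && param2.equals(other.param2);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals(): same fields, same order.
        return Objects.hash(mainQuery, param1, param2);
    }
}
```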
[jira] [Commented] (SOLR-5973) Pluggable Ranking Collectors and Merge Strategies
[ https://issues.apache.org/jira/browse/SOLR-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010733#comment-14010733 ]

ASF subversion and git services commented on SOLR-5973:
--------------------------------------------------------

Commit 1597923 from [~joel.bernstein] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1597923 ]

SOLR-5973: Pluggable Ranking Collectors and Merge Strategies