[VOTE] Release PyLucene 4.3.0-1
It looks like the time has finally come for a PyLucene 4.x release! The PyLucene 4.3.0-1 release, tracking the recent release of Apache Lucene 4.3.0, is ready.

A release candidate is available from:
http://people.apache.org/~vajda/staging_area/

A list of changes in this release can be seen at:
http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_4_3/CHANGES

PyLucene 4.3.0 is built with JCC 2.16, included in these release artifacts:
http://svn.apache.org/repos/asf/lucene/pylucene/trunk/jcc/CHANGES

A list of Lucene Java changes can be seen at:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0/lucene/CHANGES.txt

Please vote to release these artifacts as PyLucene 4.3.0-1.

Thanks!

Andi..

ps: the KEYS file for PyLucene release signing is at:
http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
http://people.apache.org/~vajda/staging_area/KEYS

pps: here is my +1
[jira] [Updated] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.
[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe updated LUCENE-949:
------------------------------
    Attachment: LUCENE-949.patch

Hi [~talli...@mitre.org],

Sorry it took so long. I've attached a patch based on your patch, with some fixes:

* Removed tabs.
* Restored the license header and class javadoc to {{AnalyzingQueryParser.java}} (your patch removed them for some reason?).
* Converted all code indentation to 2 spaces per level (you had a lot of 3-spaces-per-level indentation).
* Converted the {{wildcardPattern}} to allow anything to be escaped, not just backslashes and the wildcard chars '?' and '*'. Also removed the optional backslashes from group 2 (the actual wildcards): when iterating over wildcardPattern matches, your patch would throw away any number of real wildcards following an escaped wildcard. I added a test for this.
* When multiple output tokens are produced (and there should only be one), now reporting all of them in the exception message instead of just the first two.
* Removed all references to "chunklet" in favor of "output token"; this non-standard terminology made the code harder to read.
* Changed descriptions of multiple output tokens to not necessarily be the result of splitting (e.g. synonyms).
* In {{analyzeSingleChunk()}}, moved exception throwing to the source of the problems.

I also added a {{CHANGES.txt}} entry.

Tim, let me know if you think my changes are okay; if so, I think it's ready to commit.

AnalyzingQueryParser can't work with leading wildcards.
---
                Key: LUCENE-949
                URL: https://issues.apache.org/jira/browse/LUCENE-949
            Project: Lucene - Core
         Issue Type: Bug
         Components: core/queryparser
   Affects Versions: 2.2
           Reporter: Stefan Klein
        Attachments: LUCENE-949.patch, LUCENE-949.patch, LUCENE-949.patch

The getWildcardQuery method in AnalyzingQueryParser.java needs the following changes to accept leading wildcards:

protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  String useTermStr = termStr;
  String leadingWildcard = null;
  if ("*".equals(field)) {
    if ("*".equals(useTermStr)) return new MatchAllDocsQuery();
  }
  boolean hasLeadingWildcard = (useTermStr.startsWith("*") || useTermStr.startsWith("?")) ? true : false;
  if (!getAllowLeadingWildcard() && hasLeadingWildcard)
    throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
  if (getLowercaseExpandedTerms()) {
    useTermStr = useTermStr.toLowerCase();
  }
  if (hasLeadingWildcard) {
    leadingWildcard = useTermStr.substring(0, 1);
    useTermStr = useTermStr.substring(1);
  }
  List tlist = new ArrayList();
  List wlist = new ArrayList();
  /*
   * somewhat a hack: find/store wildcard chars in order to put them back
   * after analyzing
   */
  boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
  isWithinToken = true;
  StringBuffer tmpBuffer = new StringBuffer();
  char[] chars = useTermStr.toCharArray();
  for (int i = 0; i < useTermStr.length(); i++) {
    if (chars[i] == '?' || chars[i] == '*') {
      if (isWithinToken) {
        tlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = false;
    } else {
      if (!isWithinToken) {
        wlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = true;
    }
    tmpBuffer.append(chars[i]);
  }
  if (isWithinToken) {
    tlist.add(tmpBuffer.toString());
  } else {
    wlist.add(tmpBuffer.toString());
  }
  // get Analyzer from superclass and tokenize the term
  TokenStream source = getAnalyzer().tokenStream(field, new
[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.
[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649560#comment-13649560 ]

Steve Rowe commented on LUCENE-949:
-----------------------------------
One other change I forgot to mention, Tim: I substituted MockAnalyzer where you used StandardAnalyzer in the test code. This allowed me to remove the analyzers-common dependency you introduced (and also the memory dependency, which didn't seem to be used for anything in your patch).

AnalyzingQueryParser can't work with leading wildcards.
---
                Key: LUCENE-949
                URL: https://issues.apache.org/jira/browse/LUCENE-949
            Project: Lucene - Core
         Issue Type: Bug
         Components: core/queryparser
   Affects Versions: 2.2
           Reporter: Stefan Klein
        Attachments: LUCENE-949.patch, LUCENE-949.patch, LUCENE-949.patch

The getWildcardQuery method in AnalyzingQueryParser.java needs the following changes to accept leading wildcards:

protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  String useTermStr = termStr;
  String leadingWildcard = null;
  if ("*".equals(field)) {
    if ("*".equals(useTermStr)) return new MatchAllDocsQuery();
  }
  boolean hasLeadingWildcard = (useTermStr.startsWith("*") || useTermStr.startsWith("?")) ? true : false;
  if (!getAllowLeadingWildcard() && hasLeadingWildcard)
    throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
  if (getLowercaseExpandedTerms()) {
    useTermStr = useTermStr.toLowerCase();
  }
  if (hasLeadingWildcard) {
    leadingWildcard = useTermStr.substring(0, 1);
    useTermStr = useTermStr.substring(1);
  }
  List tlist = new ArrayList();
  List wlist = new ArrayList();
  /*
   * somewhat a hack: find/store wildcard chars in order to put them back
   * after analyzing
   */
  boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
  isWithinToken = true;
  StringBuffer tmpBuffer = new StringBuffer();
  char[] chars = useTermStr.toCharArray();
  for (int i = 0; i < useTermStr.length(); i++) {
    if (chars[i] == '?' || chars[i] == '*') {
      if (isWithinToken) {
        tlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = false;
    } else {
      if (!isWithinToken) {
        wlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = true;
    }
    tmpBuffer.append(chars[i]);
  }
  if (isWithinToken) {
    tlist.add(tmpBuffer.toString());
  } else {
    wlist.add(tmpBuffer.toString());
  }
  // get Analyzer from superclass and tokenize the term
  TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
  org.apache.lucene.analysis.Token t;
  int countTokens = 0;
  while (true) {
    try {
      t = source.next();
    } catch (IOException e) {
      t = null;
    }
    if (t == null) {
      break;
    }
    if (!"".equals(t.termText())) {
      try {
        tlist.set(countTokens++, t.termText());
      } catch (IndexOutOfBoundsException ioobe) {
        countTokens = -1;
      }
    }
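The token/wildcard chunking used in the snippet above can be sketched as a standalone helper. This is an illustrative reconstruction of the technique only, not Lucene's actual API: the class and method names are invented, and escaping of wildcard characters is deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: split a wildcard term into alternating runs of
// analyzable text and wildcard characters ('?' and '*'), so the
// text runs can be analyzed and the wildcards put back afterwards.
public class WildcardChunker {
    public static List<String> chunks(String term) {
        List<String> out = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean inWildcard = false;
        for (char c : term.toCharArray()) {
            boolean isWild = (c == '?' || c == '*');
            // Flush the buffer whenever we cross a text/wildcard boundary.
            if (buf.length() > 0 && isWild != inWildcard) {
                out.add(buf.toString());
                buf.setLength(0);
            }
            inWildcard = isWild;
            buf.append(c);
        }
        if (buf.length() > 0) out.add(buf.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chunks("foo*bar?")); // [foo, *, bar, ?]
        System.out.println(chunks("*lead"));    // [*, lead]
    }
}
```

A leading-wildcard term such as `*lead` simply yields a wildcard run first, which is why the patch above strips and remembers the leading wildcard before analysis.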
[jira] [Created] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
chakming wong created SOLR-4788:
-----------------------------------
             Summary: Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
                 Key: SOLR-4788
                 URL: https://issues.apache.org/jira/browse/SOLR-4788
             Project: Solr
          Issue Type: Bug
    Affects Versions: 4.2
            Reporter: chakming wong

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ... />
        </entity>
    </document>
</dataConfig>

In the above setup, dataimporter.entity1.last_index_time is an empty string.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ... />
        </entity>
    </document>
</dataConfig>
{code}
In the above setup, dataimporter.entity1.last_index_time is an empty string.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ... />
        </entity>
    </document>
</dataConfig>
{code}
In the above setup, dataimporter.entity1.last_index_time is an empty string.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ... />
        </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document>
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Summary: Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty (was: Multiple Entities DIH: dataimporter.[entityName].last_index_time is empty)
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}
{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
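The properties file above shows what the reporter expects to happen: a variable like dataimporter.entity1.last_index_time should resolve to the entity1.last_index_time key, falling back to the global last_index_time. The sketch below illustrates that lookup with java.util.Properties; it is an assumption about the intended behavior, not Solr's actual DataImportHandler code, and the class and method names are invented.

```java
import java.io.StringReader;
import java.util.Properties;

// Illustrative lookup of a per-entity last_index_time from a
// dataimport.properties-style file, with fallback to the global key.
public class LastIndexTimeLookup {
    public static String resolve(Properties props, String entity) {
        // Prefer the entity-scoped key, e.g. "entity1.last_index_time".
        String v = props.getProperty(entity + ".last_index_time");
        return (v != null) ? v : props.getProperty("last_index_time");
    }

    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        // "\:" is the properties-file escape for ':' in values.
        p.load(new StringReader(
            "entity1.last_index_time=2013-05-06 03\\:02\\:06\n" +
            "last_index_time=2013-05-06 03\\:05\\:22\n"));
        System.out.println(resolve(p, "entity1")); // 2013-05-06 03:02:06
        System.out.println(resolve(p, "entity9")); // falls back to global
    }
}
```

The bug report amounts to the entity-scoped resolution step returning an empty string instead of the stored per-entity timestamp.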
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chakming wong updated SOLR-4788:
--------------------------------
    Description:
{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}
{code:title=solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://*:*/*" user="*" password="*"/>
    <document name="strings">
        <entity name="entity1" pk="id" dataSource="source1"
                query="SELECT * FROM table_a"
                deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified > '${dataimporter.entity1.last_index_time}'"
                deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
                transformer="TemplateTransformer">
            <field ...> ... </field>
        </entity>
        <entity name="entity2"> ... </entity>
        <entity name="entity3"> ... </entity>
    </document>
</dataConfig>
{code}
In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chakming wong updated SOLR-4788:
---
Description:

{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}

{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
...
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dihconfig.xml</str>
  </lst>
</requestHandler>
...
{code}

{code:title=conf/dihconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://*:*/*" user="*" password="*"/>
  <document name="strings">
    <entity name="entity1" pk="id" dataSource="source1"
            query="SELECT * FROM table_a"
            deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified &gt; '${dataimporter.entity1.last_index_time}'"
            deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
            transformer="TemplateTransformer">
      <field ... ... ... />
    </entity>
    <entity name="entity2"> ... ... </entity>
    <entity name="entity3"> ... ... </entity>
  </document>
</dataConfig>
{code}

In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.

was:

{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}

{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://*:*/*" user="*" password="*"/>
  <document name="strings">
    <entity name="entity1" pk="id" dataSource="source1"
            query="SELECT * FROM table_a"
            deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified &gt; '${dataimporter.entity1.last_index_time}'"
            deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
            transformer="TemplateTransformer">
      <field ... ... ... />
    </entity>
    <entity name="entity2"> ... ... </entity>
    <entity name="entity3"> ... ... </entity>
  </document>
</dataConfig>
{code}

In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
[jira] [Updated] (SOLR-4788) Multiple Entities DIH delta import: dataimporter.[entityName].last_index_time is empty
[ https://issues.apache.org/jira/browse/SOLR-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chakming wong updated SOLR-4788:
---
Description:

{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}

{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
...
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dihconfig.xml</str>
  </lst>
</requestHandler>
...
{code}

{code:title=conf/dihconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://*:*/*" user="*" password="*"/>
  <document name="strings">
    <entity name="entity1" pk="id" dataSource="source1"
            query="SELECT * FROM table_a"
            deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified &gt; '${dataimporter.entity1.last_index_time}'"
            deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
            transformer="TemplateTransformer">
      <field ... ... ... />
    </entity>
    <entity name="entity2"> ... ... </entity>
    <entity name="entity3"> ... ... </entity>
  </document>
</dataConfig>
{code}

In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*, which causes the SQL query to fail.

was:

{code:title=conf/dataimport.properties|borderStyle=solid}
entity1.last_index_time=2013-05-06 03\:02\:06
last_index_time=2013-05-06 03\:05\:22
entity2.last_index_time=2013-05-06 03\:03\:14
entity3.last_index_time=2013-05-06 03\:05\:22
{code}

{code:title=conf/solrconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
...
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dihconfig.xml</str>
  </lst>
</requestHandler>
...
{code}

{code:title=conf/dihconfig.xml|borderStyle=solid}
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://*:*/*" user="*" password="*"/>
  <document name="strings">
    <entity name="entity1" pk="id" dataSource="source1"
            query="SELECT * FROM table_a"
            deltaQuery="SELECT table_a_id FROM table_b WHERE last_modified &gt; '${dataimporter.entity1.last_index_time}'"
            deltaImportQuery="SELECT * FROM table_a WHERE id = '${dataimporter.entity1.id}'"
            transformer="TemplateTransformer">
      <field ... ... ... />
    </entity>
    <entity name="entity2"> ... ... </entity>
    <entity name="entity3"> ... ... </entity>
  </document>
</dataConfig>
{code}

In the above setup, *dataimporter.entity1.last_index_time* is an *empty string*.
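[Editor's note] The escaped timestamps in conf/dataimport.properties above are standard Java properties syntax, and the reported symptom is a `${dataimporter.entity1.last_index_time}` placeholder that resolves to an empty string. A minimal sketch of the lookup (the `lookup` helper and its fallback to the global `last_index_time` key are hypothetical, not DIH's actual resolution code):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class DihPropsSketch {
    // The same content as conf/dataimport.properties above ("\:" escapes a colon).
    static final String PROPS =
        "entity1.last_index_time=2013-05-06 03\\:02\\:06\n" +
        "last_index_time=2013-05-06 03\\:05\\:22\n";

    // Hypothetical helper: entity-scoped lookup with a fallback to the
    // global key, returning "" when neither key is present -- the empty
    // string that then ends up inside the deltaQuery.
    static String lookup(String entity) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(PROPS)); // Properties.load() unescapes "\:" to ":"
        } catch (IOException e) {
            throw new RuntimeException(e);   // cannot happen for an in-memory reader
        }
        String v = p.getProperty(entity + ".last_index_time");
        if (v == null) v = p.getProperty("last_index_time");
        return v == null ? "" : v;
    }

    public static void main(String[] args) {
        System.out.println(lookup("entity1")); // entity-scoped value, colons restored
        System.out.println(lookup("entity9")); // no such entity: global fallback
    }
}
```

If the per-entity key is never written or never read back under the exact `entityName.last_index_time` spelling, the placeholder substitution sees nothing, which matches the empty-string behavior reported in this issue.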
[jira] [Created] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
Shai Erera created LUCENE-4982:
--
Summary: Make MockIndexOutputWrapper check disk full on copyBytes
Key: LUCENE-4982
URL: https://issues.apache.org/jira/browse/LUCENE-4982
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Shai Erera
Assignee: Shai Erera

While working on the consistency test for Replicator (LUCENE-4975), I noticed that I don't trip disk-full exceptions, and tracked it down to MockIndexOutputWrapper.copyBytes not doing these checks like writeBytes does. I'd like to add this check.
[jira] [Updated] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4982:
---
Component/s: (was: general/test)
             modules/test-framework
[jira] [Updated] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4982:
---
Attachment: LUCENE-4982.patch

Patch adds a test to TestMockDirWrapper and factors out a checkDiskFull method in MockIndexOutputWrapper. The signature is a bit ugly, but that's needed because checkDiskFull copies the remaining bytes, and writeBytes copies from an array while copyBytes copies from a DataInput. I don't think it's the end of the world, but if anyone has an idea how to do it better... I ran core tests and they passed (actually, only 3 tests under core set dir.maxSize).
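[Editor's note] A rough sketch of the idea behind this patch (names and structure are hypothetical, not the actual Lucene code): route both the array-based and stream-based write paths through one shared disk-full guard, so that a copyBytes-style call can no longer bypass the check.

```java
import java.io.IOException;

public class DiskFullSketch {
    private final long maxSize;  // simulated disk capacity in bytes
    private long written;        // bytes written so far

    public DiskFullSketch(long maxSize) { this.maxSize = maxSize; }

    // Shared guard: every write path calls this before any bytes land.
    private void checkDiskFull(long len) throws IOException {
        if (written + len > maxSize) {
            throw new IOException("fake disk full at " + written + " bytes");
        }
    }

    public void writeBytes(byte[] b, int off, int len) throws IOException {
        checkDiskFull(len);
        written += len; // a real wrapper would also forward to the delegate output
    }

    // Before the fix described in this issue, a copy path like this could
    // skip the check; here it goes through the same guard.
    public void copyBytes(long numBytes) throws IOException {
        checkDiskFull(numBytes);
        written += numBytes;
    }

    public long bytesWritten() { return written; }
}
```

The awkwardness Shai mentions comes from the two call sites holding their data differently (an array versus a DataInput); the shared guard only needs the length, which is what makes factoring it out possible.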
[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent
[ https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649604#comment-13649604 ] Alexander Buhr commented on SOLR-3177:
--
Is this going to be released at some point?

Excluding tagged filter in StatsComponent
-
Key: SOLR-3177
URL: https://issues.apache.org/jira/browse/SOLR-3177
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Affects Versions: 3.5, 3.6, 4.0-ALPHA, 4.1
Reporter: Mathias H.
Priority: Minor
Labels: localparams, stats, statscomponent
Attachments: SOLR-3177.patch

It would be useful to exclude the effects of some fq params from the set of documents used to compute stats -- similar to how you can exclude tagged filters when generating facet counts: https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

So that it's possible to do something like this:

http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]&q=*:*&stats=true&stats.field={!ex=priceFilter}price

If you want to create a price slider this is very useful, because then you can filter on the price ([1 TO 20]) and nevertheless get the lower and upper bounds of the unfiltered price (min=0, max=100):

{noformat}
|-[---]--|
$0 $1 $20 $100
{noformat}
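[Editor's note] The tag/exclude request in the comment above can be assembled programmatically. A small sketch that builds the same query string; the `{!tag=...}`/`{!ex=...}` local params and the `stats`/`stats.field` parameter names come from the issue text, while the builder class itself is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class StatsUrlSketch {
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always supported
        }
    }

    // Filter on price via a tagged fq, but compute stats on the *unfiltered*
    // price field by excluding that tag in stats.field.
    static String priceSliderQuery() {
        return "q=" + enc("*:*")
             + "&fq=" + enc("{!tag=priceFilter}price:[1 TO 20]")
             + "&stats=true"
             + "&stats.field=" + enc("{!ex=priceFilter}price");
    }
}
```

Appending the result to a Solr select URL reproduces the price-slider request shown above, with the local params percent-encoded for safety.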
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649613#comment-13649613 ] Adrien Grand commented on LUCENE-4975:
--
+1 to commit too. Looking at the code, there seem to be specialized implementations for faceting because of the need to replicate the taxonomy indexes too, so I was wondering whether this facet-specific code should be under lucene/facets rather than lucene/replicator, so that lucene/replicator doesn't need to depend on all modules that have specific replication needs. (I'm not sure what the best option is yet; this can be addressed afterwards.)

Add Replication module to Lucene
-
Key: LUCENE-4975
URL: https://issues.apache.org/jira/browse/LUCENE-4975
Project: Lucene - Core
Issue Type: New Feature
Reporter: Shai Erera
Assignee: Shai Erera
Attachments: LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch, LUCENE-4975.patch

I wrote a replication module which I think will be useful to Lucene users who want to replicate their indexes for e.g. high availability, taking hot backups etc. I will upload a patch soon where I'll describe in general how it works.
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649631#comment-13649631 ] Shai Erera commented on LUCENE-4975:
--
I've been wondering about that too, but chose to keep the facet replication code under replicator for a few reasons:

* A Revision contains files from multiple sources, and the taxonomy index is partly responsible for that. And ReplicationClient respects that -- so I guess it's not entirely true that the Replicator is unaware of the taxonomy (even though it would still work if I pulled the taxonomy stuff out of it).
* I think it makes less sense to require lucene-replicator.jar for every faceted search app which makes use of lucene-facet.jar. The key reason is that the replicator requires a few additional jars such as httpclient, httpcore, jetty and servlet-api. Requiring lucene-facet.jar seems less painful to me than requiring every faceted search app out there to include all these jars even if it doesn't want to do replication.
* I like to keep things local to the module. There are many similarities between IndexAndTaxoRevision and IndexRevision (likewise for their handlers and tests). Therefore, whenever I made a change to one, I knew I should go make a similar change to the other.

All in all, I guess arguments can be made both ways, but for now I prefer to keep things local to the replicator module. Even in the future, I would imagine that if we added support for replicating suggester files, it would make sense to put a dependency between the replicator and the suggester, rather than the other way around.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649633#comment-13649633 ] SooMyung Lee commented on LUCENE-4956:
--
[~cm] I'm sorry that I didn't reply to your comment over the weekend! I see that [~steve_rowe] solved your problem. Am I right?

[~steve_rowe] I checked the method. isNounPart() is no longer necessary. Spaces should be inserted between phrases in a Korean sentence, but many people are confused about where to insert them. The isNounPart() method examines whether spaces should be inserted at a specific position, but only when a noun existing in the dictionary precedes it. After testing, I found that the method is superfluous. I'm sorry I didn't correct the source code before contributing.

the korean analyzer that has a korean morphological analyzer and dictionaries
-
Key: LUCENE-4956
URL: https://issues.apache.org/jira/browse/LUCENE-4956
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Affects Versions: 4.2
Reporter: SooMyung Lee
Assignee: Christian Moen
Labels: newbie
Attachments: kr.analyzer.4x.tar

The Korean language has specific characteristics. When developing a search service with Lucene/Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solves these problems with a Korean morphological analyzer. It consists of a Korean morphological analyzer, dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene and Solr. If you develop a search service with Lucene in Korean, it is the best idea to choose the Korean analyzer.
[jira] [Updated] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4975:
---
Attachment: LUCENE-4975.patch

bq. maybe also call MDW.setRandomIOExceptionRateOnOpen

Thanks Mike! I added that and a slew of problems surfaced, most of them in the test, but I improved the handlers' implementation to clean up after themselves if e.g. a copy or sync to the handlerDir failed. While this wasn't a bug, it leaves the target index directory clean.

There's one nocommit which bugs me though -- I had to add dir.setPreventDoubleWrite(false) because when the handler fails during copying of, say, _2.fdt to the index dir, the file is deleted from the indexDir and the client re-attempts to upgrade. At this point, MDW complains that _2.fdt was already written to, even though I deleted it. Adding this setPreventDoubleWrite was the only way I could make MDW happy, but I don't like it, since I do want to catch errors in the handler/client if they e.g. attempt to copy over an existing file. Maybe we can make MDW respond somehow to delete()? I know that has bad implications of its own, e.g. code which deletes and then accidentally recreates files with older names ... any ideas?
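[Editor's note] A toy sketch of the behavior Shai is asking about (the class and its bookkeeping are hypothetical, not MockDirectoryWrapper's actual code): double-write tracking that "responds to delete()", so re-creating a file that was genuinely deleted is allowed, while overwriting a live file still trips the check.

```java
import java.util.HashSet;
import java.util.Set;

public class DoubleWriteSketch {
    // Names of files that have been created and not deleted since.
    private final Set<String> live = new HashSet<>();

    public void createOutput(String name) {
        if (!live.add(name)) {
            throw new IllegalStateException("file \"" + name + "\" was already written");
        }
    }

    // Deleting removes the name from the live set, so a later re-create of
    // the same name is legal. This is exactly the trade-off raised above:
    // it also hides accidental delete-then-recreate bugs.
    public void deleteFile(String name) {
        live.remove(name);
    }
}
```

Under this scheme the replication-handler retry (delete _2.fdt, then copy it again) would pass without setPreventDoubleWrite(false), at the cost of no longer catching code that recreates a deleted file by mistake.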
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649635#comment-13649635 ] Adrien Grand commented on LUCENE-4975:
--
Good points, you convinced me. :-)
[jira] [Commented] (LUCENE-4980) Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
[ https://issues.apache.org/jira/browse/LUCENE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649638#comment-13649638 ] Shai Erera commented on LUCENE-4980:
--
I was confused by the name MultiFacetsAccumulator, as I thought it takes something like a Map<FacetRequest,FacetsAccumulator>, but I see that it only distinguishes RangeAccumulator from others. So I'm worried that someone will get confused by the name and use it incorrectly. I don't have a better name in mind though ... RangeAndRegularFacetsAccumulator? What if RangeAccumulator did that under the covers? I.e., instead of rejecting non-RangeFacetRequests, it created an FA over all such requests? Multi is quite simple though, so I like it ... maybe FacetAccumulatorRangeWrapper? I think as long as we keep the word Range in the name, it's less likely users will get confused.

Minor comments about the class: (a) can you rename 'a' and 'ra'? (b) why do you need to hold onto fspOrig? Is it because FA.searchParams isn't final?

Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
-
Key: LUCENE-4980
URL: https://issues.apache.org/jira/browse/LUCENE-4980
Project: Lucene - Core
Issue Type: Bug
Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, 4.4
Attachments: LUCENE-4980.patch

I tried to combine these two and there were several issues:

* It's ... really tricky to manage the two different FacetAccumulators across the N FacetCollectors that DrillSideways creates ... to fix this I added a new MultiFacetsAccumulator that switches for you.
* There was still one place in DS/DDQ that wasn't properly handling a non-Term drill-down.
* There was a bug in the collector method for DrillSideways whereby if a given segment had no hits, it was skipped, which is incorrect because it must still be visited to tally up the sideways counts.
* Separately, I noticed that DrillSideways was doing too much work: it would count up drill-down counts *and* drill-sideways counts against the same dim (but then discard the drill-down counts in the end).
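[Editor's note] The dispatch idea described above (and the role of the saved original params that the fspOrig question touches on) can be shown with a toy sketch. All types here are hypothetical stand-ins, not Lucene's API: partition requests into "range" and "non-range" groups, accumulate each group separately, then re-collate the results back into the caller's original request order.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiAccumulatorSketch {
    interface Request { boolean isRange(); String name(); }

    static Request req(final String name, final boolean range) {
        return new Request() {
            public boolean isRange() { return range; }
            public String name() { return name; }
        };
    }

    static List<String> accumulate(List<Request> original) {
        // Phase 1: route each request to its own "accumulator"; each group's
        // results come back in group order, not request order.
        List<String> rangeResults = new ArrayList<>();
        List<String> stdResults = new ArrayList<>();
        for (Request r : original) {
            if (r.isRange()) rangeResults.add("range:" + r.name());
            else stdResults.add("std:" + r.name());
        }
        // Phase 2: re-collate into the original request order -- this is why
        // the original request list has to be kept around.
        List<String> out = new ArrayList<>();
        int ri = 0, si = 0;
        for (Request r : original) {
            out.add(r.isRange() ? rangeResults.get(ri++) : stdResults.get(si++));
        }
        return out;
    }
}
```

Without phase 2, a caller that interleaved range and non-range requests would get results grouped by accumulator rather than in the order it asked for them.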
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649643#comment-13649643 ] Robert Muir commented on LUCENE-4975:
--
{quote}
Even in the future, I would imagine that if we added support for replicating a suggester files, then it would make sense to put a dependency between replicator and suggester, rather than the other way around.
{quote}

Wait: how does this make sense?! It should be the other way around: if the suggester has a sidecar, it needs special logic for replication. It does not need faceting.
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649650#comment-13649650 ] Shai Erera commented on LUCENE-4975:
--
As I said, arguments can be made both ways ... I don't know what the best way is here. I can see your point, but I don't feel good about having facet depend on replicator. I see the Replicator as a higher-level service that, besides providing the replication framework, also comes pre-built for replicating Lucene stuff. I don't mind seeing it grow to accommodate other Revision types in the future. For example, IndexAndTaxonomyRevision is just an example of replicating multiple indexes together. It can easily be duplicated to replicate a few indexes at once, e.g. a MultiIndexRevision. Where would that object be? It cannot be in core, so why should IndexAndTaxo be in facet?
[jira] [Commented] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649654#comment-13649654 ] Michael McCandless commented on LUCENE-4982:
--
+1, good catch. Who tests the tester!
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13649655#comment-13649655 ] Adrien Grand commented on LUCENE-4975:
--
Then maybe we could have sub-modules for specific replication strategies? lucene/replicator would only know how to handle raw indexes, while lucene/replicator/facets or lucene/replicator/suggest would implement custom logic? This way lucene/facet wouldn't need to pull all lucene/replicator transitive dependencies, and lucene/replicator wouldn't depend on any lucene module but lucene/core.
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649658#comment-13649658 ] Robert Muir commented on LUCENE-4975: - I still haven't had a chance to look at the patch, but it sounds like some work needs to be done here to prevent DLL hell. Having replicator depend upon all sidecar modules is a no-go. It sounds like an interface is missing.
[jira] [Commented] (LUCENE-4980) Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest
[ https://issues.apache.org/jira/browse/LUCENE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649661#comment-13649661 ] Michael McCandless commented on LUCENE-4980: bq. What if RangeAccumulator did that under the covers? Well ... I have a TODO to also support SortedSetDocValuesAccumulator, so I'm not quite sure what to name it / where to put it. Another option here is to commit this class only under src/test ... it's technically only needed right now by the test case to expose the bugs ... but then I'm using the class in the Jira search app, because I need to use DrillSideways with range and non-range facets, and without it things get very messy. So we need to fix something here, but we can do it in a separate issue after fixing these bugs. bq. Minor comments about the class: (a) can you rename 'a' and 'ra'? Will do ... bq. (b) why do you need to hold onto fspOrig? Is it because FA.searchParams isn't final? I need fspOrig in accumulator() to un-collate the wrapped ListFacetResult back into the same order as the original requests ... Can't use DrillSideways with both RangeFacetRequest and non-RangeFacetRequest - Key: LUCENE-4980 URL: https://issues.apache.org/jira/browse/LUCENE-4980 Project: Lucene - Core Issue Type: Bug Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.4 Attachments: LUCENE-4980.patch I tried to combine these two and there were several issues: * It's ... really tricky to manage the two different FacetAccumulators across the N FacetCollectors that DrillSideways creates ... to fix this I added a new MultiFacetsAccumulator that switches for you. * There was still one place in DS/DDQ that wasn't properly handling a non-Term drill-down. * There was a bug in the collector method for DrillSideways whereby if a given segment had no hits, it was skipped, which is incorrect because it must still be visited to tally up the sideways counts.
* Separately I noticed that DrillSideways was doing too much work: it would count up drill-down counts *and* drill-sideways counts against the same dim (but then discard the drill-down counts in the end).
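The re-collation step discussed above (using the original request list, fspOrig, to restore result order after requests have been split across two accumulators) can be sketched in plain Java. This is a standalone illustration with hypothetical request strings, not the Lucene class:

```java
import java.util.*;

// Standalone sketch of the un-collation problem described above (not the
// actual MultiFacetsAccumulator): requests are split by type across two
// accumulators, each of which returns results in its own sub-order, so the
// merged results must be put back in the order of the original request list.
public class MultiAccumulatorSketch {

  static List<String> accumulate(List<String> requests) {
    // Hypothetical split: "range:" requests go to one accumulator,
    // everything else to another; each preserves its own relative order.
    List<String> rangeResults = new ArrayList<>();
    List<String> otherResults = new ArrayList<>();
    for (String req : requests) {
      if (req.startsWith("range:")) rangeResults.add("result(" + req + ")");
      else otherResults.add("result(" + req + ")");
    }
    // Re-collate into the original request order.
    List<String> merged = new ArrayList<>();
    int r = 0, o = 0;
    for (String req : requests) {
      merged.add(req.startsWith("range:") ? rangeResults.get(r++) : otherResults.get(o++));
    }
    return merged;
  }

  public static void main(String[] args) {
    List<String> reqs = Arrays.asList("range:price", "dim:author", "range:date");
    System.out.println(accumulate(reqs));
  }
}
```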
[jira] [Updated] (SOLR-4785) New MaxScoreQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-4785: -- Attachment: SOLR-4785.patch First patch with tests and support for the tie parameter. New MaxScoreQParserPlugin - Key: SOLR-4785 URL: https://issues.apache.org/jira/browse/SOLR-4785 Project: Solr Issue Type: New Feature Components: query parsers Reporter: Jan Høydahl Assignee: Jan Høydahl Priority: Minor Fix For: 5.0, 4.4 Attachments: SOLR-4785.patch A customer wants to contribute back this component. It is a QParser which behaves exactly like the lucene parser (it extends it), but returns the max score from the clauses, i.e. max(c1,c2,c3...) instead of the default, which is sum(c1,c2,c3...). It does this by wrapping all SHOULD clauses in a DisjunctionMaxQuery with tie=1.0. Any MUST or PROHIBITED clauses are passed through as-is. Non-boolean queries, e.g. NumericRange, fall through to the lucene parser. To use, add to solrconfig.xml: {code:xml} <queryParser name="maxscore" class="solr.MaxScoreQParserPlugin"/> {code} Then use it in a query: {noformat} q=A AND B AND {!maxscore v=$max}&max=C OR (D AND E) {noformat} This will return the score of A+B+max(C,sum(D,E)).
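The scoring arithmetic described above can be illustrated with a standalone Java sketch, with plain doubles standing in for clause scores. This is only an illustration of the max-vs-sum combination, not the Solr plugin itself:

```java
import java.util.Arrays;

// Standalone sketch of the scoring arithmetic described above (not the
// actual plugin): a default BooleanQuery sums clause scores, while the
// plugin's DisjunctionMaxQuery with tie=1.0 takes the maximum of the
// SHOULD clause scores instead.
public class MaxScoreSketch {

  static double sum(double... clauseScores) {
    return Arrays.stream(clauseScores).sum();
  }

  static double max(double... clauseScores) {
    return Arrays.stream(clauseScores).max().orElse(0.0);
  }

  public static void main(String[] args) {
    double a = 1.0, b = 2.0, c = 5.0, d = 1.5, e = 2.5;
    // q=A AND B AND {!maxscore v=$max} with max=C OR (D AND E):
    // MUST clauses A and B are passed through and summed as usual,
    // while the {!maxscore} sub-query contributes max(C, D+E).
    double score = a + b + max(c, sum(d, e));
    System.out.println(score); // 1 + 2 + max(5, 4) = 8.0
  }
}
```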
[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.
[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649670#comment-13649670 ] Tim Allison commented on LUCENE-949: Steve, no problem on the delay. Thank you for your help! The changes sound great. Thank you. AnalyzingQueryParser can't work with leading wildcards. --- Key: LUCENE-949 URL: https://issues.apache.org/jira/browse/LUCENE-949 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 2.2 Reporter: Stefan Klein Attachments: LUCENE-949.patch, LUCENE-949.patch, LUCENE-949.patch The getWildcardQuery method in AnalyzingQueryParser.java needs the following changes to accept leading wildcards:

protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  String useTermStr = termStr;
  String leadingWildcard = null;
  if ("*".equals(field)) {
    if ("*".equals(useTermStr)) return new MatchAllDocsQuery();
  }
  boolean hasLeadingWildcard = (useTermStr.startsWith("*") || useTermStr.startsWith("?")) ? true : false;
  if (!getAllowLeadingWildcard() && hasLeadingWildcard)
    throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
  if (getLowercaseExpandedTerms()) {
    useTermStr = useTermStr.toLowerCase();
  }
  if (hasLeadingWildcard) {
    leadingWildcard = useTermStr.substring(0, 1);
    useTermStr = useTermStr.substring(1);
  }
  List tlist = new ArrayList();
  List wlist = new ArrayList();
  /*
   * somewhat a hack: find/store wildcard chars in order to put them back
   * after analyzing
   */
  boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
  isWithinToken = true;
  StringBuffer tmpBuffer = new StringBuffer();
  char[] chars = useTermStr.toCharArray();
  for (int i = 0; i < useTermStr.length(); i++) {
    if (chars[i] == '?' || chars[i] == '*') {
      if (isWithinToken) {
        tlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = false;
    } else {
      if (!isWithinToken) {
        wlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = true;
    }
    tmpBuffer.append(chars[i]);
  }
  if (isWithinToken) {
    tlist.add(tmpBuffer.toString());
  } else {
    wlist.add(tmpBuffer.toString());
  }
  // get Analyzer from superclass and tokenize the term
  TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
  org.apache.lucene.analysis.Token t;
  int countTokens = 0;
  while (true) {
    try {
      t = source.next();
    } catch (IOException e) {
      t = null;
    }
    if (t == null) {
      break;
    }
    if (!"".equals(t.termText())) {
      try {
        tlist.set(countTokens++, t.termText());
      } catch (IndexOutOfBoundsException ioobe) {
        countTokens = -1;
      }
    }
  }
  try {
    source.close();
  } catch (IOException e) {
    // ignore
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649674#comment-13649674 ] Shai Erera commented on LUCENE-4975: Ok, so there are 3 options I see: (1) have Replicator depend on Facet (and in the future on other modules), (2) have Facet depend on Replicator, and (3) move Revision and ReplicationHandler (interfaces) someplace else, core or a new module we call 'commons', and have Replicator and Facet depend on it. Tests will still need to depend on replicator, though, since they need ReplicationClient. BTW, the jetty dependencies are test-only, but I don't know how to make ivy resolve the dependencies just for tests. The only things replicator depends on are servlet-api, for ReplicationService, and httpclient, for ReplicationClient. I think these need to remain in the module ... If we made Facet depend on Replicator (I'm not totally against it), would that require you to have lucene-replicator.jar on the classpath, even if you don't use replication? If not, then perhaps this dependency isn't so bad ... it's just a compile-time dependency. Tests will still need to depend on replicator at runtime, but that's ok I think.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649692#comment-13649692 ] Joel Bernstein commented on SOLR-4787: -- Thanks David! Yeah, agreed the BSearch class is not ideal. I'll have a look at the SorterTemplate and get the integers sorted in place. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.2.1 Attachments: SOLR-4787.patch This contrib provides a place where different join implementations can be contributed to Solr. It currently includes 2 join implementations. The initial patch was generated from the Solr 4.2.1 tag. Because of changes in the FieldCache API, this patch will only build with Solr 4.2 or above. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin, but the implementation differs in a couple of important ways. The first is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the "to" and "from" cores. The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys. It also uses a custom SolrCache named "join" to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores. Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin's, except that the plugin is referenced by the string "pjoin" rather than "join". fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the "from" field that will be used to filter the main query. Only records from the main query where the "to" field is present in the "from" list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin: <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml: <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the from core must have the "join" SolrCache configured: <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *JoinValueSourceParserPlugin aka vjoin* The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. It will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the "join" SolrCache is used to hold the join memory structures. To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
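The pjoin's filtering semantics described above can be sketched in plain Java: the search on the "from" core yields a set of integer join keys, and a document from the main query survives the post filter only if its "to" key is in that set. This is a standalone illustration with made-up doc/key pairs, not the contrib code:

```java
import java.util.*;

// Standalone sketch of the pjoin's filtering semantics (not the contrib
// implementation): keys collected from the "from" core act as a membership
// filter over the "to" keys of documents matching the main query.
public class PjoinSketch {

  // Each doc is modeled as {docId, toKey}; keep it only if its toKey
  // appears in the set of keys produced by the from-core query.
  static List<int[]> filterByJoinKeys(List<int[]> mainQueryDocs, Set<Integer> fromKeys) {
    List<int[]> kept = new ArrayList<>();
    for (int[] doc : mainQueryDocs) {
      if (fromKeys.contains(doc[1])) {
        kept.add(doc);
      }
    }
    return kept;
  }

  // Hypothetical demo: keys 7 and 42 come back from the from core.
  static int demoKeptCount() {
    Set<Integer> fromKeys = new HashSet<>(Arrays.asList(7, 42));
    List<int[]> mainDocs = Arrays.asList(new int[]{0, 7}, new int[]{1, 13}, new int[]{2, 42});
    return filterByJoinKeys(mainDocs, fromKeys).size();
  }

  public static void main(String[] args) {
    System.out.println(demoKeptCount()); // 2 of the 3 main-query docs survive
  }
}
```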
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649695#comment-13649695 ] Adrien Grand commented on SOLR-4787: Hi Joel. {{SorterTemplate}} has just been refactored into {{org.apache.lucene.util.Sorter}} (LUCENE-4946). You can have a look at Passage.sort() (https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/Passage.java) to see how to use it to sort parallel arrays.
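The parallel-array sorting technique referenced above can be sketched in plain Java: comparisons read only the key array, but every swap is applied to both arrays, keeping them aligned without allocating intermediate objects. This is a standalone illustration (selection sort for brevity), not Lucene's Sorter itself:

```java
import java.util.Arrays;

// Standalone sketch of in-place parallel-array sorting, the technique
// Lucene's Sorter/SorterTemplate enables by abstracting compare(i, j)
// and swap(i, j): any sort algorithm can drive the two callbacks, and
// swap keeps every parallel array in sync.
public class ParallelArraySort {

  static void sortParallel(int[] keys, int[] values) {
    // Selection sort for clarity; the real Sorter supplies faster algorithms.
    for (int i = 0; i < keys.length; i++) {
      int min = i;
      for (int j = i + 1; j < keys.length; j++) {
        if (keys[j] < keys[min]) min = j;
      }
      swap(keys, i, min);
      swap(values, i, min);  // mirror the swap so values stay aligned with keys
    }
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
  }

  public static void main(String[] args) {
    int[] keys = {3, 1, 2};
    int[] values = {30, 10, 20};
    sortParallel(keys, values);
    System.out.println(Arrays.toString(keys) + " " + Arrays.toString(values));
  }
}
```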
[jira] [Commented] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649702#comment-13649702 ] Robert Muir commented on LUCENE-4982: - It's not clear to me whether, with the patch, we will double-count against disk full if copyBytes calls writeBytes behind the scenes... Maybe we can make the test have a max size of 2 bytes and copyBytes twice to it, just so this is obvious?
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649704#comment-13649704 ] Joel Bernstein commented on SOLR-4787: -- Hi Adrien, thanks for the information. I'll take a look at the Sorter today.
[jira] [Commented] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649711#comment-13649711 ] Shai Erera commented on LUCENE-4982: I can modify the test, sure. But the problem is that copyBytes doesn't call writeBytes, otherwise I would have tripped it. I.e., we call delegate.copyBytes, which internally may call *its* writeBytes, but not MockIO.writeBytes.
[jira] [Updated] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4982: --- Attachment: LUCENE-4982.patch I modified the test to set maxSize=2 and then write 2 bytes in two calls. The first should succeed and the second fail. However, even the first fails, and now I don't know if it's a bug in the test or in MockIO.checkDiskFull(). The latter (a copy of the original code) does {{freeSpace <= len}} -- is this ok? I mean, if I have room for 2 bytes and the caller asks to write 2 bytes, should we really fail on diskFull?
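The boundary question raised above can be shown with a standalone sketch (not the real MockIndexOutputWrapper): a check that fails when the remaining space merely equals the write length rejects a write that exactly fits, while a strict comparison only rejects a genuine overflow:

```java
// Standalone sketch of the off-by-one question discussed above (not the
// actual MockIndexOutputWrapper code): with maxSize=2 and a 2-byte write,
// a "less than or equal" check trips disk-full even though the bytes fit,
// while a strict "less than" check lets the exactly-fitting write through.
public class DiskFullSketch {

  static boolean failsWithLessOrEqual(long freeSpace, long len) {
    return freeSpace <= len;   // trips disk-full even when len fits exactly
  }

  static boolean failsWithStrictLess(long freeSpace, long len) {
    return freeSpace < len;    // trips disk-full only on a real overflow
  }

  public static void main(String[] args) {
    // maxSize=2, nothing written yet, caller writes 2 bytes:
    System.out.println(failsWithLessOrEqual(2, 2)); // true  -> first write already fails
    System.out.println(failsWithStrictLess(2, 2));  // false -> first write succeeds
  }
}
```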
[ANNOUNCE] Apache Lucene 4.3 released
May 2013, Apache Lucene™ 4.3 available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.3. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.3 Release Highlights:

* Significant performance improvements for minShouldMatch BooleanQuery due to skipping, resulting in up to 4000% faster queries.
* A new SortingAtomicReader which allows sorting an index by a sort criterion (e.g. a numeric DocValues field), as well as SortingMergePolicy which sorts documents before segments are merged.
* DocIdSetIterator and Scorer now have a cost API that provides an upper bound on the number of documents the iterator might match. This API allows optimizations during query execution, such as in how filters are applied.
* Analyzing/FuzzySuggester now allow recording an arbitrary byte[] as a payload. The suggesters also use an ending offset to determine whether the last token was finished or not, so that a query "i" will no longer suggest "Isla de Muerta", for example.
* The Lucene spatial module can now search for indexed shapes by Within, Contains, and Disjoint relationships, in addition to the typical Intersects.
* PostingsHighlighter now allows custom passage scores and per-field BreakIterators, and has been detached from TopDocs. Additionally, subclasses can override where string values for highlighting are pulled from, as an alternative to stored fields.
* A new SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting).
* Added a new facet method to the facet module to compute facet counts using SortedSetDocValuesField, without a separate taxonomy index.
* The DrillSideways class, for computing sideways facet counts, is now more flexible: it allows more than one FacetRequest per dimension and now allows drilling down on dimensions that do not have a facet request.
* Various bugfixes and optimizations since the 4.2.1 release.

Please read CHANGES.txt for a full list of new features. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html). Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy searching, Lucene/Solr developers
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649727#comment-13649727 ] Adrien Grand commented on LUCENE-4975: -- bq. Then maybe we could have sub-modules for specific replication strategies? To make my point a little clearer, I was suggesting something pretty much like the analysis module: analyzers that require additional dependencies (such as icu or morfologik) are in their own sub-module so that you don't need to pull in the ICU or Morfologik JARs if you just want to use LetterTokenizer (which is in lucene/analysis/common). Likewise, we could have the interface and the logic to replicate simple (no sidecar data) indexes in lucene/replicator/common, and have sub-modules for facets (lucene/replicator/facet) or suggesters (lucene/replicator/suggesters). This may look like overkill, but at least it would help us keep dependencies clean between modules.
[ANNOUNCE] Apache Solr 4.3 released
May 2013, Apache Solr™ 4.3 available The Lucene PMC is pleased to announce the release of Apache Solr 4.3. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites. Solr 4.3 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of details. Solr 4.3.0 Release Highlights: * Tired of maintaining core information in solr.xml? Now you can configure Solr to automatically find cores by walking an arbitrary directory. * Shard Splitting: You can now split SolrCloud shards to expand your cluster as you grow. * The read side of the schema REST API has been improved and expanded upon: all schema information is now available, and the full live schema can now be returned in JSON or XML. Groundwork is included for the upcoming write side of the schema REST API. * Spatial queries can now search for indexed shapes by IsWithin, Contains and IsDisjointTo relationships, in addition to the typical Intersects. * Faceting now supports local parameters for faceting on the same field with different options. * Significant performance improvements for minShouldMatch (mm) queries due to skipping, resulting in up to 4000% faster queries. * Various new highlighting configuration parameters. * A new solr.xml format that is closer to that of solrconfig.xml. The example still uses the old format, but 4.4 will ship with the new format. * Lucene 4.3.0 bug fixes and optimizations. Solr 4.3.0 also includes many other new features as well as numerous optimizations and bugfixes.
Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html) Happy searching, Lucene/Solr developers
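The core auto-discovery highlight above can be illustrated with a small config sketch. This is a hedged illustration of the discovery-based layout being worked on for the 4.x line, not the final format — the property names below are assumptions and should be checked against CHANGES.txt and the wiki:

```
# core.properties — a hypothetical marker file dropped anywhere under the
# directory Solr walks at startup; its presence marks this directory as a core
name=collection1
loadOnStartup=true
transient=false
```

With discovery enabled, the directory containing this file plays the role the `<core>` entry in the old-style solr.xml used to play.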
[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649731#comment-13649731 ] Shai Erera commented on LUCENE-4975: I think that's not a bad idea! replicator/common will include the interfaces (Revision and ReplicationHandler) + the framework impl and also IndexRevision/Handler. replicator/facet will include the taxonomy parts and depend on replicator/common and facet. I can also move the facet related code under oal.replicator.facet and then suppress the Lucene3x codec for just these tests. If others agree, I'll make the changes (mostly build.xml changes).
[jira] [Commented] (SOLR-4662) Finalize what we're going to do with solr.xml, auto-discovery, config sets.
[ https://issues.apache.org/jira/browse/SOLR-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649750#comment-13649750 ] Jan Høydahl commented on SOLR-4662: --- Where did the sharedLib stuff go in the new solr.xml? Will it work with {{<str name="sharedLib">lib</str>}}? This should be documented in XML comments. Finalize what we're going to do with solr.xml, auto-discovery, config sets. --- Key: SOLR-4662 URL: https://issues.apache.org/jira/browse/SOLR-4662 Project: Solr Issue Type: Improvement Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Mark Miller Priority: Blocker Fix For: 4.3, 5.0 Attachments: SOLR-4662.patch, SOLR-4662.patch, SOLR-4662.patch, SOLR-4662.patch, SOLR-4662.patch, SOLR-4662.patch Spinoff from SOLR-4615, breaking it out here so we can address the changes in pieces.
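For reference, here is a sketch of how sharedLib might look in the new-style solr.xml, assuming the new format keeps the solrconfig.xml-like name/value nodes described in the 4.3 release notes. This is an illustration of the question being asked, not a confirmed answer — the placement of the element is an assumption:

```xml
<solr>
  <!-- hypothetical placement: a lib directory shared by all cores -->
  <str name="sharedLib">lib</str>
  <solrcloud>
    <int name="hostPort">${jetty.port:8983}</int>
  </solrcloud>
</solr>
```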
RE: [ANNOUNCE] Apache Lucene 4.3 released
Congratulations! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@gmail.com] Sent: Monday, May 06, 2013 3:08 PM To: dev@lucene.apache.org; java-user; gene...@lucene.apache.org; annou...@apache.org Subject: [ANNOUNCE] Apache Lucene 4.3 released May 2013, Apache Lucene™ 4.3 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.3. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html See the CHANGES.txt file included with the release for a full list of details. Lucene 4.3 Release Highlights: * Significant performance improvements for minShouldMatch BooleanQuery due to skipping, resulting in up to 4000% faster queries. * A new SortingAtomicReader which allows sorting an index based on a sort criterion (e.g. a numeric DocValues field), as well as SortingMergePolicy which sorts documents before segments are merged. * DocIdSetIterator and Scorer now have a cost API that provides an upper bound on the number of documents the iterator might match. This API allows optimizations during query execution, for example in how filters are applied. * Analyzing/FuzzySuggester now allow recording an arbitrary byte[] as a payload. The suggesters also use an ending offset to determine whether the last token was finished or not, so that a query "i" will no longer suggest "Isla de Muerta", for example. * The Lucene Spatial Module can now search for indexed shapes by Within, Contains, and Disjoint relationships, in addition to the typical Intersects.
* PostingsHighlighter now allows custom passage scores and per-field BreakIterators, and has been detached from TopDocs. Additionally, subclasses can override where string values for highlighting are pulled from, as an alternative to stored fields. * New SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting). * Added a new facet method to the facet module to compute facet counts using SortedSetDocValuesField, without a separate taxonomy index. * The DrillSideways class, for computing sideways facet counts, is now more flexible: it allows more than one FacetRequest per dimension and now allows drilling down on dimensions that do not have a facet request. * Various bugfixes and optimizations since the 4.2.1 release. Please read CHANGES.txt for a full list of new features. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html) Happy searching, Lucene/Solr developers
Re: VOTE: solr no longer webapp
* Shouldn't we be able to plug and play the underlying HTTP layer technology? * Shouldn't we be able to try embedded Jetty and its nice integration with Guice + Restlet? Check out using Netty? +1
[jira] [Updated] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4982: --- Attachment: LUCENE-4982.patch I changed the check to {{freeSpace < len}}, but then the test failed to trip disk-full the second time, unless I call out.flush() in between. Debugging tells me that RAMOutputStream sets RAMFile.length only on flush(), therefore even if I attempt to write a 2K byte[] (with maxSize=2), the test doesn't fail. Seems like getRecomputedActualSizeInBytes is not very useful: if the Dir is not a RAMDir, it just calls sizeInBytes(), which computes the size from the file system, and if it is, then RAMFile.length isn't up to date, leading to an incorrect (0) size being computed unless some files were already flushed. But getRecomputed cannot flush the streams either in that case ... So I think I'll leave the test like that. A real test that wants to trip on disk-full will usually involve indexing, hence files will be flushed and recomputed will return some number - not really the actual number of bytes used, but some number. Make MockIndexOutputWrapper check disk full on copyBytes Key: LUCENE-4982 URL: https://issues.apache.org/jira/browse/LUCENE-4982 Project: Lucene - Core Issue Type: Improvement Components: modules/test-framework Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-4982.patch, LUCENE-4982.patch, LUCENE-4982.patch While working on the consistency test for Replicator (LUCENE-4975), I noticed that I don't trip disk-full exceptions and tracked it down to MockIndexOutputWrapper.copyBytes not doing these checks like writeBytes does. I'd like to add this check.
[jira] [Comment Edited] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649784#comment-13649784 ] Shai Erera edited comment on LUCENE-4982 at 5/6/13 3:13 PM: I changed the check to {{freeSpace < len}}, but then the test failed to trip disk-full the second time, unless I call out.flush() in between. Debugging tells me that RAMOutputStream sets RAMFile.length only on flush(), therefore even if I attempt to write a 2K byte[] (with maxSize=2), the test doesn't fail. Seems like getRecomputedActualSizeInBytes is not very accurate. It only returns the size of the flushed files (even for FSDir). This may be ok, dunno. It just felt wrong for RAMDirectory, since there is no real buffering happening. Anyway, I guess we'll have to live with that. Disk-full is a best effort anyway, so in this test I'll just call flush(). In real tests that want to trip disk-full, indexing usually happens, therefore files get flushed and the size measure is closer.
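The flush-dependent behavior Shai describes can be reproduced in miniature with plain Java. This is a hedged, standalone sketch — none of these classes are the real Lucene test-framework classes; they only model the described mechanism (a buffered output whose recorded file length is advanced only on flush, so a {{freeSpace < len}} check misses still-buffered bytes):

```java
import java.util.ArrayList;
import java.util.List;

public class DiskFullSketch {

    // stands in for RAMFile: its length is only published when the stream flushes
    static final class FakeFile { long length = 0; }

    // stands in for RAMOutputStream: writes are buffered, flush() publishes length
    static final class FakeOutput {
        private final FakeFile file;
        private final List<byte[]> buffered = new ArrayList<>();
        FakeOutput(FakeFile file) { this.file = file; }
        void writeBytes(byte[] b) { buffered.add(b); }
        void flush() {
            for (byte[] b : buffered) file.length += b.length;
            buffered.clear();
        }
    }

    // stands in for the mock-directory check: it can only see recorded (flushed) sizes
    static boolean tripsDiskFull(long maxSize, long recordedBytes, int len) {
        long freeSpace = maxSize - recordedBytes;
        return freeSpace < len;
    }

    public static void main(String[] args) {
        FakeFile file = new FakeFile();
        FakeOutput out = new FakeOutput(file);
        out.writeBytes(new byte[2048]);  // 2K written, but only buffered

        // maxSize=2: the 2048 buffered bytes are invisible to the check,
        // so a subsequent 1-byte write does not trip disk-full
        System.out.println("trips before flush: " + tripsDiskFull(2, file.length, 1));

        out.flush();  // now file.length == 2048

        // after flushing, recorded usage exceeds maxSize and the check trips
        System.out.println("trips after flush: " + tripsDiskFull(2, file.length, 1));
    }
}
```

This matches the comment's observation that inserting an out.flush() between writes is what lets the second write trip disk-full.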
[jira] [Updated] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4981: - Attachment: LUCENE-4981.patch Here is the patch for 4.x. The patch for trunk is simpler, as PositionFilter and PositionFilterFactory would simply be removed. Deprecate PositionFilter Key: LUCENE-4981 URL: https://issues.apache.org/jira/browse/LUCENE-4981 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4981.patch According to the documentation (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory), PositionFilter is mainly useful to make query parsers generate boolean queries instead of phrase queries, although this problem can be solved at the query parsing level instead of the analysis level (e.g. using QueryParser.setAutoGeneratePhraseQueries). So given that PositionFilter corrupts token graphs (see TestRandomChains), I propose to deprecate it.
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649833#comment-13649833 ] Christian Moen commented on LUCENE-4956: bq. I think we're ready for the incubator-general vote. [~cm], do you agree? +1 the korean analyzer that has a korean morphological analyzer and dictionaries - Key: LUCENE-4956 URL: https://issues.apache.org/jira/browse/LUCENE-4956 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Affects Versions: 4.2 Reporter: SooMyung Lee Assignee: Christian Moen Labels: newbie Attachments: kr.analyzer.4x.tar The Korean language has specific characteristics. When developing a search service with Lucene/Solr in Korean, there are some problems in searching and indexing. The Korean analyzer solves these problems with a Korean morphological analyzer. It consists of a Korean morphological analyzer, dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene and Solr. If you develop a search service with Lucene in Korean, the best choice is the Korean analyzer.
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649856#comment-13649856 ] Commit Tag Bot commented on SOLR-3240: -- [trunk commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1479638 SOLR-3240: add spellcheck.collateMaxCollectDocs for estimating collation hit-counts. add spellcheck 'approximate collation count' mode - Key: SOLR-3240 URL: https://issues.apache.org/jira/browse/SOLR-3240 Project: Solr Issue Type: Improvement Components: spellchecker Reporter: Robert Muir Attachments: SOLR-3240.patch, SOLR-3240.patch, SOLR-3240.patch SpellCheck's Collation in Solr is a way to ensure spellcheck/suggestions will actually net results (taking into account context like filtering). In order to do this (from my understanding), it generates candidate queries, executes them, and saves the total hit count: collation.setHits(hits). For a large index it seems this might be doing too much work; in particular, I'm interested in ensuring this feature can work fast enough/well for autosuggesters. So I think we should offer an 'approximate' mode that uses an early-terminating Collector, collect()ing only N docs (e.g. n=1), and approximates the result count based on docid space. I'm not sure what needs to happen on the Solr side (possibly support for custom collectors?), but I think this could help and should possibly be the default.
[jira] [Assigned] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer reassigned SOLR-3240: Assignee: James Dyer
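The extrapolation idea behind this issue — collect only the first few hits, then estimate the total from how far into the docid space collection reached, assuming hits are roughly uniform — can be sketched in a few lines of plain Java. The names and the exact formula below are illustrative, not the actual Solr implementation:

```java
public class CollationEstimate {

    // estimate total hits from an early-terminated collection:
    // `collected` hits were seen, the last at `lastCollectedDocId`,
    // out of `maxDoc` documents in the index
    static long estimateHits(int collected, int lastCollectedDocId, int maxDoc) {
        if (collected == 0) return 0;
        // fraction of the docid space scanned when collection stopped
        double scanned = (lastCollectedDocId + 1) / (double) maxDoc;
        return Math.round(collected / scanned);
    }

    public static void main(String[] args) {
        // e.g. early termination after 10 hits, the 10th at docid 999,
        // in a 1,000,000-document index
        System.out.println(estimateHits(10, 999, 1_000_000));
    }
}
```

The estimate trades accuracy for speed: with n=1 it costs almost nothing but is noisy, which is the tradeoff the spellcheck.collateMaxCollectDocs parameter exposes.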
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649857#comment-13649857 ] Jack Krupansky commented on LUCENE-4956: I am not really familiar with the incubator-general vote. From looking at the legal clearance page, it sounds like the vote is simply about accepting the donation, as opposed to voting that the branch is ready to commit to trunk - correct? I did a Jira search and found no previous references to an incubator-general vote; from a Google search I got the impression it was more related to podlings than to simple code module contributions.
A couple of high level Solr issues
A solr-user list discussion led to some general thoughts about Solr that I think need some further discussion. I'm ready to open an issue, I just thought it might be better to define a direction first. Some of the option/attribute names in config files aren't self-descriptive, or have become not quite correct due to Solr's evolution. One example is instanceDir but there are probably others. I'm not sure that instanceDir was ever a good name. It probably made complete sense when multicore first came into being, as it was meant to replace multiple instances of Solr. Coming up with a better name is a little tricky. A simple and currently relevant replacement would be coreDir ... but if you're using SolrCloud, Jack Krupansky has put forth some names that might be better: replicaDir, shardReplicaDir, or even the wordy but extremely accurate collectionShardReplicaDir. In recent years, Solr has evolved from multicore-capable to multicore in even the simple example. If we have a similar migration so that all Solr installations are SolrCloud installations, then having replicas instead of cores (replacing instanceDir with replicaDir) might be the right way to go. Will we ever have a larger abstraction than collections? If we think that could ever happen, we should probably think of a name for it. I think that we need to start a general overhaul of various identifiers in config files and APIs, planning ahead to accommodate future (in)sanity. Because it could be extremely disruptive to ongoing development, that probably needs to happen in a branch. Thanks, Shawn
Re: Including JTS in an Apache project
Thanks Dave for the detailed response. As I understand, Spatial4j is a separate project that acts as a plugin to Solr and Lucene. However, it is still licensed under the Apache license. Does including it as-is inside Lucene or Solr break the Apache license, or do you have it as a separate project for another reason? I took a look at Spatial4j and it looks very nice. I like the idea that you define your own interface and use JTS as another implementation for that interface. However, I don't think I will be able to use it in Pig for one reason. As per their website, JTS conforms with the OGC standard for SQL [http://www.opengeospatial.org/standards]. It is important to follow the OGC for the addition I'm proposing to Pig as it makes it more acceptable in the GIS community which I'm targeting. I talked with people from different industrial and research organizations and they all said that they can only use it if it conforms with OGC standards, just as JTS and PostGIS [http://postgis.net/] do. What I can do for now is to make this extension a separate open source project under the Apache license. However, as far as I understand, I cannot merge this extension with Apache Pig unless we resolve the license issue. Thanks Ahmed Best regards, Ahmed Eldawy On Sun, May 5, 2013 at 11:38 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: Hi Ahmed, I faced your conundrum with JTS early last year. As you know, the Apache Software Foundation doesn't like its projects depending on GPL and even LGPL licensed libraries. The ASF does not have clear, unambiguous language on how its projects can depend on them in a limited sense. Different PMCs (projects) have different standards. I've heard of one project (CXF?) that uses Java reflection to use an LGPL library. I think another downloads the LGPL library as part of the build, and then the code has a compile-time dependency (I could be mistaken).
If memory serves, in both cases the dependency fit an optional role and not a core purpose of the software. The Lucene PMC in particular didn't formally vote to my knowledge, but there was a time when it was clear to me that such approaches were not acceptable. The approach that the Lucene spatial developers took (me, Ryan, Chris) was to create a non-ASF project called Spatial4j that is ASL licensed. Spatial4j *optionally* depends on JTS -- it's only for advanced shapes (namely polygons) and for WKT parsing. https://github.com/spatial4j/spatial4j BTW, WKT parsing will be handled by Spatial4j itself in the near future, without JTS. Spatial4j is not a subset of JTS; it critically has things JTS doesn't, like a native circle (not a polygon approximation) and the concept of the world being a sphere instead of flat ;-) That's right: JTS, as critical as it is in the world of open-source spatial, doesn't have any geodetic calculations, just Euclidean. Spatial4j adds dateline wrap support to JTS shapes so you can represent Fiji for example, but not yet Antarctica (no pole wrap). So I encourage the Apache Pig project to take a look at using Spatial4j instead of directly using JTS, for the same reasons that the Lucene project uses it. If you ultimately decide not to, then please let me know why, as I see Spatial4j being an excellent fit for ASF projects in particular because of the licensing issue. So your statement "Apache Solr *uses* JTS" is incorrect. No it doesn't, and nor does Lucene; not at all. Instead, those projects use Spatial4j, which has an abstraction (Shape), and it has an implementation of that abstraction that depends on JTS. It also has implementations that don't depend on JTS. p.s. Last week I did a long presentation on Spatial in Lucene/Solr/Spatial4j and I'd be happy to share the slides with you. The organizers will post them, but they haven't yet.
~ David Smiley Ahmed El-dawy wrote Hi all, I saw that Apache Solr uses JTS (Java Topology Suite) [ http://www.vividsolutions.com/jts/JTSHome.htm] for supporting a spatial data type [http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4]. Using JTS in an Apache project is not a straightforward thing, as JTS is licensed under the LGPL, which has some compatibility issues when included in an Apache project. Now, I need to do something very similar in another Apache project (Pig [http://pig.apache.org/]) and I'm faced with the licensing issue. I'm asking for your advice on the best way to use JTS without breaking the license. Does referring to JTS classes from the code of an Apache project, without actually including the classes, violate the license? Do we have to load the classes dynamically (using Class#forName), or is there another way to do it? Thanks in advance Best regards, Ahmed Eldawy - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
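The reflection-based approach mentioned in this thread can be sketched in plain Java: probe for the LGPL library's class at runtime and degrade gracefully, instead of taking a compile-time dependency. The JTS class name below is real, but whether this pattern satisfies ASF policy is exactly the open question being discussed — treat this as an illustration, not guidance:

```java
public class OptionalJts {

    // returns true only if the named class can be loaded from the classpath
    static boolean classPresent(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // enable advanced shapes only when the user has put the JTS jar
        // on the classpath themselves
        if (classPresent("com.vividsolutions.jts.geom.Geometry")) {
            System.out.println("JTS on classpath: advanced shapes enabled");
        } else {
            System.out.println("JTS not on classpath: falling back to basic shapes");
        }
    }
}
```

This is essentially how Spatial4j's optional JTS dependency behaves from a user's point of view: the core works without the jar, and the JTS-backed implementation only activates when the jar is present.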
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649868#comment-13649868 ] Robert Muir commented on LUCENE-4956: - Jack, that's correct. It is a vote for IP clearance. For example, Simon called an IP clearance vote on the incubator list for Kuromoji before we integrated it into Lucene.
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649873#comment-13649873 ] Commit Tag Bot commented on SOLR-3240: -- [branch_4x commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1479644 SOLR-3240: add spellcheck.collateMaxCollectDocs for estimating collation hit-counts.
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649874#comment-13649874 ] Robert Muir commented on SOLR-3240: --- Thanks for taking care of this, James: nice work.
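The "approximate this result count based on docid space" idea from the issue description can be sketched as follows. This is a hypothetical helper, not Solr's actual SpellCheckCollator code: it assumes collection stopped early after collecting a fixed number of hits, and extrapolates the total from how far into the docid space the last hit fell.

```java
// Sketch of docid-space extrapolation for an early-terminated collection.
// Hypothetical class/method names; the real feature is controlled by the
// spellcheck.collateMaxCollectDocs parameter inside Solr.
public class CollationHitEstimate {

    // collected: hits gathered before early termination
    // lastDocId: docid of the last collected hit (0-based)
    // maxDoc:    total number of documents in the index
    static long estimateHits(int collected, int lastDocId, int maxDoc) {
        if (collected == 0) {
            return 0; // nothing collected: no evidence of any matches
        }
        // hit density over the scanned prefix of the docid space,
        // extrapolated across the whole index
        return Math.round((double) collected * maxDoc / (lastDocId + 1));
    }

    public static void main(String[] args) {
        // 1 hit collected, termination at docid 99, 1,000,000-doc index:
        // roughly 1 hit per 100 docs, so ~10000 estimated matches
        System.out.println(estimateHits(1, 99, 1_000_000));
    }
}
```

The estimate is exact only if matches are uniformly distributed over docids; skewed indexes will over- or under-estimate, which is the trade-off the "approximate" mode accepts for speed.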
Re: A couple of high level Solr issues
And multicore is one of my examples of what should go away as a legacy term. It should be simply multiple collections, independent of whether it is single-node and single-shard, regardless of whether cloud/distrib is involved. Actually, multicore appears to be two distinct use cases: 1. Multiple collections. 2. Replication. -- Jack Krupansky

-Original Message- From: Shawn Heisey Sent: Monday, May 06, 2013 12:58 PM To: dev@lucene.apache.org Subject: A couple of high level Solr issues

A solr-user list discussion led to some general thoughts about Solr that I think need some further discussion. I'm ready to open an issue, I just thought it might be better to define a direction first.

Some of the option/attribute names in config files aren't self-descriptive, or have become not quite correct due to Solr's evolution. One example is instanceDir, but there are probably others. I'm not sure that instanceDir was ever a good name. It probably made complete sense when multicore first came into being, as it was meant to replace multiple instances of Solr.

Coming up with a better name is a little tricky. A simple and currently relevant replacement would be coreDir ... but if you're using SolrCloud, Jack Krupansky has put forth some names that might be better: replicaDir, shardReplicaDir, or even the wordy but extremely accurate collectionShardReplicaDir.

In recent years, Solr has evolved from multicore-capable to multicore in even the simple example. If we have a similar migration so that all Solr installations are SolrCloud installations, then having replicas instead of cores (replacing instanceDir with replicaDir) might be the right way to go.

Will we ever have a larger abstraction than collections? If we think that could ever happen, we should probably think of a name for it. I think that we need to start a general overhaul of various identifiers in config files and APIs, planning ahead to accommodate future (in)sanity.
Because it could be extremely disruptive to ongoing development, that probably needs to happen in a branch. Thanks, Shawn
[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries
[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649879#comment-13649879 ] Steve Rowe commented on LUCENE-4956: Hi Jack, From [http://incubator.apache.org/ip-clearance/], which is (quoting from that page): {quote} Intellectual property clearance One of the Incubator's roles is to ensure that proper attention is paid to intellectual property. From time to time, an external codebase is brought into the ASF that is not a separate incubating project but still represents a substantial contribution that was not developed within the ASF's source control system and on our public mailing lists. This is a short form of the Incubation checklist, designed to allow code to be imported with alacrity while still providing for oversight. [...] Once a PMC directly checks in a filled-out short form, the Incubator PMC will need to approve the paperwork, after which point the receiving PMC is free to import the code. {quote} The short form referred to above is an XML template, which I've completed for this code base, and which is at some (apparently regular?) interval converted to HTML (this is also linked from the above-linked IP clearance page as Korean Analyzer): [http://incubator.apache.org/ip-clearance/lucene-korean-analyzer.html]
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649882#comment-13649882 ] Commit Tag Bot commented on SOLR-3240: -- [trunk commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1479645 SOLR-3240: add spellcheck.collateMaxCollectDocs (removing dead code).
[jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649884#comment-13649884 ] Commit Tag Bot commented on SOLR-3240: -- [branch_4x commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1479647 SOLR-3240: add spellcheck.collateMaxCollectDocs (removing dead code).
Re: A couple of high level Solr issues
On 5/6/2013 10:58 AM, Shawn Heisey wrote: A solr-user list discussion led to some general thoughts about Solr that I think need some further discussion. I'm ready to open an issue, I just thought it might be better to define a direction first. This started out as an email about two issues, but in the end I decided to only put one of them in this email, so the subject is wrong!
Re: [ANNOUNCE] Apache Lucene 4.3 released
Great! Well done... On Mon, May 6, 2013 at 10:03 AM, Uwe Schindler u...@thetaphi.de wrote: Congratulations! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Simon Willnauer [mailto:simon.willna...@gmail.com] Sent: Monday, May 06, 2013 3:08 PM To: dev@lucene.apache.org; java-user; gene...@lucene.apache.org; annou...@apache.org Subject: [ANNOUNCE] Apache Lucene 4.3 released

May 2013, Apache Lucene™ 4.3 available. The Lucene PMC is pleased to announce the release of Apache Lucene 4.3. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.3 Release Highlights:

* Significant performance improvements for minShouldMatch BooleanQuery due to skipping, resulting in up to 4000% faster queries.

* A new SortingAtomicReader which allows sorting an index based on a sort criterion (e.g. a numeric DocValues field), as well as SortingMergePolicy which sorts documents before segments are merged.

* DocIdSetIterator and Scorer now have a cost API that provides an upper bound on the number of documents the iterator might match. This API allows optimizations during query execution, such as how filters are applied.

* Analyzing/FuzzySuggester now allow recording an arbitrary byte[] as a payload. The suggesters also use an ending offset to determine whether the last token was finished or not, so that a query "i" will no longer suggest "Isla de Muerta", for example.

* The Lucene spatial module can now search for indexed shapes by Within, Contains, and Disjoint relationships, in addition to the typical Intersects.

* PostingsHighlighter now allows custom passage scores and per-field BreakIterators, and has been detached from TopDocs. Additionally, subclasses can override where string values for highlighting are pulled from, as an alternative to stored fields.

* The new SearcherTaxonomyManager manages near-real-time reopens of both IndexSearcher and TaxonomyReader (for faceting).

* Added a new facet method to the facet module to compute facet counts using SortedSetDocValuesField, without a separate taxonomy index.

* The DrillSideways class, for computing sideways facet counts, is now more flexible: it allows more than one FacetRequest per dimension and now allows drilling down on dimensions that do not have a facet request.

* Various bug fixes and optimizations since the 4.2.1 release.

Please read CHANGES.txt for a full list of new features. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html). Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy searching, Lucene/Solr developers
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649960#comment-13649960 ] Andy Fowler commented on SOLR-4773: --- I'm thinking this is the cause of a bug I'm seeing in the 4.3.0 release. To reproduce using the multicore example: * echo "<solr/>" > multicore/solr.xml to put it into core discovery mode * place a core.properties file in the core0/ and core1/ directories, just with loadOnStartup and transient properties defined. * start the example: `java -Dsolr.solr.home=multicore -jar start.jar` You should receive a "More than one core points to data dir 'multicore/data/'" failure on startup. Setting a relative path in each core.properties file doesn't work; it only works when I provide a discrete dataDir for each core. New discovery mode needs to ensure that instanceDir is correct -- Key: SOLR-4773 URL: https://issues.apache.org/jira/browse/SOLR-4773 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 5.0, 4.4 Reporter: Erick Erickson Assignee: Mark Miller Fix For: 5.0, 4.4 Attachments: SOLR-4773.patch, SOLR-4773.patch Doing a fresh checkout of 4.x (trunk too, I think) and firing up the example fails because we can't find solrconfig. The construction of the instanceDir in SolrCoreDiscoverer constructs a path with an extra solr (e.g. solr/solr/core). I'll attach a patch shortly.
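For reference, a minimal core.properties along the lines of the reproduction above. The layout and the property names other than loadOnStartup and transient are illustrative (the dataDir line is the per-core workaround the comment describes), not an exact copy of the 4.3 example files:

```properties
# multicore/core0/core.properties  (hypothetical layout)
name=core0
loadOnStartup=true
transient=false
# workaround for the 4.3.0 bug above: give each core its own data dir
dataDir=data
```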
[jira] [Resolved] (SOLR-3240) add spellcheck 'approximate collation count' mode
[ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3240. -- Resolution: Fixed Fix Version/s: 4.4, 5.0
[jira] [Updated] (LUCENE-3917) Port pruning module to trunk apis
[ https://issues.apache.org/jira/browse/LUCENE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Bowyer updated LUCENE-3917: Attachment: LUCENE-3917-Initial-port-of-index-pruning.patch Recently at $DAYJOB the horror that is high-frequency terms in OR searches came to bite us, and as a result I have an interest in pruning again. As such I made an attempt to forward-port the existing pruning package directly to Lucene 4.0. This is largely a mechanical port; I have not put any real thought into it, so it's probably terrible. This does not pass its unit test, and the code is a mess internally. I am going to try to get the unit test working and then loop back on making the code more Lucene 4.x friendly. One question that occurs from this is how AtomicReaders are handled: do we want to prune per segment with global stats, prune based on segment stats, or just do the terrible thing and work with a SlowCompositeReader? I also think, given the work that went on with LUCENE-4752, it might be possible to do the pruning in a similar fashion to the sorting merge, such that we do a pruning merge. Port pruning module to trunk apis - Key: LUCENE-3917 URL: https://issues.apache.org/jira/browse/LUCENE-3917 Project: Lucene - Core Issue Type: Task Components: modules/other Affects Versions: 4.0-ALPHA Reporter: Robert Muir Fix For: 4.3 Attachments: LUCENE-3917-Initial-port-of-index-pruning.patch Pruning module was added in LUCENE-1812, but we need to port this to trunk (4.0).
[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 37891 - Failure!
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/37891/ No tests ran. Build Log: [...truncated 119 lines...]
[jira] [Commented] (LUCENE-4982) Make MockIndexOutputWrapper check disk full on copyBytes
[ https://issues.apache.org/jira/browse/LUCENE-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650009#comment-13650009 ] Shai Erera commented on LUCENE-4982: I thought about this some more and I realize that getComputedActualSizeInBytes works as expected. checkDiskFull should only trip if the Directory size has reached the limit, and it cannot tell how many bytes are pending in a buffer. The test would fail not only w/ RAMDirectory, but also with any Directory which buffers writes (which I believe all our directories do), and therefore flush() is important for the test. So to summarize the changes in this issue: * Added checkDiskFull to MockIOWrapper so it can trip in writeBytes and copyBytes. * Changed checkDiskFull to do {{freeSpace < len}} because {{freeSpace == len}} is still valid. * Added a test. I plan to commit this tomorrow. Make MockIndexOutputWrapper check disk full on copyBytes Key: LUCENE-4982 URL: https://issues.apache.org/jira/browse/LUCENE-4982 Project: Lucene - Core Issue Type: Improvement Components: modules/test-framework Reporter: Shai Erera Assignee: Shai Erera Attachments: LUCENE-4982.patch, LUCENE-4982.patch, LUCENE-4982.patch While working on the consistency test for Replicator (LUCENE-4975), I noticed that I don't trip disk-full exceptions, and tracked it down to MockIndexOutputWrapper.copyBytes not doing these checks like writeBytes. I'd like to add this check.
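The {{freeSpace < len}} vs. {{freeSpace == len}} point above can be sketched in isolation. This is a simplified stand-in, not the actual MockIndexOutputWrapper code (the real class tracks sizes through MockDirectoryWrapper); it just shows why a write that exactly fills the remaining space must be allowed:

```java
import java.io.IOException;

// Simplified sketch of a disk-full check; hypothetical class, not
// Lucene's test-framework implementation.
public class DiskFullCheck {
    long maxSize;   // simulated disk-full threshold in bytes; 0 = unlimited
    long usedBytes; // bytes already written

    void checkDiskFull(long len) throws IOException {
        long freeSpace = maxSize - usedBytes;
        // freeSpace < len trips; freeSpace == len is still valid, because
        // the write exactly fills the remaining space without exceeding it.
        if (maxSize != 0 && freeSpace < len) {
            throw new IOException("fake disk full at " + usedBytes + " bytes");
        }
        usedBytes += len;
    }

    public static void main(String[] args) throws IOException {
        DiskFullCheck d = new DiskFullCheck();
        d.maxSize = 10;
        d.checkDiskFull(10); // exactly fills the disk: allowed
        try {
            d.checkDiskFull(1); // one byte over: trips disk-full
        } catch (IOException expected) {
            System.out.println("tripped: " + expected.getMessage());
        }
    }
}
```

With the stricter {{freeSpace == len}} comparison removed, a copyBytes of exactly the remaining space no longer throws spuriously, which matches the summary in the comment.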
[jira] [Created] (SOLR-4789) CoreAdminHandler should write core.properties files in discovery mode
Andy Fowler created SOLR-4789: - Summary: CoreAdminHandler should write core.properties files in discovery mode Key: SOLR-4789 URL: https://issues.apache.org/jira/browse/SOLR-4789 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 4.3 Reporter: Andy Fowler When using the new core discovery method, cores created via CoreAdminHandler are never persisted, since they should be writing files to $INSTANCEDIR/core.properties. CoreAdminHandler should probably write core.properties files.
[jira] [Assigned] (SOLR-4789) CoreAdminHandler should write core.properties files in discovery mode
[ https://issues.apache.org/jira/browse/SOLR-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey reassigned SOLR-4789: -- Assignee: Erick Erickson Erick requested assignment via #solr irc channel.
[jira] [Comment Edited] (SOLR-4789) CoreAdminHandler should write core.properties files in discovery mode
[ https://issues.apache.org/jira/browse/SOLR-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650023#comment-13650023 ] Shawn Heisey edited comment on SOLR-4789 at 5/6/13 7:34 PM: Erick requested assignment via #lucene-dev irc channel. was (Author: elyograg): Erick requested assignment via #solr irc channel.
[jira] [Commented] (SOLR-4789) CoreAdminHandler should write core.properties files in discovery mode
[ https://issues.apache.org/jira/browse/SOLR-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650029#comment-13650029 ] Mark Miller commented on SOLR-4789: --- Interesting - I thought there was code that created this file when trying to read it the first time and not finding it. Still a lot of tests to add for this new code path I think - I've made it the default now so that devs can start running into these problems faster.
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650032#comment-13650032 ] Mark Miller commented on SOLR-4773: --- bq. You should receive a "More than one core points to data dir 'multicore/data/'" failure on startup. Setting a relative path in each core.properties file doesn't work; it only works when I provide discrete dataDir for each core. I've ripped all that data dir checking out for 4.4.
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650033#comment-13650033 ] Andy Fowler commented on SOLR-4773: --- Confirmed by compiling branch_4x that this fixes the bug I noticed in the 4.3.0 release. To future travelers, this means that each core.properties file needs a discrete dataDir property in 4.3.0.
[jira] [Resolved] (SOLR-4583) Change the examples to use solr.properties and auto-discover cores rather than solr.xml
[ https://issues.apache.org/jira/browse/SOLR-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-4583. -- Resolution: Fixed Fix Version/s: 5.0, 4.3 Actually, rather than solr.properties it's the new-style solr.xml. But Mark Miller fixed it. Change the examples to use solr.properties and auto-discover cores rather than solr.xml --- Key: SOLR-4583 URL: https://issues.apache.org/jira/browse/SOLR-4583 Project: Solr Issue Type: Improvement Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Fix For: 4.3, 5.0 If we're going to move forward with obsoleting solr.xml and auto-discovering cores, we need to have as many people using this as possible. I'd like to change the examples to NOT use solr.xml so that this bus leaves the station. solr.xml will still work as it does today, but before we make the cut-over we need enough mileage on it to be confident.
[jira] [Updated] (SOLR-4583) Change the examples to use new-style solr.xml and auto-discover cores rather than old-style solr.xml that defined cores
[ https://issues.apache.org/jira/browse/SOLR-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-4583: - Summary: Change the examples to use new-style solr.xml and auto-discover cores rather than old-style solr.xml that defined cores (was: Change the examples to use solr.properties and auto-discover cores rather than solr.xml)
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650042#comment-13650042 ] Erick Erickson commented on SOLR-4773: -- bq: I've ripped all that data dir checking out for 4.4. Does that include the checking for cores with the same name? It seems like that makes it easier for people to shoot themselves in the foot without giving them _any_ clues about what went wrong. And core discovery makes that pretty easy to do: just copy the core.properties file around and forget to change the name parameter, or an absolute path to the dataDir. New discovery mode needs to ensure that instanceDir is correct -- Key: SOLR-4773 URL: https://issues.apache.org/jira/browse/SOLR-4773 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 5.0, 4.4 Reporter: Erick Erickson Assignee: Mark Miller Fix For: 5.0, 4.4 Attachments: SOLR-4773.patch, SOLR-4773.patch Doing a fresh checkout of 4.x (trunk too, I think) and firing up the example fails because we can't find solrconfig. The construction of the instanceDir in SolrCoreDiscoverer constructs a path with an extra solr (e.g. solr/solr/core). I'll attach a patch shortly.
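For readers following along: the foot-gun Erick describes comes from the per-core properties file used by discovery mode. A sketch of such a file is below; the property names reflect the discovery-mode design discussed in this thread, and the values are purely illustrative (copying this file to another core directory without changing name is exactly the duplicate-name scenario at issue).

```
# Illustrative core.properties placed in a core's instance directory for
# discovery mode. If "name" is omitted it is taken from the directory name;
# copy-pasting this file between directories without editing "name" (or an
# absolute dataDir) silently creates two cores fighting over one identity.
name=collection2
dataDir=data
```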
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650058#comment-13650058 ] Mark Miller commented on SOLR-4773: --- bq. Does that include the checking for cores with the same name? Yes, all this checking and how it was done just further complicates the code, and we want to get away from pre-configuration as a way to create collections anyhow. We should just keep this simple: a core should fail to be created in the core container if there is an existing core with the same name, that's it. I feel all the transient and other recent changes to CoreContainer are really starting to significantly complicate what was already a design that needed some love, so I'm trying to simplify as much as possible so we can more easily refactor down the line.
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Attachment: SOLR-4787.patch Changed the BSearch class to use the SorterTemplate rather than Collections.sort, for much more efficient in-place sorting. SorterTemplate builds with Solr 4.2.1; will need to get this working with trunk as well, using the new Sorter class. Also found a major bug in my original logic for how segment-level readers were being used between the join cores and fixed that as well. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.2.1 Attachments: SOLR-4787.patch, SOLR-4787.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 2 join implementations. The initial patch was generated from the Solr 4.2.1 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *PostFilterJoinQParserPlugin aka pjoin* The pjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the pjoin is designed to work with integer join keys only. So, in order to use pjoin, integer join keys must be included in both the to and from core. The second difference is that the pjoin builds memory structures that are used to quickly connect the join keys. It also uses a custom SolrCache named join to hold intermediate DocSets which are needed to build the join memory structures. So, the pjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the pjoin is that it can scale to join millions of keys between cores.
Because it's a PostFilter, it only needs to join records that match the main query. The syntax of the pjoin is the same as the JoinQParserPlugin except that the plugin is referenced by the string pjoin rather than join. fq=\{!pjoin fromCore=collection2 from=id_i to=id_i\}user:customer1 The example filter query above will search the fromCore (collection2) for user:customer1. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the pjoin: <queryParser name="pjoin" class="org.apache.solr.joins.PostFilterJoinQParserPlugin"/> And the join contrib jars must be registered in the solrconfig.xml: <lib dir="../../../dist/" regex="solr-joins-\d.*\.jar" /> The solrconfig.xml in the from core must have the join SolrCache configured: <cache name="join" class="solr.LRUCache" size="4096" initialSize="1024" /> *JoinValueSourceParserPlugin aka vjoin* The second implementation is the JoinValueSourceParserPlugin aka vjoin. This implements a ValueSource function query that can return values from a second core based on join keys. This allows relevance data to be stored in a separate core and then joined in the main query. The vjoin is called using the vjoin function query. For example: bf=vjoin(fromCore, fromKey, fromVal, toKey) This example shows vjoin being called by the edismax boost function parameter. This example will return the fromVal from the fromCore. The fromKey and toKey are used to link the records from the main query to the records in the fromCore. As with the pjoin, both the fromKey and toKey must be integers. Also like the pjoin, the join SolrCache is used to hold the join memory structures.
To configure the vjoin you must register the ValueSource plugin in the solrconfig.xml as follows: <valueSourceParser name="vjoin" class="org.apache.solr.joins.JoinValueSourceParserPlugin" />
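For anyone skimming the description above, the core pjoin idea (not the patch's actual code; class and method names here are invented for illustration) boils down to: run the query against the from core, collect the integer join keys it matches into a set, then post-filter the main query so only documents whose to key is in that set survive.

```java
import java.util.*;

// Hypothetical sketch of the pjoin concept described above: an integer-keyed
// post-filter join between two cores. Not Solr API; names are illustrative.
public class PJoinSketch {
    // mainResults: each doc modeled as {docId, joinKey}
    public static List<int[]> postFilterJoin(Set<Integer> fromCoreKeys, List<int[]> mainResults) {
        List<int[]> joined = new ArrayList<>();
        for (int[] doc : mainResults) {
            if (fromCoreKeys.contains(doc[1])) { // keep only docs whose key matched in the from core
                joined.add(doc);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // keys matched by user:customer1 in the from core
        Set<Integer> fromKeys = new HashSet<>(Arrays.asList(7, 42));
        // main-query docs as {docId, joinKey}
        List<int[]> main = Arrays.asList(new int[]{1, 7}, new int[]{2, 9}, new int[]{3, 42});
        for (int[] d : postFilterJoin(fromKeys, main)) {
            System.out.println(d[0]); // prints 1 then 3
        }
    }
}
```

Because the filter only runs over documents already matched by the main query (the PostFilter property), the set lookup cost scales with result size, not index size.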
[jira] [Comment Edited] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650063#comment-13650063 ] Joel Bernstein edited comment on SOLR-4787 at 5/6/13 8:16 PM: -- Changed the BSearch class to use the SorterTemplate rather than Collections.sort, for much more efficient in-place sorting. SorterTemplate builds with Solr 4.2.1; will need to get this working with trunk as well, using the new Sorter class. Thanks David and Adrien for tips on this. Also found a major bug in my original logic for how segment-level readers were being used between the join cores and fixed that as well. was (Author: joel.bernstein): Changed the BSearch class to use the SorterTemplate rather than Collections.sort. Much more efficient in-place sorting. SorterTemplate builds with Solr 4.2.1. Will need to get this working with trunk as well using the new Sorter class. Found major bug in my original logic for how segment level readers were being used between the join cores and fixed that as well.
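On the SorterTemplate point in Joel's comment above: the win over Collections.sort is that two parallel primitive arrays (join keys and their companion values) can be sorted in place by key, swapping both arrays together, with no boxing into wrapper objects and no temporary list. The sketch below is not Lucene's SorterTemplate API, just a minimal illustration of that in-place parallel-array pattern using a plain quicksort.

```java
// Illustrative in-place sort of parallel int[] arrays by key, the pattern
// that SorterTemplate enables (hypothetical code, not the Lucene class).
public class ParallelSortSketch {
    // Quicksort keys[lo..hi] in place, carrying values[] along with each swap.
    static void sort(int[] keys, int[] values, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = keys[(lo + hi) >>> 1], i = lo, j = hi;
        while (i <= j) {
            while (keys[i] < pivot) i++;
            while (keys[j] > pivot) j--;
            if (i <= j) swap(keys, values, i++, j--);
        }
        sort(keys, values, lo, j);
        sort(keys, values, i, hi);
    }

    static void swap(int[] k, int[] v, int a, int b) {
        int t = k[a]; k[a] = k[b]; k[b] = t;
        t = v[a]; v[a] = v[b]; v[b] = t;
    }

    public static void main(String[] args) {
        int[] keys = {42, 7, 19};
        int[] docs = {0, 1, 2};
        sort(keys, docs, 0, keys.length - 1);
        System.out.println(java.util.Arrays.toString(keys)); // [7, 19, 42]
        System.out.println(java.util.Arrays.toString(docs)); // [1, 2, 0]
    }
}
```

With Collections.sort you would instead have to box every (key, value) pair into an object, which is exactly the overhead the comment is avoiding.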
Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 37891 - Failure!
JVM crash. On Mon, May 6, 2013 at 2:52 PM, buil...@flonkings.com wrote: Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/37891/ No tests ran. Build Log: [...truncated 119 lines...]
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650098#comment-13650098 ] Erick Erickson commented on SOLR-4773: -- Makes sense; I wasn't altogether happy with the complexification. But we're leaving the user high and dry when tracking down errors. Take 4.x, just copy collection1 to collection2, and fire up Solr. No warnings in the log. No errors in the log. But you can't get to collection2; you get a 404 error. And any index mods are done in the collection2 directory. Admittedly the configuration is foo'd and Solr is doing exactly what the defined behavior is (identically named cores: last one wins). But how the hell is someone supposed to track that down? Especially with lots of cores? They don't get a single clue in the place we always say to look, the Solr log. I see where there are tests for creating a core with the same name as an existing core via the core admin handler, but I don't see at a glance any coverage for this scenario.
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650104#comment-13650104 ] Andy Fowler commented on SOLR-4773: --- Just to throw in my $0.02 as an app developer and Solr consumer with far less knowledge of the rest of the world's use cases: if I accidentally put Solr into a state where two cores were sharing a dataDir, I would really want some sort of strong warning, or just an absolute failure. I really like the way that cores are moving to being just a simple directory on the FS, rather than a block in a monolithic XML file. But if the cores are moving toward more backing by directory + properties file, it seems like accidentally sharing a dataDir could be a really bad thing.
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650117#comment-13650117 ] Mark Miller commented on SOLR-4773: --- You should get an error, as I said - we just shouldn't be trying to detect it that way. CoreContainer should throw an exception when a core is added with an existing name.
[jira] [Created] (SOLR-4790) When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
Erick Erickson created SOLR-4790: Summary: When defining a core with the same name (discovery mode or not), CoreContainer should throw an error Key: SOLR-4790 URL: https://issues.apache.org/jira/browse/SOLR-4790 Project: Solr Issue Type: Bug Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson When you define a core with the same name as another core (discovery mode definitely, old-style xml probably), the last one wins, which means it's very hard to track down what caused the problem. What's worse, the last-encountered core replaces the first one, leading to cores that change an unexpected index.
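The fail-fast behavior this issue asks for can be sketched in a few lines: instead of silently letting the last same-named core win, registration rejects a duplicate name immediately. CoreRegistry below is an illustrative stand-in, not Solr's actual CoreContainer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of SOLR-4790's proposal: a container that throws on
// duplicate core names rather than letting the last registration win.
public class CoreRegistry {
    private final Map<String, Object> cores = new ConcurrentHashMap<>();

    public void register(String name, Object core) {
        // putIfAbsent is atomic, so two concurrent registrations of the
        // same name cannot both succeed.
        if (cores.putIfAbsent(name, core) != null) {
            throw new IllegalStateException("Core with name '" + name + "' already exists");
        }
    }

    public static void main(String[] args) {
        CoreRegistry registry = new CoreRegistry();
        registry.register("collection1", new Object());
        try {
            registry.register("collection1", new Object()); // duplicate: fails fast
        } catch (IllegalStateException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

The point of failing at registration time is that the error surfaces in the one place users are told to look (the log at startup), rather than as a mysterious 404 later.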
[jira] [Commented] (SOLR-4773) New discovery mode needs to ensure that instanceDir is correct
[ https://issues.apache.org/jira/browse/SOLR-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650123#comment-13650123 ] Erick Erickson commented on SOLR-4773: -- New JIRA for same-named cores, see SOLR-4790.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650137#comment-13650137 ] Steve Rowe commented on LUCENE-4981: Adrien, can you hold off committing for a little bit? I'm not sure if QueryParser.setAutoGeneratePhraseQueries is sufficient for all cases that the PositionFilter hack addresses - I want to do some investigation. Deprecate PositionFilter Key: LUCENE-4981 URL: https://issues.apache.org/jira/browse/LUCENE-4981 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4981.patch According to the documentation (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory), PositionFilter is mainly useful to make query parsers generate boolean queries instead of phrase queries although this problem can be solved at query parsing level instead of analysis level (eg. using QueryParser.setAutoGeneratePhraseQueries). So given that PositionFilter corrupts token graphs (see TestRandomChains), I propose to deprecate it.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650160#comment-13650160 ] Adrien Grand commented on LUCENE-4981: -- Sure, I can wait. (Even when committed, the old behavior will still be available by using luceneMatchVersion=4.3.) I would like to start marking all our broken components (the offenders in TestRandomChains) as deprecated so that people start thinking about ways to solve their problems without them, stop getting highlighting bugs, and can eventually smoothly upgrade to 5.0 when we release it. I already started deprecating/fixing some tokenizers / token filters for 4.4 (LUCENE-4955 and LUCENE-4963) and would like to get as many of them fixed as possible for the next release.
[jira] [Updated] (SOLR-4790) When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
[ https://issues.apache.org/jira/browse/SOLR-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-4790: - Issue Type: Improvement (was: Bug)
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650164#comment-13650164 ] Steve Rowe commented on LUCENE-4981: Thanks for working on fixing the broken stuff. In addition to use cases, I want to investigate the exact nature of the brokenness PositionFilter introduces - maybe it's fixable? I'll re-enable it in TestRandomChains and iterate until it breaks.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650166#comment-13650166 ] Robert Muir commented on LUCENE-4981: - I'm not sure it's fixable: by definition it corrupts the structure because you lose all posincs. So synonyms no longer become synonyms, holes disappear, or whatever. And this doesn't even factor in posLength...
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650176#comment-13650176 ] Steve Rowe commented on LUCENE-4981: The comment in TestRandomChains says: {code:java} // TODO: corrumpts graphs (offset consistency check): PositionFilter.class, {code} which is what made me wonder about the nature of the brokenness: why are offsets a problem? I agree, Robert, PositionFilter corrupts by design. And if we do end up keeping it, position length should be addressed (it's not now), maybe by always setting it to 1.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650177#comment-13650177 ] Adrien Grand commented on LUCENE-4981: -- bq. why are offsets a problem? There are invariants that need to be maintained by token filters: all tokens that start at the same position must have the same start offset and all tokens that end at the same position (start position + position length) must have the same end offset (see ValidatingFilter). By arbitrarily changing position increments, PositionFilter breaks these invariants.
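To make Adrien's invariant concrete without pulling in Lucene classes, here is a self-contained sketch: tokens are modeled as (positionIncrement, startOffset, endOffset) triples, and the check verifies that every token landing on the same position shares a start offset. Zeroing position increments, which is what PositionFilter does, stacks tokens with different offsets onto one position and trips the check. The class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the ValidatingFilter invariant discussed above: tokens that
// start at the same position must have the same start offset.
public class OffsetInvariantSketch {
    // Each token is {positionIncrement, startOffset, endOffset}.
    static boolean startOffsetsConsistent(int[][] tokens) {
        Map<Integer, Integer> startByPos = new HashMap<>();
        int pos = -1;
        for (int[] t : tokens) {
            pos += t[0]; // advance by the position increment
            Integer seen = startByPos.putIfAbsent(pos, t[1]);
            if (seen != null && seen != t[1]) {
                return false; // same position, conflicting start offsets
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "quick brown" with normal increments: graph is consistent.
        int[][] normal = {{1, 0, 5}, {1, 6, 11}};
        // Force the second increment to 0 (PositionFilter-style): both
        // tokens now occupy position 0 with different start offsets.
        int[][] flattened = {{1, 0, 5}, {0, 6, 11}};
        System.out.println(startOffsetsConsistent(normal));    // true
        System.out.println(startOffsetsConsistent(flattened)); // false
    }
}
```

This is also why a follow-on filter like shingle can then produce even stranger output: once the invariant is gone, offsets derived from the flattened positions no longer line up.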
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13650191#comment-13650191 ] Robert Muir commented on LUCENE-4981: - {quote} which is what made me wonder what about the nature of brokenness: why are offsets a problem? {quote} I think Adrien describes it correctly: afaik it doesn't do anything super-evil like make start offsets go backwards or anything, but it breaks those invariants Adrien describes, which can cause a follow-on filter (e.g. shingle) to cause further craziness, e.g. things going backwards or endOffset < startOffset or other problems.
[jira] [Updated] (SOLR-4790) When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
[ https://issues.apache.org/jira/browse/SOLR-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-4790:
-
Attachment: SOLR-4790.patch

Gaah. Maintaining all the backwards-compatibility junk is a pain; this is SO much simpler than what was in there before. It only works for discovery mode; I'll take a quick look at what it would take to deal with old-style in a second. If it's too complicated I'll pass on it, since old-style is going to end-of-life. Anyway, preliminary patch; I'm running the test suite now and have yet to look it over, but is this along the lines you [~markrmil...@gmail.com] had in mind?

When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
Key: SOLR-4790
URL: https://issues.apache.org/jira/browse/SOLR-4790
Project: Solr
Issue Type: Improvement
Affects Versions: 4.3, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
Attachments: SOLR-4790.patch

When you define a core with the same name as another core (discovery mode definitely, old-style xml probably), the last one wins, which means it's very hard to track down what caused the problem. What's worse, the last-encountered core replaces the first one, leading to cores that change an unexpected index.
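[Editor's note: the fail-fast behavior this issue asks for can be sketched in a few lines. CoreRegistry and register below are hypothetical names for illustration only, not Solr's actual CoreContainer API.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "fail on duplicate core name" semantics: the first definition
// wins and a second definition with the same name is a hard error, instead
// of silently replacing the first core (the last-one-wins behavior the
// issue describes).
public class CoreRegistry {
    private final Map<String, Object> cores = new ConcurrentHashMap<>();

    public void register(String name, Object core) {
        // putIfAbsent makes the duplicate check atomic with the insert,
        // so two concurrent registrations of the same name can't both succeed.
        if (cores.putIfAbsent(name, core) != null) {
            throw new IllegalStateException("Core '" + name
                    + "' is already defined; duplicate core definitions are not allowed");
        }
    }
}
```

Throwing here surfaces the misconfiguration at startup, where it is easy to diagnose, rather than letting the second core quietly take over the first one's name and write to an unexpected index.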
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650193#comment-13650193 ]

Steve Rowe commented on LUCENE-4981:

Thanks for the pointer Adrien, I'll take a look at ValidatingFilter. It might be possible, by creating new positions, to enable offset consistency in PositionFilter. Not sure it's worth the effort though.
[jira] [Commented] (LUCENE-4981) Deprecate PositionFilter
[ https://issues.apache.org/jira/browse/LUCENE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650195#comment-13650195 ]

Robert Muir commented on LUCENE-4981:
-

The ValidatingFilter should be the same logic as BaseTokenStreamTestCase:196. I think it's in a separate filter because then it's applied at each stage of the analysis in TestRandomChains, so if there is a bug in a complex analysis chain we know the culprit.
[jira] [Commented] (SOLR-4790) When defining a core with the same name (discovery mode or not), CoreContainer should throw an error
[ https://issues.apache.org/jira/browse/SOLR-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650202#comment-13650202 ]

Mark Miller commented on SOLR-4790:
---

In this case, rather than a new back-compat break, this is really an improvement. In the past, the only legit reason to do this was to reload a core - but now that's a broken way to reload - you must use the reload method. So failing is much better than what we do now IMO.