[jira] Updated: (SOLR-486) Support binary formats for QueryresponseWriter
[ https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-486: Attachment: SOLR-486.patch This includes the changes that make the binary format the default for SolrJ, and the changes for optimized writing of field names in documents. So, for a response with 5 fields and 10 records, only 5 names are written instead of 50. There is an overhead of one extra byte per unique string (5 bytes in total in this case). Support binary formats for QueryresponseWriter -- Key: SOLR-486 URL: https://issues.apache.org/jira/browse/SOLR-486 Project: Solr Issue Type: Improvement Components: clients - java, search Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 1.3 Attachments: SOLR-486.patch, solr-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch QueryResponse writer only allows text data to be written, so it is not possible to implement a binary protocol. Create another interface which has a method write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-563) Contrib area for Solr
[ https://issues.apache.org/jira/browse/SOLR-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605256#action_12605256 ] Shalin Shekhar Mangar commented on SOLR-563: Otis, we can work on the maven issue separately. I've tested the current patch both with and without the DataImportHandler contrib patches and it works fine. At the very least, it doesn't break any of the existing functionality. So, we should be ok to commit this. We can work on the maven issue separately as part of SOLR-586 once this gets committed. Contrib area for Solr - Key: SOLR-563 URL: https://issues.apache.org/jira/browse/SOLR-563 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Assignee: Otis Gospodnetic Fix For: 1.3 Attachments: SOLR-563.patch Add a contrib area for Solr and modify existing build.xml to build, package and distribute contrib projects also. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-469) Data Import RequestHandler
[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-469: --- Attachment: SOLR-469-contrib.patch

*Changes*
* Updated the build.xml to compile Solr before building DataImportHandler and to place DataImportHandler's javadoc jar in the solr/dist folder so that the javadocs are available in Solr nightly builds
* Removed @author Javadoc tags from all source files in accordance with Solr coding conventions
* Improved Javadocs for many classes, especially the public interfaces
* Formatted code using the Eclipse codestyle XML given on the HowToContribute wiki page
* Added @since solr 1.3 to all source files
* I've verified that the Apache license text is present in all the source files

No changes have been made to the code (in terms of functionality).

Note -- The SOLR-563 patch must be applied before this patch to build Solr with DataImportHandler as a contrib project. A lot of people are using this patch and it would be easier for them if DataImportHandler were available in the nightly builds. Also, this patch has become huge, and enhancements and bug fixes would be easier if it were committed.

Grant -- We feel that this is ready to be committed now, whenever you can take a look.

Data Import RequestHandler -- Key: SOLR-469 URL: https://issues.apache.org/jira/browse/SOLR-469 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Noble Paul Assignee: Grant Ingersoll Fix For: 1.3 Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch

We need a RequestHandler which can import data from a DB or other data sources into the Solr index. Think of it as an advanced form of the SqlUpload Plugin (SOLR-103). The way it works is as follows.
* Provide a configuration file (XML) to the handler which takes in the necessary SQL queries and mappings to a Solr schema
  - It also takes in a properties file for the data source configuration
* Given the configuration, it can also generate the Solr schema.xml
* It is registered as a RequestHandler which can take two commands, do-full-import and do-delta-import
  - do-full-import - dumps all the data from the database into the index (based on the SQL query in the configuration)
  - do-delta-import - dumps all the data that has changed since the last import (we assume a modified-timestamp column in the tables)
* It provides an admin page
  - where we can schedule it to be run automatically at regular intervals
  - It shows the status of the handler (idle, full-import, delta-import)
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605293#action_12605293 ] Grant Ingersoll commented on SOLR-572: -- OK, I'd like to commit this tomorrow or Wednesday. I am going to open another issue to bring in LUCENE-1297 to the configuration Spell Checker as a Search Component --- Key: SOLR-572 URL: https://issues.apache.org/jira/browse/SOLR-572 Project: Solr Issue Type: New Feature Components: spellchecker Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Assignee: Grant Ingersoll Priority: Minor Fix For: 1.3 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch Expose the Lucene contrib SpellChecker as a Search Component. Provide the following features: * Allow creating a spell index on a given field and make it possible to have multiple spell indices -- one for each field * Give suggestions on a per-field basis * Given a multi-word query, give only one consistent suggestion * Process the query with the same analyzer specified for the source field and process each token separately * Allow the user to specify minimum length for a token (optional) Consistency criteria for a multi-word query can consist of the following: * Preserve the correct words in the original query as it is * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-486) Support binary formats for QueryresponseWriter
[ https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605306#action_12605306 ] Yonik Seeley commented on SOLR-486: --- Thanks Noble, this looks pretty good. I had previously considered caching strings via some kind of sliding window... keep track of the last 100 or so string values written under some certain length, and then if you see a string again in that window, write a reference (an index that says how many values ago it was seen). For Solr responses in general, it seems like the main duplication will be in field names (which you have taken care of). The only other duplication I can think of would be the id field values (used as a key in other maps such as highlighting), and any duplication that is custom to the collection (such as string values for a type field, etc). Thoughts? I'd be happy to commit this version, or give you time to try out an alternative if you think it might be worth it (but I don't currently have time myself to implement the alternative). Support binary formats for QueryresponseWriter -- Key: SOLR-486 URL: https://issues.apache.org/jira/browse/SOLR-486 Project: Solr Issue Type: Improvement Components: clients - java, search Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 1.3 Attachments: SOLR-486.patch, solr-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch QueryResponse writer only allows text data to be written. So it is not possible to implement a binary protocol . Create another interface which has a method write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
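The sliding-window idea in the comment above can be sketched as follows. This is only an illustration under stated assumptions, not code from any SOLR-486 patch: the class name SlidingWindowStrings, the text tokens STR: and REF:, and the WINDOW and MAX_LEN values are all made up for readability, and a real codec would write binary tags rather than strings. Null values are not handled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a sliding-window string cache (not Solr code).
final class SlidingWindowStrings {
    static final int WINDOW = 100;  // assumed window size ("last 100 or so")
    static final int MAX_LEN = 32;  // assumed length cutoff for tracked strings

    // Emits "STR:<value>" for a literal, or "REF:<n>" when the same value
    // was written n tracked values ago and is still inside the window.
    static List<String> encode(List<String> values) {
        Deque<String> window = new ArrayDeque<>(); // newest value at the head
        List<String> out = new ArrayList<>();
        for (String s : values) {
            int ago = s.length() <= MAX_LEN ? distance(window, s) : 0;
            out.add(ago > 0 ? "REF:" + ago : "STR:" + s);
            remember(window, s);
        }
        return out;
    }

    // Mirrors encode(): rebuilds the same window while reading tokens.
    static List<String> decode(List<String> tokens) {
        Deque<String> window = new ArrayDeque<>();
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String s = t.startsWith("REF:")
                    ? nth(window, Integer.parseInt(t.substring(4)))
                    : t.substring(4);
            out.add(s);
            remember(window, s);
        }
        return out;
    }

    // Only short strings are tracked; the oldest entry falls out of the window.
    private static void remember(Deque<String> window, String s) {
        if (s.length() > MAX_LEN) return;
        window.addFirst(s);
        if (window.size() > WINDOW) window.removeLast();
    }

    // 1-based distance from the newest entry, or 0 if s is not in the window.
    private static int distance(Deque<String> window, String s) {
        int i = 1;
        for (String w : window) {
            if (w.equals(s)) return i;
            i++;
        }
        return 0;
    }

    private static String nth(Deque<String> window, int ago) {
        int i = 1;
        for (String w : window) {
            if (i++ == ago) return w;
        }
        throw new IllegalStateException("reference outside window: " + ago);
    }
}
```

Both sides push every tracked value, literal or referenced, so the encoder and decoder windows stay in step. Unlike a string table that only grows, the window bounds memory, at the cost of missing repeats that are more than WINDOW values apart.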
[jira] Updated: (SOLR-599) Lightweight SolrJ client
[ https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-599: --- Description: SolrJ provides a SolrServer implementation backed by commons-httpclient which introduces many dependency jars (commons-codec, commons-io and commons-logging). Apart from that SolrJ also uses StAX API for XML parsing which introduces dependencies like stax-api, stax and stax-utils. This enhancement will add a SolrServer implementation backed by java.net.HttpUrlConnection and will use BinaryResponseParser as the default response parser. Using this basic implementation out of the box would require no dependencies on either commons-httpclient or StAX. The only dependency would be on solr-commons and commons-logging making this a very lightweight and distribution friendly Java client for Solr. was: SolrJ provides a SolrServer implementation backed by commons-httpclient which introduces many dependency jars (commons-codec, commons-io and commons-logging). Apart from that SolrJ also uses StAX API for XML parsing which introduces dependencies like stax-api, stax and stax-utils. This enhancement will add a SolrServer implementation backed by java.net.HttpUrlConnection and will use BinaryResponseParser as the default response parser. Using this basic implementation out of the box would require no dependencies on either commons-httpclient or StAX. The only dependency would be on solr-commons making this a very lightweight and distribution friendly Java client for Solr. Lightweight SolrJ client Key: SOLR-599 URL: https://issues.apache.org/jira/browse/SOLR-599 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 1.3 SolrJ provides a SolrServer implementation backed by commons-httpclient which introduces many dependency jars (commons-codec, commons-io and commons-logging). 
Apart from that SolrJ also uses StAX API for XML parsing which introduces dependencies like stax-api, stax and stax-utils. This enhancement will add a SolrServer implementation backed by java.net.HttpUrlConnection and will use BinaryResponseParser as the default response parser. Using this basic implementation out of the box would require no dependencies on either commons-httpclient or StAX. The only dependency would be on solr-commons and commons-logging making this a very lightweight and distribution friendly Java client for Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-599) Lightweight SolrJ client
[ https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605325#action_12605325 ] Yonik Seeley commented on SOLR-599: --- I thought commons-logging was only a dependency because HTTPClient used it. Lightweight SolrJ client Key: SOLR-599 URL: https://issues.apache.org/jira/browse/SOLR-599 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 1.3 SolrJ provides a SolrServer implementation backed by commons-httpclient which introduces many dependency jars (commons-codec, commons-io and commons-logging). Apart from that SolrJ also uses StAX API for XML parsing which introduces dependencies like stax-api, stax and stax-utils. This enhancement will add a SolrServer implementation backed by java.net.HttpUrlConnection and will use BinaryResponseParser as the default response parser. Using this basic implementation out of the box would require no dependencies on either commons-httpclient or StAX. The only dependency would be on solr-commons and commons-logging making this a very lightweight and distribution friendly Java client for Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
protected QParser.parse() and subclasses
Hi everyone,

I have the following class:

    package org.apache.solr.search;

    class MyQParser extends QParser {
      protected Query parse() {
        // do stuff
        Query q = subQuery(qs, QParserPlugin.DEFAULT_QTYPE).parse();
        // do stuff
      }
    }

As QParser.parse is protected and QParser.subQuery is public, everything works fine when I run parse() myself (through unit tests). But when I try to run it through a Solr server, I get:

    method: parse signature: ()Lorg/apache/lucene/search/Query;) Bad access to protected data
    java.lang.VerifyError

This is expected: the class loaders of QParser and MyQParser are different (MyQParser is inside the lib/ directory), so the two classes do not share a runtime package even though the package name is the same. Making QParser.parse() public would resolve this issue. What do you think about it?

Julien
[jira] Commented: (SOLR-486) Support binary formats for QueryresponseWriter
[ https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605329#action_12605329 ] Noble Paul commented on SOLR-486: - Another level of efficiency can be brought in by preloading the string table with well-known strings like responseHeader, QTime, etc. That did not look very elegant to me. I guess this can go into trunk and be given enough time to 'settle' before the release. Support binary formats for QueryresponseWriter -- Key: SOLR-486 URL: https://issues.apache.org/jira/browse/SOLR-486 Project: Solr Issue Type: Improvement Components: clients - java, search Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 1.3 Attachments: SOLR-486.patch, solr-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch QueryResponse writer only allows text data to be written, so it is not possible to implement a binary protocol. Create another interface which has a method write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)
[jira] Issue Comment Edited: (SOLR-486) Support binary formats for QueryresponseWriter
[ https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605126#action_12605126 ] noble.paul edited comment on SOLR-486 at 6/16/08 9:04 AM: --

If we take a look at the data that is written down by NamedListCodec there are a lot of names which are repeated. If we could avoid the repetitions we can achieve better optimization. Can we have another type EXTERN_STRING? The NamedListCodec maintains a Map<String,Integer> of EXTERN_STRING vs index as it is written out. When the same string is written it checks whether it already has a reference. While decoding, all the EXTERN_STRING values are copied into a List<String>. When an EXTERN_STRING with an index comes, it is copied from the List.

{code:title=NamedListCodec.java}
private int stringsCount = 0;
private Map<String,Integer> stringsMap;
private List<String> stringsList;

public void writeExternString(String s) throws IOException {
  if (s == null) {
    writeTag(NULL);
    return;
  }
  Integer idx = stringsMap == null ? null : stringsMap.get(s);
  if (idx == null) idx = 0;
  writeTag(EXTERN_STRING, idx);
  if (idx == 0) {
    writeStr(s);
    if (stringsMap == null) stringsMap = new HashMap<String, Integer>();
    stringsMap.put(s, ++stringsCount);
  }
}

public String readExternString(FastInputStream fis) throws IOException {
  int idx = readSize(fis);
  if (idx != 0) { // idx != 0 is the index of the extern string
    return stringsList.get(idx - 1);
  } else { // idx == 0 means it has a string value
    String s = (String) readVal(fis);
    if (stringsList == null) stringsList = new ArrayList<String>();
    stringsList.add(s);
    return s;
  }
}
{code}

was (Author: noble.paul):

If we take a look at the data that is written down by NamedListCodec there are a lot of names which are repeated. If we could avoid the repetitions we can achieve better optimization. Can we have another type EXTERN_STRING? The NamedListCodec maintains a Map<String,Integer> of EXTERN_STRING vs index as it is written out. When the same string is written it checks whether it already has a reference. While decoding, all the EXTERN_STRING values are copied into a List<String>. When an EXTERN_STRING with an index comes, it is copied from the List.

{code:title=NamedListCodec.java}
private int stringsCount = -1;
private Map<String,Integer> stringsMap;
private List<String> stringsList;

public void writeExternString(String s) throws IOException {
  writeTag(EXTERN_STRING);
  if (s == null) {
    writeTag(NULL);
    return;
  }
  if (stringsMap.containsKey(s)) {
    writeInt(stringsMap.get(s));
  } else {
    writeStr(s);
    stringsCount++;
    if (stringsMap == null) stringsMap = new HashMap<String, Integer>();
    stringsMap.put(s, stringsCount);
  }
}

public String readExternString(FastInputStream fis) throws IOException {
  Object o = readVal(fis);
  if (o == null) return null;
  if (o instanceof String) {
    String s = (String) o;
    if (stringsList == null) stringsList = new ArrayList<String>();
    stringsList.add(s);
    return s;
  } else { // this must be an integer
    int index = (Integer) o;
    return stringsList.get(index);
  }
}
{code}

Support binary formats for QueryresponseWriter -- Key: SOLR-486 URL: https://issues.apache.org/jira/browse/SOLR-486 Project: Solr Issue Type: Improvement Components: clients - java, search Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 1.3 Attachments: SOLR-486.patch, solr-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch QueryResponse writer only allows text data to be written. So it is not possible to implement a binary protocol. Create another interface which has a method write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)
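For readers who want to try the revised scheme outside Solr, here is a self-contained sketch of it: index 0 followed by the literal on first occurrence, and a 1-based index on repeats. It assumes java.io.DataOutputStream and DataInputStream as stand-ins for NamedListCodec's writeTag, writeStr, readSize and readVal, whose actual wire format is not shown in this thread. The class and method names are hypothetical, and null handling is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the EXTERN_STRING idea (not the Solr codec).
final class ExternStringSketch {

    // Writer side: remembers the 1-based index assigned to each string.
    static byte[] encode(List<String> names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Map<String, Integer> indexByString = new HashMap<>();
        for (String s : names) {
            Integer idx = indexByString.get(s);
            if (idx == null) {
                out.writeInt(0);   // 0 means "a literal follows"
                out.writeUTF(s);
                indexByString.put(s, indexByString.size() + 1);
            } else {
                out.writeInt(idx); // non-zero means "reuse string number idx"
            }
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reader side: rebuilds the table in the order literals arrive.
    static List<String> decode(byte[] data, int count) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> table = new ArrayList<>();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int idx = in.readInt();
            if (idx != 0) {
                result.add(table.get(idx - 1));
            } else {
                String s = in.readUTF();
                table.add(s);
                result.add(s);
            }
        }
        return result;
    }

    // Convenience helper: encode then decode, wrapping checked exceptions.
    static List<String> roundTrip(List<String> names) {
        try {
            return decode(encode(names), names.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch writeInt costs four bytes per entry, so it does not reproduce the one-byte-per-unique-string overhead mentioned for the patch; it only shows how the writer's map and the reader's list stay aligned because both assign indexes in order of first appearance.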
[jira] Commented: (SOLR-599) Lightweight SolrJ client
[ https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605335#action_12605335 ] Shalin Shekhar Mangar commented on SOLR-599: Right, I'll edit the description (again) Lightweight SolrJ client Key: SOLR-599 URL: https://issues.apache.org/jira/browse/SOLR-599 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 1.3 SolrJ provides a SolrServer implementation backed by commons-httpclient which introduces many dependency jars (commons-codec, commons-io and commons-logging). Apart from that SolrJ also uses StAX API for XML parsing which introduces dependencies like stax-api, stax and stax-utils. This enhancement will add a SolrServer implementation backed by java.net.HttpUrlConnection and will use BinaryResponseParser as the default response parser. Using this basic implementation out of the box would require no dependencies on either commons-httpclient or StAX. The only dependency would be on solr-commons and commons-logging making this a very lightweight and distribution friendly Java client for Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-599) Lightweight SolrJ client
[ https://issues.apache.org/jira/browse/SOLR-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-599: --- Description: SolrJ provides a SolrServer implementation backed by commons-httpclient which introduces many dependency jars (commons-codec, commons-io and commons-logging). Apart from that SolrJ also uses StAX API for XML parsing which introduces dependencies like stax-api, stax and stax-utils. This enhancement will add a SolrServer implementation backed by java.net.HttpUrlConnection and will use BinaryResponseParser as the default response parser. Using this basic implementation out of the box would require no dependencies on either commons-httpclient or StAX. The only dependency would be on solr-commons making this a very lightweight and distribution friendly Java client for Solr. was: SolrJ provides a SolrServer implementation backed by commons-httpclient which introduces many dependency jars (commons-codec, commons-io and commons-logging). Apart from that SolrJ also uses StAX API for XML parsing which introduces dependencies like stax-api, stax and stax-utils. This enhancement will add a SolrServer implementation backed by java.net.HttpUrlConnection and will use BinaryResponseParser as the default response parser. Using this basic implementation out of the box would require no dependencies on either commons-httpclient or StAX. The only dependency would be on solr-commons and commons-logging making this a very lightweight and distribution friendly Java client for Solr. Lightweight SolrJ client Key: SOLR-599 URL: https://issues.apache.org/jira/browse/SOLR-599 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Priority: Minor Fix For: 1.3 SolrJ provides a SolrServer implementation backed by commons-httpclient which introduces many dependency jars (commons-codec, commons-io and commons-logging). 
Apart from that SolrJ also uses StAX API for XML parsing which introduces dependencies like stax-api, stax and stax-utils. This enhancement will add a SolrServer implementation backed by java.net.HttpUrlConnection and will use BinaryResponseParser as the default response parser. Using this basic implementation out of the box would require no dependencies on either commons-httpclient or StAX. The only dependency would be on solr-commons making this a very lightweight and distribution friendly Java client for Solr. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-486) Support binary formats for QueryresponseWriter
[ https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605329#action_12605329 ] noble.paul edited comment on SOLR-486 at 6/16/08 9:13 AM: -- Another level of efficiency can be brought in by preloading the string table with well-known strings like responseHeader, QTime, etc. That did not look very elegant to me. The sliding window approach is also good, but we do not have too many repeated strings unless we use highlighting etc. I guess this can go into trunk and be given enough time to 'settle' before the release. was (Author: noble.paul): Another level of efficiency can be brought in by preloading the string table with well-known strings like responseHeader, QTime, etc. That did not look very elegant to me. I guess this can go into trunk and be given enough time to 'settle' before the release. Support binary formats for QueryresponseWriter -- Key: SOLR-486 URL: https://issues.apache.org/jira/browse/SOLR-486 Project: Solr Issue Type: Improvement Components: clients - java, search Reporter: Noble Paul Assignee: Yonik Seeley Fix For: 1.3 Attachments: SOLR-486.patch, solr-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch QueryResponse writer only allows text data to be written, so it is not possible to implement a binary protocol. Create another interface which has a method write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605367#action_12605367 ] Yonik Seeley commented on SOLR-572: --- For those who are just casually following this issue, is there a good summary of current input options and example output? Spell Checker as a Search Component --- Key: SOLR-572 URL: https://issues.apache.org/jira/browse/SOLR-572 Project: Solr Issue Type: New Feature Components: spellchecker Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Assignee: Grant Ingersoll Priority: Minor Fix For: 1.3 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch Expose the Lucene contrib SpellChecker as a Search Component. Provide the following features: * Allow creating a spell index on a given field and make it possible to have multiple spell indices -- one for each field * Give suggestions on a per-field basis * Given a multi-word query, give only one consistent suggestion * Process the query with the same analyzer specified for the source field and process each token separately * Allow the user to specify minimum length for a token (optional) Consistency criteria for a multi-word query can consist of the following: * Preserve the correct words in the original query as it is * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (SOLR-572) Spell Checker as a Search Component
Grant created a wiki page at http://wiki.apache.org/solr/SpellCheckComponentwhich has some documentation on the configuration. I'll try to add more documentation when I try this out tomorrow. On Tue, Jun 17, 2008 at 12:03 AM, Yonik Seeley (JIRA) [EMAIL PROTECTED] wrote: [ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605367#action_12605367] Yonik Seeley commented on SOLR-572: --- For those who are just casually following this issue, is there a good summary of current input options and example output? Spell Checker as a Search Component --- Key: SOLR-572 URL: https://issues.apache.org/jira/browse/SOLR-572 Project: Solr Issue Type: New Feature Components: spellchecker Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Assignee: Grant Ingersoll Priority: Minor Fix For: 1.3 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch Expose the Lucene contrib SpellChecker as a Search Component. Provide the following features: * Allow creating a spell index on a given field and make it possible to have multiple spell indices -- one for each field * Give suggestions on a per-field basis * Given a multi-word query, give only one consistent suggestion * Process the query with the same analyzer specified for the source field and process each token separately * Allow the user to specify minimum length for a token (optional) Consistency criteria for a multi-word query can consist of the following: * Preserve the correct words in the original query as it is * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online. -- Regards, Shalin Shekhar Mangar.
[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605403#action_12605403 ] Mike Klaas commented on SOLR-14: Note that it is very easy to use an external TokenFilter, so you could just cp WDF into your own class and make the changes. (Though I'm not saying that this _shouldn't_ make it in for 1.3) Add the ability to preserve the original term when using WordDelimiterFilter Key: SOLR-14 URL: https://issues.apache.org/jira/browse/SOLR-14 Project: Solr Issue Type: Improvement Components: search Reporter: Richard Trey Hyde Attachments: TokenizerFactory.java, WordDelimiterFilter.patch, WordDelimiterFilter.patch When doing prefix searching, you need to hang on to the original term, otherwise you'll miss many matches you should be making. Data: ABC-12345 WordDelimiterFilter may change this into ABC 12345 ABC12345 A user may enter a search such as ABC\-123*, which will fail to find a match given the above scenario. The attached patch will allow the use of the preserveOriginal option to WordDelimiterFilter and will analyse as ABC 12345 ABC12345 ABC-12345, in which case we will get a positive match.
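The prefix-search problem described in the issue can be reproduced with a toy model. The sketch below is not the real WordDelimiterFilter: it only splits on non-alphanumeric characters and adds the concatenated form, which is enough to mimic the ABC-12345 example. The class name, method names, and the boolean flag are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for word-delimiter analysis (hypothetical, greatly simplified).
final class PreserveOriginalSketch {

    // Splits on non-alphanumerics, adds the joined form, and optionally
    // keeps the original term as an extra token.
    static List<String> analyze(String term, boolean preserveOriginal) {
        List<String> tokens = new ArrayList<>();
        StringBuilder joined = new StringBuilder();
        for (String part : term.split("[^A-Za-z0-9]+")) {
            if (part.isEmpty()) continue;
            tokens.add(part);
            joined.append(part);
        }
        if (tokens.size() > 1) tokens.add(joined.toString());
        if (preserveOriginal && !tokens.contains(term)) tokens.add(term);
        return tokens;
    }

    // A prefix query matches if any indexed token starts with the prefix.
    static boolean prefixMatches(List<String> tokens, String prefix) {
        for (String t : tokens) {
            if (t.startsWith(prefix)) return true;
        }
        return false;
    }
}
```

With the flag off, ABC-12345 yields ABC, 12345 and ABC12345, none of which starts with ABC-123, so the prefix query misses. With the flag on, the original ABC-12345 is kept as a fourth token and the prefix query finds it.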
[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605410#action_12605410 ] Mike Klaas commented on SOLR-14: Also, voting for an issue is a good way to increase its visibility. Add the ability to preserve the original term when using WordDelimiterFilter Key: SOLR-14 URL: https://issues.apache.org/jira/browse/SOLR-14 Project: Solr Issue Type: Improvement Components: search Reporter: Richard Trey Hyde Attachments: TokenizerFactory.java, WordDelimiterFilter.patch, WordDelimiterFilter.patch When doing prefix searching, you need to hang on to the original term, otherwise you'll miss many matches you should be making. Data: ABC-12345 WordDelimiterFilter may change this into ABC 12345 ABC12345 A user may enter a search such as ABC\-123*, which will fail to find a match given the above scenario. The attached patch will allow the use of the preserveOriginal option to WordDelimiterFilter and will analyse as ABC 12345 ABC12345 ABC-12345, in which case we will get a positive match.
[jira] Commented: (SOLR-243) Create a hook to allow custom code to create custom IndexReaders
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605424#action_12605424 ] Hoss Man commented on SOLR-243: --- bq. Hoss has marked the issue for 1.3, so it will be in the release. for the record, i marked it as 1.3 because it would be nice to see this in 1.3 ... but as i said in my 2008-03-13 comment: we need unit tests and example configuration before i'm willing to commit. Create a hook to allow custom code to create custom IndexReaders Key: SOLR-243 URL: https://issues.apache.org/jira/browse/SOLR-243 Project: Solr Issue Type: Improvement Components: search Environment: Solr core Reporter: John Wang Assignee: Hoss Man Fix For: 1.3 Attachments: indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch I have a customized IndexReader and I want to write a Solr plugin to use my derived IndexReader implementation. Currently IndexReader instantiation is hard coded to be: IndexReader.open(path) It would be really useful if this were done through a pluggable factory that can be configured, e.g. IndexReaderFactory interface IndexReaderFactory{ IndexReader newReader(String name,String path); } The default implementation would just return: IndexReader.open(path) The newSearcher and getSearcher methods in the SolrCore class can then call the current factory implementation to get the IndexReader instance and build the SolrIndexSearcher by passing in the reader. It would be really nice to add this improvement soon (this seems to be a trivial addition) as our project really depends on this. Thanks -John
[jira] Commented: (SOLR-243) Create a hook to allow custom code to create custom IndexReaders
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605447#action_12605447 ]

John Wang commented on SOLR-243:

Sorry, I didn't see Hoss's earlier comments.

Thanks
-John
Re: protected QParser.parse() and subclasses
: As QParser.parse is protected and QParser.subQuery is public, everything
: works fine when I run parse() myself (through unit tests). But when I
: try to run it through a Solr server, I get :

all of the concrete impls of QParser in the solr code base declare the parse() method as public ... i'm not sure why it's protected in the abstract class ... seems wrong to me.

-Hoss
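The Java access rule behind this observation can be sketched as follows: an overriding method may widen a protected method to public, so a concrete subclass can expose a method that the abstract base class keeps protected. The classes below are hypothetical stand-ins, not Solr's QParser hierarchy, and the reflection helper exists only to make the declared modifiers visible.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an abstract parser base class.
abstract class BaseParser {
    // Protected: callable from subclasses and from the same package only.
    protected abstract String parse();

    // Public entry point that delegates to the protected method.
    public String run() {
        return parse();
    }
}

// Hypothetical concrete parser.
class WideningParser extends BaseParser {
    // An override may widen protected to public; narrowing is not allowed.
    @Override
    public String parse() {
        return "parsed";
    }
}

class VisibilitySketch {
    // Reports whether the no-argument method, as declared on the given class,
    // is public. Returns false if the class does not declare such a method.
    static boolean declaresPublic(Class<?> type, String methodName) {
        try {
            Method method = type.getDeclaredMethod(methodName);
            return Modifier.isPublic(method.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(declaresPublic(BaseParser.class, "parse"));
        System.out.println(declaresPublic(WideningParser.class, "parse"));
        System.out.println(new WideningParser().run());
    }
}
```

Code in another package that holds only a reference of the base type cannot call parse() directly and would have to go through a public method such as the hypothetical run() above.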
[jira] Created: (SOLR-600) XML parser stops working under heavy load
XML parser stops working under heavy load

Key: SOLR-600
URL: https://issues.apache.org/jira/browse/SOLR-600
Project: Solr
Issue Type: Bug
Components: update
Affects Versions: 1.3
Environment: Linux 2.6.19.7-ss0 #4 SMP Wed Mar 12 02:56:42 GMT 2008 x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
Tomcat 6.0.16
SOLR nightly 16 Jun 2008, and versions prior
JRE 1.6.0
Reporter: John Smith

Under heavy load, the following is spat out for every update:

org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at java.util.AbstractList$SimpleListIterator.hasNext(Unknown Source)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:225)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:66)
    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:735)