[jira] Commented: (SOLR-341) PHP Solr Client
[ https://issues.apache.org/jira/browse/SOLR-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650781#action_12650781 ] Pieter Berkel commented on SOLR-341:

Thanks for the quick response, just wanted to check that you uploaded the updated class files? I couldn't find the new setBoost() / setField() / setFieldBoost() methods in the Document class located in SolrPhpClient.2008-11-24.zip.

PHP Solr Client
---------------
Key: SOLR-341
URL: https://issues.apache.org/jira/browse/SOLR-341
Project: Solr
Issue Type: New Feature
Components: clients - php
Affects Versions: 1.2
Environment: PHP >= 5.2.0 (or older with the JSON PECL extension or another json_decode function implementation). Solr >= 1.2
Reporter: Donovan Jimenez
Priority: Trivial
Fix For: 1.4
Attachments: SolrPhpClient.2008-09-02.zip, SolrPhpClient.2008-11-14.zip, SolrPhpClient.2008-11-24.zip, SolrPhpClient.zip

Developed this client when the example PHP source didn't meet our needs. The company I work for agreed to release it under the terms of the Apache License. This version is slightly different from what I originally linked to on the dev mailing list. I've incorporated feedback from Yonik and hossman to simplify the client and only accept one response format (currently JSON). When Solr 1.3 is released the client can be updated to use the PHP or Serialized PHP response writer.
example usage from my original mailing list post:

<?php
require_once('Solr/Service.php');

$start = microtime(true);

$solr = new Solr_Service(); // Or explicitly: new Solr_Service('localhost', 8180, '/solr');

try {
  $response = $solr->search('solr', 0, 10, array(/* you can include other parameters here */));

  echo 'search returned with status = ', $response->responseHeader->status,
       ' and took ', microtime(true) - $start, ' seconds', "\n";

  // here's how you would access results
  // Notice that I've mapped the values by name into a tree of stdClass objects
  // and arrays (actually, most of this is done by json_decode)
  if ($response->response->numFound > 0) {
    $doc_number = $response->response->start;

    foreach ($response->response->docs as $doc) {
      $doc_number++;
      echo $doc_number, ': ', $doc->text, "\n";
    }
  }

  // for the purposes of seeing the available structure of the response
  // NOTE: Solr_Response::_parsedData is lazy loaded, so a print_r on the response before
  // any values are accessed may result in different behavior (in case
  // anyone has some troubles debugging)
  // print_r($response);
} catch (Exception $e) {
  echo $e->getMessage(), "\n";
}
?>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-341) PHP Solr Client
[ https://issues.apache.org/jira/browse/SOLR-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650095#action_12650095 ] Pieter Berkel commented on SOLR-341:

Hi Donovan,

Great work on the PHP client library, however I noticed that there is no way to specify document- and/or field-level boost values when creating and indexing documents: http://wiki.apache.org/solr/UpdateXmlMessages

Perhaps Apache_Solr_Document could have a constructor method with an optional parameter for setting the document boost:

  public function __construct($boost = '1.0') {

I'm not so sure how the field-level boost should be set; maybe add methods setFieldBoost($key) and getFieldBoost($key) to Apache_Solr_Document? If necessary I can also submit code patches for these changes.

cheers,
Piete

PHP Solr Client
---------------
Key: SOLR-341
URL: https://issues.apache.org/jira/browse/SOLR-341
Project: Solr
Issue Type: New Feature
Components: clients - php
Affects Versions: 1.2
Environment: PHP >= 5.2.0 (or older with the JSON PECL extension or another json_decode function implementation). Solr >= 1.2
Reporter: Donovan Jimenez
Priority: Trivial
Fix For: 1.4
Attachments: SolrPhpClient.2008-09-02.zip, SolrPhpClient.2008-11-14.zip, SolrPhpClient.zip

Developed this client when the example PHP source didn't meet our needs. The company I work for agreed to release it under the terms of the Apache License. This version is slightly different from what I originally linked to on the dev mailing list. I've incorporated feedback from Yonik and hossman to simplify the client and only accept one response format (currently JSON). When Solr 1.3 is released the client can be updated to use the PHP or Serialized PHP response writer.
example usage from my original mailing list post:

<?php
require_once('Solr/Service.php');

$start = microtime(true);

$solr = new Solr_Service(); // Or explicitly: new Solr_Service('localhost', 8180, '/solr');

try {
  $response = $solr->search('solr', 0, 10, array(/* you can include other parameters here */));

  echo 'search returned with status = ', $response->responseHeader->status,
       ' and took ', microtime(true) - $start, ' seconds', "\n";

  // here's how you would access results
  // Notice that I've mapped the values by name into a tree of stdClass objects
  // and arrays (actually, most of this is done by json_decode)
  if ($response->response->numFound > 0) {
    $doc_number = $response->response->start;

    foreach ($response->response->docs as $doc) {
      $doc_number++;
      echo $doc_number, ': ', $doc->text, "\n";
    }
  }

  // for the purposes of seeing the available structure of the response
  // NOTE: Solr_Response::_parsedData is lazy loaded, so a print_r on the response before
  // any values are accessed may result in different behavior (in case
  // anyone has some troubles debugging)
  // print_r($response);
} catch (Exception $e) {
  echo $e->getMessage(), "\n";
}
?>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-379) KStem Token Filter
[ https://issues.apache.org/jira/browse/SOLR-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605169#action_12605169 ] Pieter Berkel commented on SOLR-379:

As far as I'm aware KStemFilterFactory.java was written by Harry Wagner, so if he's happy to grant ASL it should be possible to include that in the repo. Everything in /src/java/org/apache/lucene/analysis has been copied from KStem.jar which was originally downloaded from CIIR, so if that can possibly be loaded on demand, then it should be fairly straightforward to include support for this stemmer in Solr.

KStem Token Filter
------------------
Key: SOLR-379
URL: https://issues.apache.org/jira/browse/SOLR-379
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Pieter Berkel
Priority: Minor
Attachments: KStemSolr.zip

A Lucene / Solr implementation of the KStem stemmer. Full credit goes to Harry Wagner for adapting the Lucene version found here: http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
Background discussion to this stemmer (including licensing issues) can be found in this thread: http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295
I've made some minor changes to KStemFilterFactory so that it compiles cleanly against trunk:
1) removed some unnecessary imports
2) changed the init() method parameters introduced by SOLR-215
3) moved KStemFilterFactory into package org.apache.solr.analysis
Once compiled and included in your Solr war (or as a jar in your lib directory), the KStem filter can be used in your schema very easily:

<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KStemFilterFactory" cacheSize="2"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-380) There's no way to convert search results into page-level hits of a structured document.
[ https://issues.apache.org/jira/browse/SOLR-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535426 ] Pieter Berkel commented on SOLR-380:

There was a recent discussion surrounding a similar problem on solr-user: http://www.nabble.com/Structured-Lucene-documents-tf4234661.html#a12048390
The idea was to use dynamic fields (e.g. page_1, page_2, page_3... page_N) to store the text of each page in a single document. The problem is that currently Solr does not support glob-style field expansion in query parameters (e.g. qf=page_*), so you would end up having to specify the entire list of page fields in your query, which is impractical. There is already an open issue related to this particular problem (SOLR-247) but nobody has had time to look into it.
In terms of returning term position information, this seems somehow (albeit loosely) related to highlighting; is there any way you could use the existing functionality to achieve your goal? (it would definitely be a hack though)

There's no way to convert search results into page-level hits of a structured document.
-
Key: SOLR-380
URL: https://issues.apache.org/jira/browse/SOLR-380
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Tricia Williams
Priority: Minor

Paged-Text FieldType for Solr
A chance to dig into the guts of Solr. The problem: If we index a monograph in Solr, there's no way to convert search results into page-level hits. The solution: have a paged-text fieldtype which keeps track of page divisions as it indexes, and reports page-level hits in the search results. The input would contain page milestones: <page id="234"/>. As Solr processed the tokens (using its standard tokenizers and filters), it would concurrently build a structural map of the item, indicating which term position marked the beginning of which page: <page id="234" firstterm="14324"/>. This map would be stored in an unindexed field in some efficient format.
At search time, Solr would retrieve term positions for all hits that are returned in the current request, and use the stored map to determine page ids for each term position. The results would imitate the results for highlighting, something like:

<lst name="pages">
  <lst name="doc1">
    <int name="pageid">234</int>
    <int name="pageid">236</int>
  </lst>
  <lst name="doc2">
    <int name="pageid">19</int>
  </lst>
</lst>
<lst name="hitpos">
  <lst name="doc1">
    <lst name="234">
      <int name="pos">14325</int>
    </lst>
  </lst>
  ...
</lst>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
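The stored-map lookup the proposal describes (turning a hit's term position back into a page id) amounts to a binary search over the per-page first-term offsets. Here is a minimal sketch; the class name and the flat parallel-array layout are illustrative assumptions, not anything from the proposal or from Solr:

```java
import java.util.Arrays;

/** Sketch of the page-map lookup described above: given the term offset that
 *  begins each page, map a hit's term position back to a page id.
 *  The parallel-array representation is an illustrative assumption. */
public class PageMap {
    private final int[] firstTerms; // term position beginning each page, ascending
    private final int[] pageIds;    // page id parallel to firstTerms

    public PageMap(int[] firstTerms, int[] pageIds) {
        this.firstTerms = firstTerms;
        this.pageIds = pageIds;
    }

    /** Find the page whose term-position range contains termPos. */
    public int pageForPosition(int termPos) {
        int i = Arrays.binarySearch(firstTerms, termPos);
        if (i < 0) i = -i - 2; // insertion point minus one = preceding page
        return pageIds[Math.max(i, 0)];
    }
}
```

With the example numbers from the description (page 234 starting at term 14324), a hit at term position 14325 would resolve to page id 234.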
[jira] Updated: (SOLR-377) speed increase for writers
[ https://issues.apache.org/jira/browse/SOLR-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Berkel updated SOLR-377:
---
Attachment: SOLR-377-phpresponsewriter.patch

Sorry I've been a bit slow catching up with this issue. Please find attached a trivial patch to PHPResponseWriter.java that takes advantage of the new FastWriter code; it should provide speed improvements similar to the JSON writer (perhaps slightly less). No FastWriter optimisation is necessary for PHPSerializedResponseWriter as there is no need to escape strings before they are written.

speed increase for writers
--
Key: SOLR-377
URL: https://issues.apache.org/jira/browse/SOLR-377
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: fastwriter.patch, SOLR-377-phpresponsewriter.patch

When solr is writing the response of large cached documents, the bottleneck is string encoding. A buffered writer implementation that doesn't do any synchronization could offer some good speedups.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
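For context, the "buffered writer without synchronization" idea behind fastwriter.patch can be sketched as below. This is an illustrative simplification, not the actual patch code: java.io.BufferedWriter takes a lock on every write, and a response that is written by a single thread can safely skip that cost.

```java
import java.io.IOException;
import java.io.Writer;

/** Minimal unsynchronized buffered writer, illustrating the FastWriter idea:
 *  buffer characters like BufferedWriter does, but without its per-call
 *  synchronization (safe when only one thread writes the response). */
public class UnsyncBufferedWriter extends Writer {
    private final Writer sink;
    private final char[] buf;
    private int pos;

    public UnsyncBufferedWriter(Writer sink, int size) {
        this.sink = sink;
        this.buf = new char[size];
    }

    @Override public void write(char[] cbuf, int off, int len) throws IOException {
        if (len >= buf.length) {           // large write: bypass the buffer
            flushBuffer();
            sink.write(cbuf, off, len);
            return;
        }
        if (pos + len > buf.length) flushBuffer();
        System.arraycopy(cbuf, off, buf, pos, len);
        pos += len;
    }

    private void flushBuffer() throws IOException {
        if (pos > 0) { sink.write(buf, 0, pos); pos = 0; }
    }

    @Override public void flush() throws IOException { flushBuffer(); sink.flush(); }
    @Override public void close() throws IOException { flush(); sink.close(); }
}
```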
[jira] Created: (SOLR-379) KStem Token Filter
KStem Token Filter
------------------
Key: SOLR-379
URL: https://issues.apache.org/jira/browse/SOLR-379
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Pieter Berkel
Priority: Minor

A Lucene / Solr implementation of the KStem stemmer. Full credit goes to Harry Wagner for adapting the Lucene version found here: http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
Background discussion to this stemmer (including licensing issues) can be found in this thread: http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295
I've made some minor changes to KStemFilterFactory so that it compiles cleanly against trunk:
1) removed some unnecessary imports
2) changed the init() method parameters introduced by SOLR-215
3) moved KStemFilterFactory into package org.apache.solr.analysis
Once compiled and included in your Solr war (or as a jar in your lib directory), the KStem filter can be used in your schema very easily:

<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KStemFilterFactory" cacheSize="2"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-379) KStem Token Filter
[ https://issues.apache.org/jira/browse/SOLR-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Berkel updated SOLR-379:
---
Attachment: KStemSolr.zip

I've attached a zip file containing the KStem source rather than a patch, as I'm not sure how this code will eventually be integrated with Solr. Since I did not write this and am unsure of the legal status of this code, I have not granted the ASF license, although recent discussion suggests the license included with KStem is compatible with the Apache license. Hopefully we'll be able to resolve the above issues fairly quickly.

KStem Token Filter
------------------
Key: SOLR-379
URL: https://issues.apache.org/jira/browse/SOLR-379
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Pieter Berkel
Priority: Minor
Attachments: KStemSolr.zip

A Lucene / Solr implementation of the KStem stemmer. Full credit goes to Harry Wagner for adapting the Lucene version found here: http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
Background discussion to this stemmer (including licensing issues) can be found in this thread: http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295
I've made some minor changes to KStemFilterFactory so that it compiles cleanly against trunk:
1) removed some unnecessary imports
2) changed the init() method parameters introduced by SOLR-215
3) moved KStemFilterFactory into package org.apache.solr.analysis
Once compiled and included in your Solr war (or as a jar in your lib directory), the KStem filter can be used in your schema very easily:

<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KStemFilterFactory" cacheSize="2"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-281) Search Components (plugins)
[ https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527931 ] Pieter Berkel commented on SOLR-281:

I'm having trouble applying the latest patch to trunk (r575809) again:

$ patch -p0 < ../SOLR-281-SearchComponents.patch
...
patching file src/java/org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 45.
2 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/solr/handler/StandardRequestHandler.java.rej
patching file src/java/org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #2 FAILED at 118.
1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/solr/handler/DisMaxRequestHandler.java.rej

It also looks like the additions to solrconfig.xml have not been included in the latest patch either. I was also going to suggest that it might be a good idea to support class shorthand notation, so org.apache.solr.handler.component.* can be written as solr.component.* in solrconfig.xml.

Search Components (plugins)
---------------------------
Key: SOLR-281
URL: https://issues.apache.org/jira/browse/SOLR-281
Project: Solr
Issue Type: New Feature
Reporter: Ryan McKinley
Attachments: SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch

A request handler with pluggable search components for things like:
- standard
- dismax
- more-like-this
- highlighting
- field collapsing
For more discussion, see: http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)
[ https://issues.apache.org/jira/browse/SOLR-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522345 ] Pieter Berkel commented on SOLR-247:

Some recent discussion on this topic: http://www.nabble.com/Structured-Lucene-documents-tf4234661.html
I get the impression that general wildcard syntax support for field listing parameters (i.e. the reverse of dynamic fields) as described in the above thread would be far more useful than a simple '*' match-anything syntax (not only in faceting but in other cases like hl.fl and perhaps even mlt.fl). I haven't really considered the performance issues of this approach however, as it would involve checking each field supplied in the parameter for '*' before expanding it into full field names for every query.
Given the above, the fact that it could be used across multiple response handlers and subhandlers like SimpleFacets and Highlighting, and that it would require access to IndexReader for getFieldNames(), where might be the most sensible place to put this code?

Allow facet.field=* to facet on all fields (without knowing what they are)
--
Key: SOLR-247
URL: https://issues.apache.org/jira/browse/SOLR-247
Project: Solr
Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor
Attachments: SOLR-247-FacetAllFields.patch

I don't know if this is a good idea to include -- it is potentially a bad idea to use it, but that can be ok. This came out of trying to use faceting for the LukeRequestHandler top term collecting. http://www.nabble.com/Luke-request-handler-issue-tf3762155.html

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
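The glob-expansion step discussed above could be sketched roughly as follows. This is an illustration only: the class name is hypothetical, and the index's field names are passed in directly rather than obtained from IndexReader.getFieldNames() as a real Solr implementation would.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch of glob-style field expansion (e.g. facet.field=page_* or hl.fl=page_*).
 *  A real implementation inside Solr would consult IndexReader.getFieldNames();
 *  here the known field names are supplied by the caller for illustration. */
public class FieldGlob {
    public static List<String> expand(String param, Collection<String> indexFields) {
        if (param.indexOf('*') < 0) {
            return Collections.singletonList(param); // plain field name: no expansion
        }
        // Translate the glob to a regex: quote everything except '*', which becomes '.*'
        Pattern p = Pattern.compile(Pattern.quote(param).replace("*", "\\E.*\\Q"));
        List<String> out = new ArrayList<String>();
        for (String f : indexFields) {
            if (p.matcher(f).matches()) out.add(f);
        }
        return out;
    }
}
```

The per-query cost the comment worries about is visible here: every wildcard parameter forces a scan over all field names in the index.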
[jira] Commented: (SOLR-281) Search Components (plugins)
[ https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520434 ] Pieter Berkel commented on SOLR-281:

I just tried this patch on svn trunk (r566899) and got the following failures:

$ patch -p0 < ../SOLR-281-SearchComponents.patch
...
patching file src/java/org/apache/solr/handler/StandardRequestHandler.java
Hunk #1 succeeded at 17 with fuzz 1.
Hunk #2 FAILED at 45.
1 out of 2 hunks FAILED -- saving rejects to file src/java/org/apache/solr/handler/StandardRequestHandler.java.rej
...
patching file src/java/org/apache/solr/handler/DisMaxRequestHandler.java
Hunk #1 FAILED at 17.
1 out of 1 hunk FAILED -- saving rejects to file src/java/org/apache/solr/handler/DisMaxRequestHandler.java.rej

I suspect it is the changes made by SOLR-326 that are causing these problems; would it be possible for you to create a new patch?

thanks,
Piete

Search Components (plugins)
---------------------------
Key: SOLR-281
URL: https://issues.apache.org/jira/browse/SOLR-281
Project: Solr
Issue Type: New Feature
Reporter: Ryan McKinley
Attachments: SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch

A request handler with pluggable search components for things like:
- standard
- dismax
- more-like-this
- highlighting
- field collapsing
For more discussion, see: http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519195 ] Pieter Berkel commented on SOLR-196:

Great! I'll try to add some documentation to the wiki in the next few days.
Regarding the content-type, I found it more useful to be able to actually see the result in a browser. Is there a content-type we can use for JSON that can achieve both goals, for Firefox and IE at least? I couldn't find any suitable MIME types that would achieve this goal, so it's probably better to leave the content-types unchanged for the moment.

A PHP response writer for Solr
------------------------------
Key: SOLR-196
URL: https://issues.apache.org/jira/browse/SOLR-196
Project: Solr
Issue Type: New Feature
Components: clients - php, search
Reporter: Paul Borgermans
Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch

It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP 4.x installs, where there is no built-in support for JSON. This issue attempts to address this.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519196 ] Pieter Berkel commented on SOLR-196:

Hmm, it doesn't look like the two new files from the patch were added properly during the latest commit:

/src/java/org/apache/solr/request/PHPResponseWriter.java
/src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

We won't get very far without those!

A PHP response writer for Solr
------------------------------
Key: SOLR-196
URL: https://issues.apache.org/jira/browse/SOLR-196
Project: Solr
Issue Type: New Feature
Components: clients - php, search
Reporter: Paul Borgermans
Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch

It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP 4.x installs, where there is no built-in support for JSON. This issue attempts to address this.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Berkel updated SOLR-196:
---
Attachment: SOLR-196-PHPResponseWriter.patch

This patch updates the PHPResponseWriter originally written by Paul Borgermans and integrates the serialized PHP response writer (renamed to PHPSerializedResponseWriter to avoid name clashes) originally authored by Nick Jenkin in SOLR-275. See http://www.nabble.com/PHP-Response-Writer-for-Solr-tf4140580.html for some discussion on this implementation.

I've made minimal code changes to JSONwriter in order to reduce the amount of code duplication, specifically replacing all static writes of array and map structure tokens with methods:

  public void writeMapOpener(int size) throws IOException, IllegalArgumentException { writer.write('{'); }
  public void writeMapSeparator() throws IOException { writer.write(','); }
  public void writeMapCloser() throws IOException { writer.write('}'); }
  public void writeArrayOpener(int size) throws IOException, IllegalArgumentException { writer.write('['); }
  public void writeArraySeparator() throws IOException { writer.write(','); }
  public void writeArrayCloser() throws IOException { writer.write(']'); }

The size parameter has been introduced specifically for PHPSerializedWriter (where the output format explicitly requires the size of the array / map to be set) and is currently ignored by all other response writers. In cases where the size is not trivial to calculate (e.g. an Iterable object), it is set to -1. Classes extending JSONwriter that require a valid (non-negative) size value must overload certain methods (i.e. writeArray() and writeDoc()) to calculate the size correctly. It would also be a good idea to check for invalid size values in writeMapOpener() and writeArrayOpener() and throw an IllegalArgumentException if so.

Some other changes I've made to the PHPWriter code from SOLR-196:
1) Removed a lot of code duplicated from JSONwriter.
2) Updated writeStr() to use StringBuilder.

Some other changes I've made to the PHPSerializedWriter code from SOLR-275:
1) Removed some unnecessary duplicate code.
2) Changed the key type written by writeArray() from String to int (since they are supposed to be numeric indices).
3) Updated writeStr() - serialized PHP strings don't need to be escaped (PHP seems to rely only on the specified string size value), and the size needs to be specified in bytes, not characters (some Unicode characters were causing problems when using String.length() to calculate the size; can someone please sanity check this code?).

I've tested both PHPWriter and PHPSerializedWriter and they both seem to output valid PHP data; it would be great if people could also test them to ensure they work in their environments. JSONWriter also seems to be fine, and although I didn't test the Python or Ruby writers, I assume they are unaffected (can anyone confirm?). Additionally, I've moved PythonWriter and RubyWriter from JSONResponseWriter.java to PythonResponseWriter.java and RubyResponseWriter.java respectively.

I noticed that while each Writer specifies a content type value (e.g. CONTENT_TYPE_JSON_UTF8, CONTENT_TYPE_PYTHON_ASCII), the value returned by getContentType() is generally CONTENT_TYPE_TEXT_UTF8 or CONTENT_TYPE_TEXT_ASCII. This is not a big deal, and I guess this allows the output to be easily displayed in a browser; however it would be quite useful to have the actual content type value set so that client applications can determine the response format encoding and process it accordingly without relying on access to the original wt query parameter.
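To illustrate the byte-vs-character pitfall in point (3): PHP's serialize() records a string's length in bytes, so a Java writer must count UTF-8 bytes rather than chars. The helper below is hypothetical, written only to demonstrate the issue, and is not the patch code:

```java
import java.nio.charset.StandardCharsets;

/** Illustrates why serialized-PHP string sizes must be byte counts:
 *  for non-ASCII text, the UTF-8 byte length differs from String.length(). */
public class PhpSerializedString {
    /** Emit a PHP-serialized string token, with the size in UTF-8 bytes. */
    public static String serialize(String s) {
        int bytes = s.getBytes(StandardCharsets.UTF_8).length; // NOT s.length()
        return "s:" + bytes + ":\"" + s + "\";";
    }
}
```

For plain ASCII the two counts agree ("solr" serializes as s:4:"solr";), but a string like "héllo" has length 5 in chars and 6 in UTF-8 bytes, and PHP's unserialize() expects the byte count.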
A PHP response writer for Solr
------------------------------
Key: SOLR-196
URL: https://issues.apache.org/jira/browse/SOLR-196
Project: Solr
Issue Type: New Feature
Components: clients - php, search
Reporter: Paul Borgermans
Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch

It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP 4.x installs, where there is no built-in support for JSON. This issue attempts to address this.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-301) Clean up param interface. Leave deprecated options in deprecated classes
[ https://issues.apache.org/jira/browse/SOLR-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516839 ] Pieter Berkel commented on SOLR-301:

While you're in the process of cleaning up the Params interfaces, I wonder if it is worthwhile moving MoreLikeThisParams from o.a.s.common.util to o.a.s.common.params at the same time? I made a note of this in my comments on SOLR-295 (http://issues.apache.org/jira/browse/SOLR-295).

Clean up param interface. Leave deprecated options in deprecated classes
-
Key: SOLR-301
URL: https://issues.apache.org/jira/browse/SOLR-301
Project: Solr
Issue Type: Improvement
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Priority: Minor
Fix For: 1.3
Attachments: SOLR-301-ParamCleanup.patch, SOLR-301-ParamCleanup.patch

In SOLR-135, we moved the parameter handling stuff to a new package: o.a.s.common.params and left @deprecated classes in the old location. Classes in the new package should not contain any deprecated options. Additionally, we should aim to separate parameter manipulation logic (DefaultSolrParams, AppendedSolrParams, etc.) from 'parameter' interface classes: 'HighlightParams', 'UpdateParams'.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-301) Clean up param interface. Leave deprecated options in deprecated classes
[ https://issues.apache.org/jira/browse/SOLR-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516839 ] Pieter Berkel edited comment on SOLR-301 at 7/31/07 5:39 PM:
-

While you're in the process of cleaning up the Params interfaces, I wonder if it is worthwhile moving MoreLikeThisParams from o.a.s.common.util to o.a.s.common.params at the same time? I made a note of this in my comments on SOLR-295.

was:
While you're in the process of cleaning up the Params interfaces, I wonder if it is worthwhile moving MoreLikeThisParams from o.a.s.common.util to o.a.s.common.params at the same time? I made a note of this in my comments on <a href="http://issues.apache.org/jira/browse/SOLR-295">SOLR-295</a>.

Clean up param interface. Leave deprecated options in deprecated classes
-
Key: SOLR-301
URL: https://issues.apache.org/jira/browse/SOLR-301
Project: Solr
Issue Type: Improvement
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Priority: Minor
Fix For: 1.3
Attachments: SOLR-301-ParamCleanup.patch, SOLR-301-ParamCleanup.patch

In SOLR-135, we moved the parameter handling stuff to a new package: o.a.s.common.params and left @deprecated classes in the old location. Classes in the new package should not contain any deprecated options. Additionally, we should aim to separate parameter manipulation logic (DefaultSolrParams, AppendedSolrParams, etc.) from 'parameter' interface classes: 'HighlightParams', 'UpdateParams'.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515208 ] Pieter Berkel commented on SOLR-258: Looking good Hoss, the NOW issue seems to be resolved and the results look consistent after a quick test. * what should happen if end start or gap 0 ... maybe those should be okay as long as both are true. It is probably wise to explicitly check for (end start XOR gap 0) and return an error if so, otherwise the request gets caught in an infinite loop. Just on the subject of errors, I notice that exceptions thrown by the date facet code are caught in SimpleFacets.getFacetCounts() and written out in the response: try { res.add(facet_queries, getFacetQueryCounts()); res.add(facet_fields, getFacetFieldCounts()); res.add(facet_dates, getFacetDateCounts()); } catch (Exception e) { SolrException.logOnce(SolrCore.log, Exception during facet counts, e); res.add(exception, SolrException.toStr(e)); } This doesn't seem very consistent the way other handlers deal with exceptions (i.e. http response code 400), is there any reason why it is done this way in SimpleFacets? I also think it would also be a good idea to merge facet_dates response field into facet_fields so that all the facet data in the response is stored in the one location, how feasible would it be to do this? Date based Facets - Key: SOLR-258 URL: https://issues.apache.org/jira/browse/SOLR-258 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch 1) Allow clients to express concepts like... * give me facet counts per day for every day this month. * give me facet counts per hour for every hour of today. * give me facet counts per hour for every hour of a specific day. 
* give me facet counts per hour for every hour of a specific day and give me facet counts for the number of matches before that day, or after that day. 2) Return all data in a way that makes it easy to use to build filter queries on those date ranges. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
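The (end < start XOR gap < 0) check proposed in the comment above could be sketched as a standalone predicate. This is only an illustration of the suggested validation; `DateFacetValidator`, the method name, and the millisecond-based signature are hypothetical, not part of the actual patch:

```java
public class DateFacetValidator {
    /** True when the start/end/gap combination can terminate:
     *  either ascending (start <= end, positive gap) or
     *  descending (start >= end, negative gap). */
    public static boolean isValidRange(long start, long end, long gapMillis) {
        if (gapMillis == 0) return false; // a zero gap never advances the loop
        // (end < start) XOR (gap < 0) is the error case described above,
        // so the range is valid exactly when the two signs agree.
        return (end < start) == (gapMillis < 0);
    }
}
```

A handler using this would return HTTP 400 when the check fails, rather than looping forever.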
[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index
[ https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513891 ] Pieter Berkel commented on SOLR-308: From the use case you have provided, it sounds like the unique id will change every time you delete and re-insert the document. If this is the case, then perhaps it might be more efficient to use the Lucene document id as your unique id value rather than a separate field? However, as far as I'm aware, there currently isn't any way to access the Lucene doc id from Solr (except perhaps via the Luke request handler)? Add a field that generates an unique id when you have none in your data to index Key: SOLR-308 URL: https://issues.apache.org/jira/browse/SOLR-308 Project: Solr Issue Type: New Feature Components: search Reporter: Thomas Peuss Priority: Minor Attachments: GeneratedId.patch This patch adds a field that generates a unique id when you have no unique id in the data you want to index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
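The idea behind the patch, and the caveat raised in the comment, can be sketched with a minimal helper: supply an id when the document has none. `GeneratedIdField.idFor` is a hypothetical stand-in, not the API from GeneratedId.patch:

```java
import java.util.UUID;

public class GeneratedIdField {
    /** Returns the supplied id unchanged, or a fresh random UUID when the
     *  document carries no id of its own. Note the caveat from the comment:
     *  a generated id is different on every delete-and-re-insert. */
    public static String idFor(String existingId) {
        return (existingId == null || existingId.isEmpty())
                ? UUID.randomUUID().toString()
                : existingId;
    }
}
```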
[jira] Commented: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512778 ] Pieter Berkel commented on SOLR-258: Sorry that last comment was from me, not posted from my regular computer. I'll be more careful to post as myself and not as a colleague in future (I was wondering why JIRA didn't ask me to login, d'oh). Date based Facets - Key: SOLR-258 URL: https://issues.apache.org/jira/browse/SOLR-258 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch 1) Allow clients to express concepts like... * give me facet counts per day for every day this month. * give me facet counts per hour for every hour of today. * give me facet counts per hour for every hour of a specific day. * give me facet counts per hour for every hour of a specific day and give me facet counts for the number of matches before that day, or after that day. 2) Return all data in a way that makes it easy to use to build filter queries on those date ranges. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512778 ] Pieter Berkel edited comment on SOLR-258 at 7/14/07 11:59 PM: -- Sorry that last comment was from me (not Tristan), not posted from my regular computer. I'll be more careful to post as myself and not as a colleague in future (I was wondering why JIRA didn't ask me to login, d'oh). was: Sorry that last comment was from me, not posted from my regular computer. I'll be more careful to post as myself and not as a colleague in future (I was wondering why JIRA didn't ask me to login, d'oh). Date based Facets - Key: SOLR-258 URL: https://issues.apache.org/jira/browse/SOLR-258 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch 1) Allow clients to express concepts like... * give me facet counts per day for every day this month. * give me facet counts per hour for every hour of today. * give me facet counts per hour for every hour of a specific day. * give me facet counts per hour for every hour of a specific day and give me facet counts for the number of matches before that day, or after that day. 2) Return all data in a way that makes it easy to use to build filter queries on those date ranges. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-258) Date based Facets
[ https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512372 ] Pieter Berkel commented on SOLR-258: I've just tried this patch and the results are impressive! I agree with Ryan regarding the naming of 'pre', 'post' and 'inner', using simple concrete words will make it easier for developers to understand the basic concepts. At first I was a little confused about how the 'gap' parameter was used, perhaps a name like 'interval' would be more indicative of its purpose? While on the topic of gaps / intervals, I can imagine a case where one might want facet counts over non-linear intervals, for instance obtaining results for: Last 7 days, Last 30 days, Last 90 days, Last 6 months. Obviously you can achieve this by setting facet.date.gap=+1DAY and then post-processing the results, but a much more elegant solution would be to allow facet.date.gap (or another suitably named param) to accept a (comma-delimited) set of explicit partition dates:

facet.date.start=NOW-6MONTHS/DAY
facet.date.end=NOW/DAY
facet.date.gap=NOW-90DAYS/DAY,NOW-30DAYS/DAY,NOW-7DAYS/DAY

It would then be trivial to calculate facet counts for the ranges specified above. It would be useful to make the 'start' and 'end' parameters optional. If not specified, 'start' should default to the earliest stored date value, and 'end' should default to the latest stored date value (assuming that's possible). Probably should return a 400 if 'gap' is not set. My personal opinion is that 'end' should be a hard limit, the last gap should never go past 'end'. Given that the facet label is always generated from the lower value in the range, I don't think truncating the last 'gap' will cause problems, however it may be helpful to return the actual date value for 'end' if it was specified as an offset of NOW. What might be a problem is when both start and end dates are specified as offsets of NOW, the value of NOW may not be constant for both values. 
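The partition-date idea above amounts to splitting [start, end] at a few explicit interior points instead of stepping by a fixed gap. A minimal sketch, with all names hypothetical and plain epoch-millisecond longs standing in for Solr date math:

```java
import java.util.ArrayList;
import java.util.List;

public class NonLinearGaps {
    /** Splits [start, end] at the given interior partition points
     *  (assumed sorted ascending), returning each bucket as a
     *  two-element {lower, upper} array. */
    public static List<long[]> buckets(long start, long end, long[] partitions) {
        List<long[]> out = new ArrayList<>();
        long lower = start;
        for (long p : partitions) {
            out.add(new long[]{lower, p});
            lower = p;
        }
        out.add(new long[]{lower, end}); // final bucket ends at the hard limit
        return out;
    }
}
```

With three partition points this yields the four "Last 7 / 30 / 90 days, Last 6 months" ranges directly, with no post-processing.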
In one of my tests, I set:

facet.date.start=NOW-12MONTHS
facet.date.end=NOW
facet.date.gap=+1MONTH

With some extra debugging output I can see that mostly the value of NOW is the same:

<str name="start">2006-07-13T06:06:07.397</str>
<str name="end">2007-07-13T06:06:07.397</str>

However occasionally there is a difference:

<str name="start">2006-07-13T05:48:23.014</str>
<str name="end">2007-07-13T05:48:23.015</str>

This difference alters the number of gaps calculated (+1 when the NOW values differ for start and end). Not sure how this could be fixed, but as you mentioned above, it will probably involve changing ft.toExternal(ft.toInternal(...)). Thanks again for creating this useful addition, I'll try to test it a bit more and see if I can find anything else. Date based Facets - Key: SOLR-258 URL: https://issues.apache.org/jira/browse/SOLR-258 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch 1) Allow clients to express concepts like... * give me facet counts per day for every day this month. * give me facet counts per hour for every hour of today. * give me facet counts per hour for every hour of a specific day. * give me facet counts per hour for every hour of a specific day and give me facet counts for the number of matches before that day, or after that day. 2) Return all data in a way that makes it easy to use to build filter queries on those date ranges. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
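The drifting-NOW bug described above disappears if NOW is captured once per request and every date-math expression is resolved against that single instant. A minimal sketch of that approach; `DateMathContext` and its methods are hypothetical, not Solr's actual date-math implementation:

```java
public class DateMathContext {
    private static final long DAY_MS = 86_400_000L;
    private final long now; // NOW captured once, at the start of the request

    public DateMathContext(long nowMillis) { this.now = nowMillis; }

    /** Resolves a "NOW-nDAYS"-style offset against the single captured
     *  instant, so start and end can never disagree about what NOW is. */
    public long minusDays(int days) { return now - days * DAY_MS; }

    public long now() { return now; }
}
```

Because both endpoints derive from the same `now` value, the number of gaps between them is deterministic regardless of when during the request each parameter is parsed.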
[jira] Commented: (SOLR-281) Search Components (plugins)
[ https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511678 ] Pieter Berkel commented on SOLR-281: I really like this modular approach to handling search requests, it will greatly simplify the process of adding new functionality (e.g. collapsing, faceting, more-like-this) to existing handlers without the need for unnecessary code replication. My primary goal is to extend the more-like-this handler capabilities and make them available to other handlers (such as dismax), and I think the proposed solution is a good approach. Some issues that I can foresee though are: 1) Ordering: it's fairly obvious that certain handlers need to be called before others (e.g. standard / dismax query parsing before faceting / highlighting), however there may be cases where the required sequence of events is more subtle (e.g. faceting the results of a more-like-this query). There probably needs to be some mechanism to determine the order in which the components are prepared / processed. 2) Dependency: a situation may arise where a component depends on operations performed by another component (e.g. more-like-this may take advantage of the dismax 'bq' parameter), perhaps there needs to be some method of specifying component dependencies so that the SearchHandler can load and process required components automatically? I hope this makes sense, I'm fairly new to Solr development so I'm afraid my contributions to this issue will be mostly limited to (hopefully helpful) ideas and suggestions, however I'm happy to tinker with the patched code from above and help test this new component framework as it is developed. 
cheers, Pieter Search Components (plugins) --- Key: SOLR-281 URL: https://issues.apache.org/jira/browse/SOLR-281 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley Attachments: SOLR-281-SearchComponents.patch A request handler with pluggable search components for things like: - standard - dismax - more-like-this - highlighting - field collapsing For more discussion, see: http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
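The ordering and dependency concerns raised in the comment are a classic fit for a topological sort: if each component declares what it depends on, the handler can derive a valid execution order automatically. A minimal sketch under that assumption; `ComponentOrder` and its string-keyed dependency map are hypothetical, not part of the SOLR-281 patch:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ComponentOrder {
    /** Orders components so every dependency runs before its dependents
     *  (a plain DFS topological sort; cycle detection is omitted). */
    public static List<String> order(Map<String, List<String>> deps) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String c : deps.keySet()) visit(c, deps, seen, out);
        return out;
    }

    private static void visit(String c, Map<String, List<String>> deps,
                              Set<String> seen, List<String> out) {
        if (!seen.add(c)) return; // already placed
        for (String d : deps.getOrDefault(c, List.of())) visit(d, deps, seen, out);
        out.add(c); // emit after all dependencies
    }
}
```

Declaring that faceting and more-like-this both depend on query parsing, for example, guarantees the parser component is prepared and processed first without hard-coding a sequence.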
[jira] Created: (SOLR-292) MoreLikeThisHandler generates incorrect facet counts
MoreLikeThisHandler generates incorrect facet counts Key: SOLR-292 URL: https://issues.apache.org/jira/browse/SOLR-292 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Pieter Berkel Priority: Minor Fix For: 1.3 When obtaining facet counts using the MoreLikeThis handler, the facet information returned is generated from the document list returned rather than the entire set of matching documents. For example, if your MoreLikeThis query returns by default 10 documents, then getFacetCounts() returns values based only on these 10 documents, despite the fact that there may be thousands of matching documents in the set. The soon-to-be uploaded patch addresses this particular issue by changing the object type returned by MoreLikeThisHelper.getMoreLikeThis() from DocList to DocListAndSet and ensuring that the facet count is generated from the entire set rather than the document list. The MLT functionality of the StandardRequestHandler should not be affected by this change. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
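The bug report boils down to counting over the wrong collection: the 10-document page instead of the full match set. A toy illustration of the difference, using plain collections in place of Solr's DocList/DocListAndSet; `FacetCount.over` is a hypothetical stand-in, not the patched Solr code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FacetCount {
    /** Tallies facet values over whatever collection it is given; the
     *  SOLR-292 fix amounts to feeding it the full match set rather
     *  than just the returned page of documents. */
    public static Map<String, Integer> over(List<String> values) {
        Map<String, Integer> out = new HashMap<>();
        for (String v : values) out.merge(v, 1, Integer::sum);
        return out;
    }
}
```

Faceting 25 matching documents through a 10-document page reports a count of 10, which is exactly the undercount the issue describes.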
[jira] Updated: (SOLR-292) MoreLikeThisHandler generates incorrect facet counts
[ https://issues.apache.org/jira/browse/SOLR-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Berkel updated SOLR-292: --- Attachment: MoreLikeThis-FacetCount_SOLR-292.patch Patch updates src/java/org/apache/solr/handler/MoreLikeThisHandler.java and fixes the facet count problem. MoreLikeThisHandler generates incorrect facet counts Key: SOLR-292 URL: https://issues.apache.org/jira/browse/SOLR-292 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Pieter Berkel Priority: Minor Fix For: 1.3 Attachments: MoreLikeThis-FacetCount_SOLR-292.patch When obtaining facet counts using the MoreLikeThis handler, the facet information returned is generated from the document list returned rather than the entire set of matching documents. For example, if your MoreLikeThis query returns by default 10 documents, then getFacetCounts() returns values based only on these 10 documents, despite the fact that there may be thousands of matching documents in the set. The soon-to-be uploaded patch addresses this particular issue by changing the object type returned by MoreLikeThisHelper.getMoreLikeThis() from DocList to DocListAndSet and ensuring that the facet count is generated from the entire set rather than the document list. The MLT functionality of the StandardRequestHandler should not be affected by this change. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-295) Implementing MoreLikeThis support in DismaxRequestHandler
Implementing MoreLikeThis support in DismaxRequestHandler - Key: SOLR-295 URL: https://issues.apache.org/jira/browse/SOLR-295 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Reporter: Pieter Berkel Priority: Minor There's nothing too clever about this initial patch to be uploaded shortly, I have simply extracted the MLT code from the StandardRequestHandler and inserted it into the DismaxRequestHandler. However, there are some broader MLT issues that I'd also like to address in the near future: 1) (trivial) No "This response format is experimental" warning when MLT is used with StandardRequestHandler (or DismaxRequestHandler). Not really a big deal, but it at least makes developers aware of the possibility of future changes. 2) (trivial) org.apache.solr.common.util.MoreLikeThisParams should perhaps be moved to the more appropriate package org.apache.solr.common.params. 3) (non-trivial) The ability to specify the list of fields that should be returned when MLT is invoked from an external handler (i.e. StandardRequestHandler). Currently the field list (FL) parameter is inherited from the main query, but I can envisage cases where it would be desirable to specify more or fewer return fields in the MLT query than the main query. One complication is that mlt.fl is already used to specify the fields used for similarity. Perhaps mlt.fl is not the best name for this parameter and should be renamed to avoid potential conflict / confusion? 4) (fairly trivial) On a similar note to 3, there is currently no way to specify a start value for the rows returned when MLT is invoked from an external handler (e.g. StandardRequestHandler), it is hard-coded to 0 (i.e. the first mlt.count documents matched). While I can see the logic in naming the parameter mlt.count, it does seem a little inconsistent and perhaps it would be better to rename (or at least alias) it to mlt.rows to be consistent with the CommonQueryParameters. 
Note that mlt.start is fundamentally different to the mlt.match.offset parameter, as the latter deals with documents *matching* the initial MLT query while the former deals with documents *returned* by the MLT query (hope that makes sense). I have created a patch that implements mlt.start (to specify the start doc) and adds mlt.rows, which can be used interchangeably with mlt.count (but I would prefer to remove mlt.count altogether); however, since it involves changing the method definition of MoreLikeThisHelper.getMoreLikeThese(), I wanted to get some opinions before submitting it. 5) (non-trivial) Interesting Terms - the ability to return interesting term information using the mlt.interestingTerms parameter when MLT is invoked from an external handler. This is perhaps the most useful feature I am looking to implement; I can see great benefit in being able to provide a list of interesting terms or keywords for each document returned in a standard or dismax query. Currently this is only available from the MLT request handler, so perhaps the best approach would be to re-factor the interestingTerms code in the MoreLikeThisHandler class and put it somewhere in MoreLikeThisHelper so it is available to all handlers? Again, I would appreciate any comments or suggestions. I've also noted the MLT features suggested by Tristan [ http://www.nabble.com/MoreLikeThis-with-DisMax-boost-query---functions-tf4047187.html ] which could quite possibly be rolled together with the above points -- I'm not sure whether it is better to have a single ticket tracking several related issues or to create individual tickets for each issue, however I will be happy to comply with the Solr issue tracking policy on advice from the core developers. regards, Pieter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
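The mlt.start / mlt.rows proposal in point 4 is simple windowing over the matched documents instead of always slicing from a hard-coded offset 0. A minimal sketch with plain lists standing in for Solr's DocList; `MltPaging.page` is a hypothetical illustration, not the submitted patch:

```java
import java.util.List;

public class MltPaging {
    /** Returns the [start, start + rows) window of matched doc ids,
     *  rather than always taking the first N from offset 0. */
    public static List<Integer> page(List<Integer> matches, int start, int rows) {
        int from = Math.min(Math.max(start, 0), matches.size());
        int to = Math.min(from + rows, matches.size());
        return matches.subList(from, to);
    }
}
```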