Re: MorphlineSolrSink
Rajesh, I think this question is better suited for the Flume user mailing list. You will need to configure the sink with the expected values so that the events from the channels can be routed to the right place.

On Mon, Jul 15, 2013 at 4:49 PM, Rajesh Jain rjai...@gmail.com wrote:

Newbie question: I have a Flume server where I am writing to a RollingFile sink. I have to take the files from this sink and send them to Solr, which can index them and provide search. Do I need to configure MorphlineSolrSink instead? What is the mechanism to do this or to send this data over to Solr? Thanks, Rajesh

-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
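For reference, a MorphlineSolrSink is configured directly in the Flume agent's properties file and reads events off a channel itself, rather than being fed from a RollingFile sink. A minimal sketch, assuming an agent named agent1 and a channel named memoryChannel (the agent name, channel name, and morphline file path are assumptions, not values from this thread):

```
# sketch: route events from a channel straight into Solr
agent1.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent1.sinks.solrSink.channel = memoryChannel
# morphline config that parses events and loads them into Solr
agent1.sinks.solrSink.morphlineFile = /etc/flume/conf/morphline.conf
agent1.sinks.solrSink.batchSize = 100
agent1.sinks.solrSink.batchDurationMillis = 1000
```

With this in place the events flow channel -> sink -> Solr directly, so no intermediate rolled files need to be shipped.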
Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released
I am working on that. I hope to have an answer within a month or so.

On Tue, Jun 21, 2011 at 9:51 AM, roySolr royrutten1...@gmail.com wrote:

Are you working on some changes to support earlier versions of PHP?
Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released
It looks like you have to upgrade to PHP 5.3.x. Unfortunately, that method signature was different in your version of PHP. I would have to make additional changes to support the earlier versions of PHP.

On Tue, Jun 7, 2011 at 9:05 AM, roySolr royrutten1...@gmail.com wrote:

Hello, I have some problems with the installation of the new PECL package solr-1.0.1. I run these commands:

pecl uninstall solr-beta   (to uninstall the old version, 0.9.11)
pecl install solr

The install runs, but then it fails with the following error message:

/tmp/tmpKUExET/solr-1.0.1/solr_functions_helpers.c: In function 'solr_json_to_php_native':
/tmp/tmpKUExET/solr-1.0.1/solr_functions_helpers.c:1123: error: too many arguments to function 'php_json_decode'
make: *** [solr_functions_helpers.lo] Error 1
ERROR: `make' failed

I have PHP version 5.2.17. How can I fix this?
[ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released
The new PECL package solr-1.0.1 (stable) has been released at http://pecl.php.net/.

Release notes
- Added support for the JSON response writer in SolrClient
- Removed the final bit from classes so that they can be mocked in unit tests
- Changed status from beta to stable
- Included phpdoc stubs in the source to enable autocomplete of Solr classes and methods in IDEs during development
- Lowered the libxml2 version requirement to 2.6.16

Package Info
The extension effectively simplifies the process of interacting with Apache Solr using PHP5, and it already comes with built-in readiness for the latest features added in Solr 3.1. It has features such as built-in, serializable query string builder objects, which simplify the manipulation of name-value pair request parameters across repeated requests. The response from the Solr server is also automatically parsed into native PHP objects whose properties can be accessed as array keys or object properties without any additional configuration on the client side. Its advanced HTTP client reuses the same connection across multiple requests and provides built-in support for connecting to Solr servers secured behind HTTP authentication or HTTP proxy servers. It is also able to connect to SSL-enabled containers. Please consult the documentation for more details on features. Included in the source code are phpdoc stubs that enable autocomplete of Solr classes and methods in IDEs during development.

Related Links
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-1.0.1.tgz

Authors
Israel Ekpo ie...@php.net (lead)
Re: PECL SOLR PHP extension, JSON output
There are instructions here for Solr 1.4: https://issues.apache.org/jira/browse/SOLR-1967

I have not finished the version of the plugin that will allow you to use phpnative in 3.1 yet. I will post it as soon as I can. I have not been working on the PECL extension for a while now, but I am planning to modify the source to include support for the JSON response writer soon. Stay tuned.

On Thu, Apr 21, 2011 at 9:47 AM, Ralf Kraus r...@pixelhouse.de wrote:

On 21.04.2011 13:58, roySolr wrote:

I have tried that, but it seems like JSON is not supported. Parameters: responseWriter — one of the following:
- xml
- phpnative

And I can't get phpnative working with Solr 3.1 :-(

-- Greets, Ralf Kraus
Re: phpnative response writer in SOLR 3.1 ?
Sorry for the late response. I am working on an updated version for the latest release of Solr and Lucene. I will post my changes within the week. Thank you for your patience.

On Fri, Apr 15, 2011 at 3:20 AM, Ralf Kraus r...@pixelhouse.de wrote:

On 14.04.2011 09:53, Ralf Kraus wrote:

Hello, I just updated to Solr 3.1 and am wondering if the phpnative response writer plugin is part of it (https://issues.apache.org/jira/browse/SOLR-1967). When I try to compile the source files I get some errors:

PHPNativeResponseWriter.java:57: org.apache.solr.request.PHPNativeResponseWriter is not abstract and does not override abstract method getContentType(org.apache.solr.request.SolrQueryRequest,org.apache.solr.response.SolrQueryResponse) in org.apache.solr.response.QueryResponseWriter
public class PHPNativeResponseWriter implements QueryResponseWriter {
^
PHPNativeResponseWriter.java:70: method does not override a method from its superclass
@Override
^

Is there a new JAR file or something I could use with Solr 3.1? The Solr PECL package only supports XML or PHPNATIVE as the response writer (http://pecl.php.net/package/solr).

No hints at all?

-- Greetings, Ralf Kraus
Re: Solr Php Client
Cool. I will take a look at the issue later tomorrow.

On Fri, Apr 8, 2011 at 2:28 AM, Haspadar haspa...@gmail.com wrote:

I'm entering only a query parameter. I posted a bug description there: http://pecl.php.net/bugs/bug.php?id=22634

2011/4/8 Israel Ekpo israele...@gmail.com:

Hi, could you send the entire list of parameters you are sending to Solr via the SolrClient and SolrQuery objects? Please open a bug report here with the details: http://pecl.php.net/bugs/report.php?package=solr

On Thu, Apr 7, 2011 at 7:59 PM, Haspadar haspa...@gmail.com wrote:

Hello, I updated Solr to version 3.1 on my project. Now when the application calls the getResponse() method (PECL extension) I get the following:

Fatal error: Uncaught exception 'SolrException' with message 'Error un-serializing response' in /home/.../Adapter/Solr.php: 78

How can I fix it? Thanks
Re: Solr Php Client
Hi, could you send the entire list of parameters you are sending to Solr via the SolrClient and SolrQuery objects? Please open a bug report here with the details: http://pecl.php.net/bugs/report.php?package=solr

On Thu, Apr 7, 2011 at 7:59 PM, Haspadar haspa...@gmail.com wrote:

Hello, I updated Solr to version 3.1 on my project. Now when the application calls the getResponse() method (PECL extension) I get the following:

Fatal error: Uncaught exception 'SolrException' with message 'Error un-serializing response' in /home/.../Adapter/Solr.php: 78

How can I fix it? Thanks
Re: New PHP API for Solr (Logic Solr API)
Lukas, how do you think it should have been designed? Most libraries are not going to have all the features that you need, and while there may be features in the library that you do not like, others may really appreciate them being there.

As I said in an earlier email a couple of months ago, the SolrQuery::set(), get() and add() methods do exist for you to use if you prefer not to use the feature-specific methods in the SolrQuery class; that's the beauty of it.

The PECL extension was something I designed to use on a personal project, and it was really helpful in managing faceted search and other features that Solr has to offer. I decided to share it with the PHP community because I felt others might need similar functionality. So it is possible that there were use cases that applied to my project that may not be applicable to yours.

I initially used the SolrJ API to access Solr via Java, and then when I had a PHP project I decided to use something similar to SolrJ, but at the time there was nothing comparable in the PHP realm: http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/package-summary.html

A review of the SolrJ API will offer more explanation of some of the features present in the PECL API. I would really love to get feedback from others about the design of the PECL library and about any other missing or extraneous features.

Thanks.

On Mon, Mar 7, 2011 at 4:04 AM, Lukas Kahwe Smith m...@pooteeweet.org wrote:

On 07.03.2011, at 09:43, Stefan Matheis wrote:

Burak, what's wrong with the existing PHP extension (http://php.net/manual/en/book.solr.php)?

The main issue I see with it is that the API isn't designed much. It just exposes lots of features with dedicated methods, but doesn't focus on keeping the API easy to overview (keep simple things simple and make complex stuff possible). At the same time, fundamental stuff like quoting is not covered. That being said, I do not think we really need a proliferation of Solr APIs for PHP, even if this one is based on PHP 5.3 (namespaces etc.). By the way, there is already another PHP 5.3 based API, though it tries to also unify other Lucene-based APIs as much as possible: https://github.com/dstendardi/Ariadne

Regards, Lukas Kahwe Smith m...@pooteeweet.org
Re: solr init.d script
I think it would be a better idea to load Solr via a servlet container like Tomcat and then create the init.d script for Tomcat instead. http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6

On Tue, Nov 9, 2010 at 2:47 AM, Eric Martin e...@makethembite.com wrote:

Er, what flavor?

RHEL / CentOS:

#!/bin/sh
# Starts, stops, and restarts Apache Solr.
#
# chkconfig: 35 92 08
# description: Starts and stops Apache Solr

SOLR_DIR=/var/solr
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar"
LOG_FILE=/var/log/solr.log
JAVA=/usr/bin/java

case $1 in
    start)
        echo "Starting Solr"
        cd $SOLR_DIR
        $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
        ;;
    stop)
        echo "Stopping Solr"
        cd $SOLR_DIR
        $JAVA $JAVA_OPTIONS --stop
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac

Debian: http://xdeb.org/node/1213

Ubuntu steps:

1. Install the nano text editor:
   sudo apt-get install nano
2. Add a new script:
   sudo nano /etc/init.d/solr
3. nano will display a new page titled "GNU nano 2.0.x". Paste the script below into the nano window. Note: you might have to replace /apache-solr-1.4.0/example with the appropriate directory name.

#!/bin/sh -e
# Starts, stops, and restarts Solr

SOLR_DIR=/apache-solr-1.4.0/example
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=stopkey -jar start.jar"
LOG_FILE=/var/log/solr.log
JAVA=/usr/bin/java

case $1 in
    start)
        echo "Starting Solr"
        cd $SOLR_DIR
        $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
        ;;
    stop)
        echo "Stopping Solr"
        cd $SOLR_DIR
        $JAVA $JAVA_OPTIONS --stop
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac

4. Press CTRL-X, type Y, and press ENTER when asked for the file name to write. You are now back at the command line.
5. Create all the links to the script:
   sudo update-rc.d solr defaults
6. Make the script executable:
   sudo chmod a+rx /etc/init.d/solr
7. To test, reboot your Ubuntu server. Wait until the reboot is completed, wait about 2 minutes for Apache Solr to start up, then go to your website in a browser and try a Solr search.

-----Original Message-----
From: Nikola Garafolic [mailto:nikola.garafo...@srce.hr]
Sent: Monday, November 08, 2010 11:42 PM
To: solr-user@lucene.apache.org
Subject: solr init.d script

Hi, does anyone have some kind of init.d script for Solr that can start, stop, and check Solr status?

-- Nikola Garafolic SRCE, Sveucilisni racunski centar tel: +385 1 6165 804 email: nikola.garafo...@srce.hr
Re: solr init.d script
Yes. I recommend running Solr via a servlet container. It is much easier to manage compared to running it by itself.

On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic nikola.garafo...@srce.hr wrote:

I have two nodes, each running one JBoss server, and using one (single) Solr instance; that's how I run it for now. Do you recommend running JBoss with Solr via a servlet container? The two JBoss servers run load-balanced for high-availability purposes. For now it seems to be OK.

On 11/09/2010 03:17 PM, Israel Ekpo wrote:

I think it would be a better idea to load Solr via a servlet container like Tomcat and then create the init.d script for Tomcat instead. http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6

-- Nikola Garafolic SRCE, Sveucilisni racunski centar tel: +385 1 6165 804 email: nikola.garafo...@srce.hr
ZendCon 2010 - Slides on Building Intelligent Search Applications with Apache Solr and PHP 5
Due to popular demand, the link to my slides from ZendCon is now available here in case anyone else is looking for it: http://slidesha.re/bAXNF3

The sample code will be uploaded shortly. Feedback is also appreciated: http://joind.in/2261
Re: Implementing Search Suggestion on Solr
I think you may want to configure the field type used for the spell check to use the synonyms file/database. That way synonyms are also processed at index time. This could help.

On Wed, Oct 27, 2010 at 6:47 AM, Antonio Calo' anton.c...@gmail.com wrote:

Hi, if I understood correctly, you will build a kind of dictionary, ontology, or thesaurus, and you will use it if the Solr query results are few. At query time (before or after) you will perform a query on this dictionary in order to retrieve the suggested word. If you need to do this, you can try to create a custom request handler where you can control the querying process in a simple manner (http://wiki.apache.org/solr/SolrRequestHandler). With a custom request handler, you can add custom code to check query results before submitting the query to Solr, or analyze the query before sending the result to the client. I never coded one, but I think this is a good starting point. Hope this can help you. Antonio

On 27/10/2010 11.03, Pablo Recio wrote:

Thanks, it's not what I'm looking for. Actually I need something like: search "Ubuntu" and it will prompt "Maybe you will like 'Debian' too" or something like that. I'm not trying to do it automatically; manually will be OK. Anyway, it is a good article you shared; maybe I will implement it, thanks!

2010/10/27 Jakub Godawa jakub.god...@gmail.com:

I am a real rookie at Solr, but try this: http://solr.pl/2010/10/18/solr-and-autocomplete-part-1/?lang=en

2010/10/27 Pablo Recio pre...@yaco.es:

Hi, I don't want to be annoying, but I'm looking for a way to do that. I repeat the question: is there a way to implement search suggestions manually? Thanks in advance. Regards,

2010/10/18 Pablo Recio Quijano pre...@yaco.es:

Hi! I'm trying to implement some kind of search suggestion in a search engine I have implemented. These search suggestions should not be automatic like the ones described for the SpellCheckComponent [1]. I'm looking for something like: "SAS oppositions" => "Public job offers for some-company". So I will have to define it manually. I was thinking about synonyms [2], but I don't know if it's the proper way to do it, because semantically those terms are not synonyms. Any ideas or suggestions? Regards,

[1] http://wiki.apache.org/solr/SpellCheckComponent
[2] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
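If you do go the synonyms route, the manually curated mappings can live in the analyzer chain of the field type used for suggestions. A minimal schema.xml sketch, assuming a hypothetical text_suggest field type and a synonyms.txt entry built from the example phrase in this thread (the field type name and file name are assumptions):

```xml
<!-- synonyms.txt (one mapping per line):
     sas oppositions => public job offers
-->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expand the curated mappings at index time -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Applying the filter only in the index-time analyzer sidesteps the usual multi-word synonym pitfalls at query time, at the cost of having to re-index when the mappings change.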
Re: Highlighting for non-stored fields
Check out this link: http://wiki.apache.org/solr/FieldOptionsByUseCase

You need to store the field if you want to use the highlighting feature. If you need to retrieve and display the highlighted snippets, then the field definitely needs to be stored. To use term offsets, it is a good idea to enable the following attributes for that field:

termVectors
termPositions
termOffsets

The only issue here is that your storage costs will increase because of these extra features. Nevertheless, you definitely need to store the field if you need to retrieve it for highlighting purposes.

On Tue, Oct 26, 2010 at 6:50 AM, Phong Dais phong.gd...@gmail.com wrote:

Hi, I've been looking through the mailing archive for the past week and I haven't found any useful info regarding this issue. My requirement is to index a few terabytes' worth of data to be searched. Due to the size of the data, I would like to index without storing, but I would like to use the highlighting feature. Is this even possible? What are my options? I've read about termOffsets and payloads that could possibly be used to do this, but I have no idea how this could be done. Any pointers greatly appreciated. Someone please point me in the right direction. I don't mind having to write some code or dig through existing code to accomplish this task. Thanks, P.
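Putting the advice above together, a field declaration sketch for schema.xml (the field name and type here are assumptions, not from the thread):

```xml
<!-- stored="true" is required to retrieve highlighted snippets;
     the term vector attributes let the highlighter reuse indexed
     offsets instead of re-analyzing the stored text on every request -->
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```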
Re: Documents are deleted when Solr is restarted
The Solr home is the -Dsolr.solr.home Java system property. Also make sure that -Dsolr.data.dir is defined for your data directory, if it is not already defined in the solrconfig.xml file.

On Tue, Oct 26, 2010 at 10:46 AM, Upayavira u...@odoko.co.uk wrote:

You need to watch what you are setting your solr.home to. That is where your indexes are being written. Are they getting overwritten/lost somehow? Watch the files in that dir while doing a restart. That's a start at least. Upayavira

On Tue, 26 Oct 2010 16:40 +0300, Mackram Raydan mack...@gmail.com wrote:

Hey everyone, I apologize if this question is rudimentary, but it is getting to me and I did not find anything reasonable about it online. Basically I have a Solr 1.4.1 setup behind Tomcat 6. I used the SolrTomcat wiki page for setup. The system works exactly the way I want it (proper search, highlighting, etc.). The problem however is that when I restart my Tomcat server, all the data in Solr (i.e. the index) is simply lost. The admin shows me the number of docs is 0 when it was in the thousands before. Can someone please help me understand why the above is happening and how I can work around it if possible? Big thanks for any help you can send my way. Regards, Mackram
Re: Modelling Access Control
On Mon, Oct 25, 2010 at 8:16 AM, Paul Carey paul.p.ca...@gmail.com wrote:

Many thanks for all the responses. I now plan on benchmarking and validating both the filter query approach and maintaining the ACL entirely outside of Solr. I'll decide from there. Paul

Great. I am looking forward to some feedback on the benchmarks.
Re: Modelling Access Control
Hi Paul, regardless of how you implement it, I would recommend you use filter queries for the permission checks rather than making them part of the main query.

On Sat, Oct 23, 2010 at 4:03 AM, Paul Carey paul.p.ca...@gmail.com wrote:

Hi, my domain model is made of users that have access to projects which are composed of items. I'm hoping to use Solr and would like to make sure that searches only return results for items that users have access to. I've looked over some of the older posts on this mailing list about access control and saw a suggestion along the lines of acl:user_id AND (actual query). While this obviously works, there are a couple of niggles. Every item must have a list of valid user ids (typically fewer than 100 in my case). Every time a collaborator is added to or removed from a project, I need to update every item in that project. This will typically be fewer than 1000 items, so I guess it is no big deal. I wondered if the following might be a reasonable alternative, assuming the number of projects to which a user has access is below a certain bound: (acl:project_id OR acl:project_id OR ...) AND (actual query). When the numbers are small - e.g. each user has access to ~20 projects and each project has ~20 collaborators - is one approach preferable over the other? And when outliers exist - e.g. a project with 2000 collaborators, or a user with access to 2000 projects - is one approach more liable to fail than the other? Many thanks, Paul
Re: Modelling Access Control
Hi All, I think using filter queries is a good option to consider, for the following reasons:

* The filter query does not affect the score of the items in the result set. If the ACL logic is part of the main query, it could influence the scores of the items in the result set.

* Using a filter query could lead to better performance in complex queries because the results of the query specified with fq are cached independently of the main query. Since the result of a filter query is cached, it will be used to filter the primary query result using set intersection, without having to fetch the ids of the documents matching the fq a second time. I think this will be useful because we can assume that the ACL portion in the fq is relatively constant, since the permissions for each user do not change frequently.

http://wiki.apache.org/solr/FilterQueryGuidance

On Sat, Oct 23, 2010 at 2:58 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Why use filter queries? Wouldn't reducing the set headed into the filters by putting it in the main query be faster? (A question to learn, since I do NOT know :-)

Dennis Gearon

Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. From http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036 EARTH has a Right To Life, otherwise we all die.

--- On Sat, 10/23/10, Israel Ekpo israele...@gmail.com wrote:

From: Israel Ekpo israele...@gmail.com
Subject: Re: Modelling Access Control
To: solr-user@lucene.apache.org
Date: Saturday, October 23, 2010, 7:01 AM

Hi Paul, regardless of how you implement it, I would recommend you use filter queries for the permission checks rather than making them part of the main query.

On Sat, Oct 23, 2010 at 4:03 AM, Paul Carey paul.p.ca...@gmail.com wrote:

Hi, my domain model is made of users that have access to projects which are composed of items. I'm hoping to use Solr and would like to make sure that searches only return results for items that users have access to. I've looked over some of the older posts on this mailing list about access control and saw a suggestion along the lines of acl:user_id AND (actual query). While this obviously works, there are a couple of niggles. Every item must have a list of valid user ids (typically fewer than 100 in my case). Every time a collaborator is added to or removed from a project, I need to update every item in that project. This will typically be fewer than 1000 items, so I guess it is no big deal. I wondered if the following might be a reasonable alternative, assuming the number of projects to which a user has access is below a certain bound: (acl:project_id OR acl:project_id OR ...) AND (actual query). When the numbers are small - e.g. each user has access to ~20 projects and each project has ~20 collaborators - is one approach preferable over the other? And when outliers exist - e.g. a project with 2000 collaborators, or a user with access to 2000 projects - is one approach more liable to fail than the other? Many thanks, Paul
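As a concrete illustration of the split being discussed, here is a sketch of the two request shapes (the host, port, and project ids are assumptions, not values from the thread):

```
# ACL mixed into the main query: the acl clause participates in scoring
http://localhost:8983/solr/select?q=acl:42+AND+(actual+query)

# ACL as a filter query: cached independently, does not affect scores
http://localhost:8983/solr/select?q=actual+query&fq=acl:(7+OR+19+OR+23)
```

Because the fq result set is cached in Solr's filterCache, repeated searches by the same user reuse the cached document set and only the q portion is re-evaluated.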
Re: Removing Common Web Page Header and Footer from All Content Fetched by Nutch
Thanks Otis and Markus for your input. I will check it out today.

On Tue, Oct 19, 2010 at 4:45 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Unfortunately, Nutch still uses Tika 0.7 in 1.2 and trunk. Nutch needs to be upgraded to Tika 0.8 (when it's released, or just the current trunk). Also, the Boilerpipe API needs to be exposed through Nutch configuration: which extractor can be used, which parameters need to be set, etc. Upgrading to Tika's trunk might be relatively easy, but exposing Boilerpipe surely isn't.

On Tuesday, October 19, 2010 06:47:43 am Otis Gospodnetic wrote:

Hi Israel, you can use this: http://search-lucene.com/?q=boilerpipefc_project=Tika Not sure if it's built into Nutch, though. Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message -
From: Israel Ekpo israele...@gmail.com
To: solr-user@lucene.apache.org; u...@nutch.apache.org
Sent: Mon, October 18, 2010 9:01:50 PM
Subject: Removing Common Web Page Header and Footer from All Content Fetched by Nutch

Hi All, I am indexing a web application with approximately 9500 distinct URLs and their contents using Nutch and Solr. I use Nutch to fetch the URLs and links and to crawl the entire web application to extract the content of all pages. Then I run the solrindex command to send the content to Solr. The problem I have now is that the first 1000 or so characters and the last 400 characters of some pages are showing up in the search results. These are the contents of the common header and footer used in the site, respectively. The only workaround I have now is to index everything and then go through each document one at a time, removing the first 1000 characters if the Levenshtein distance between the first 1000 characters of the page and the common header is less than a certain value. The same applies to the footer content common to all pages. Is there a way to ignore certain "stop phrases", so to speak, in the Nutch configuration, based on Levenshtein distance or Jaro-Winkler distance, so that certain parts of the fetched data that match these stop phrases will not be parsed? Any useful pointers would be highly appreciated. Thanks in advance.

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?
Hi All, just wanted to post an update on where we stand with all the requests for new features.

List of features requested in the Solr PECL extension:

1. Ability to send custom requests to custom URLs other than select, update, terms, etc.
2. Ability to add files (PDF, Office documents, etc.)
3. Windows version of the latest releases.
4. Ensuring that SolrQuery::getFields(), SolrQuery::getFacets() et al. return an array consistently.
5. Lowering the libxml version requirement to 2.6.16.

If there is anything that you think I left out, please let me know. This is a summary.

On Wed, Oct 13, 2010 at 3:48 AM, Stefan Matheis matheis.ste...@googlemail.com wrote:

On Tue, Oct 12, 2010 at 6:29 PM, Israel Ekpo israele...@gmail.com wrote:

I think this feature will take care of this. What do you think?

Sounds good!
Re: Commits on service after shutdown
The documents should be implicitly committed when the Lucene index is closed. When you perform a graceful shutdown, the Lucene index gets closed and the documents are committed implicitly. When the shutdown is abrupt, as with a KILL -9, this does not happen and the updates are lost.

You can use the auto-commit parameters when sending your updates so that the changes are saved right away, though this could slow down indexing speed considerably. However, I do not believe there are parameters to keep those un-committed documents alive after a kill.

On Mon, Oct 18, 2010 at 2:46 PM, Ezequiel Calderara ezech...@gmail.com wrote:

Hi, I'm new to the mailing list. I'm implementing Solr at my current job, and I'm having some problems. I was testing the consistency of commits. I found, for example, that if we add X documents to the index (without committing) and then restart the service, the documents are committed: they show up in the results. This looks like an error to me. But when we add X documents to the index (without committing) and then kill the process and start it again, the documents don't appear. This is the behaviour I want. Is there any parameter to avoid the auto-committing of documents after a shutdown? Is there any parameter to keep those un-committed documents alive after a kill? Thanks!

-- Ezequiel. Http://www.ironicnet.com
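For the "saved right away" option mentioned above, Solr's update handler can be told to commit automatically once a threshold is crossed. A solrconfig.xml sketch (the threshold values are assumptions chosen only to illustrate the shape):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- commit after 1000 pending docs or 60 seconds, whichever comes first -->
  <autoCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

More frequent commits shrink the window of updates an abrupt kill can lose, at the cost of indexing throughput.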
Removing Common Web Page Header and Footer from All Content Fetched by Nutch
Hi All, I am indexing a web application with approximately 9500 distinct URLs and their contents using Nutch and Solr. I use Nutch to fetch the urls and links and crawl the entire web application to extract the content for all pages. Then I run the solrindex command to send the content to Solr. The problem that I have now is that the first 1000 or so characters of some pages and the last 400 characters of the pages are showing up in the search results. These are the contents of the common header and footer used in the site, respectively. The only workaround that I have now is to index everything and then go through each document one at a time to remove the first 1000 characters if the Levenshtein distance between the first 1000 characters of the page and the common header is less than a certain value. The same applies to the footer content common to all pages. Is there a way to ignore certain 'stop phrases', so to speak, in the Nutch configuration based on Levenshtein distance or Jaro-Winkler distance, so that the parts of the fetched data that match these stop phrases will not be parsed? Any useful pointers would be highly appreciated. Thanks in advance. -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
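The post-processing workaround described above can be sketched in a few lines. This is an illustrative Python sketch, not part of Nutch or Solr; the function names and the distance threshold are made up for the example:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def strip_boilerplate(page: str, header: str, footer: str,
                      max_distance: int = 50) -> str:
    """Drop the leading/trailing characters of a fetched page when they
    are within max_distance edits of the known header/footer text."""
    if levenshtein(page[:len(header)], header) <= max_distance:
        page = page[len(header):]
    if footer and levenshtein(page[-len(footer):], footer) <= max_distance:
        page = page[:-len(footer)]
    return page
```

A Jaro-Winkler score could be substituted for the edit distance without changing the overall shape of the cleanup pass.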
Re: Setting solr home directory in websphere
You need to make sure that the following system property is one of the values specified in the JAVA_OPTS environment variable: -Dsolr.solr.home=path_to_solr_home On Mon, Oct 18, 2010 at 10:20 PM, Kevin Cunningham kcunning...@telligent.com wrote: I've installed Solr a hundred times using Tomcat (on Windows) but now need to get it going with WebSphere (on Windows). For whatever reason this seems to be black magic :) I've installed the war file but have no idea how to set Solr home to let WebSphere know where the index and config files are. Can someone enlighten me on how to do this please? -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
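If setting JVM arguments is awkward in WebSphere, Solr can also pick up its home directory through JNDI; below is a sketch of the env-entry you could add to the web.xml of the deployed solr.war (the path value is a placeholder):

```xml
<!-- web.xml: Solr looks up java:comp/env/solr/home at startup -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>C:\solr\home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
```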
Re: Term is duplicated when updating a document
Which fields are modified when the document is updated/replaced? Are there any differences in the content of the fields that you are using for the AutoSuggest? Have you changed your schema.xml file recently? If you have, then there may have been changes in the way these fields are analyzed and broken down into terms. This may be a bug if you did not change the field or the schema file but the term count is changing. On Fri, Oct 15, 2010 at 9:14 AM, Thomas Kellerer spam_ea...@gmx.net wrote: Hi, we are updating our documents (that represent products in our shop) when a dealer modifies them, by calling SolrServer.add(SolrInputDocument) with the updated document. My understanding is that there is no other way of updating an existing document. However, we also use a term query to autocomplete the search field for the user, but each time a document is updated (added) the term count is incremented. So after starting with a new index the count is e.g. 1, then the document (that contains that term) is updated, and the count is 2, the next update will set this to 3 and so on. Once the index is optimized (by calling SolrServer.optimize()) the count is correct again. Am I missing something or is this a bug in Solr/Lucene? Thanks in advance Thomas -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?
On Mon, Oct 11, 2010 at 3:33 AM, Lukas Kahwe Smith m...@pooteeweet.org wrote: On 11.10.2010, at 07:03, Israel Ekpo wrote: I am currently working on a couple of bug fixes for the Solr PECL extension that will be available in the next release 0.9.12 sometime this month. http://pecl.php.net/package/solr Documentation of the current API and features for the PECL extension is available here http://www.php.net/solr A couple of users in the community were asking when the PHP extension will be moving from beta to stable. The API looks stable so far with no serious issues and I am looking to move it from *Beta* to *Stable* on November 20, 2010. If you are using Solr via PHP and would like to see any new features in the extension please feel free to send me a note. I would like to incorporate those changes in 0.9.12 so that users can try them out and send me some feedback before the release of version 1.0 Thanks in advance for your response. we already had some emails about this. imho there are too many methods for specialized tasks, so that it's easy to get lost in the API, especially since not all of them have written documentation yet beyond the method signatures. also i do think that there should be methods for escaping and also tokenizing lucene queries to enable validation of the syntax used etc. see here for a use case and a user land implementation: http://pooteeweet.org/blog/1796 regards, Lukas Kahwe Smith m...@pooteeweet.org Thanks Lukas for your feedback. Could you clarify the part about too many methods for specialized tasks? From the feedback that I have received so far, most users like the specialization and a small fraction do not. So it might be a matter of preference. I decided to add the specialized methods in the SolrQuery class because at the time, that was what most of the users wanted to see in the API. They cannot be removed now. As per the documentation, all of the methods are documented with at least a brief heading or summary of what each is supposed to do. 
http://php.net/solr The user needs to understand first which query parameters they need to send to Solr, and then they can use one of the SolrQuery methods for that purpose. Additional information is available from the Solr tutorials and the wiki itself. If one chooses not to use a specialized method, there are always the get(), set() and add() methods that allow you to pass the parameter values directly instead of using a specialized method for that parameter. For escaping queries, we already have the following method: SolrUtils::escapeQueryChars http://www.php.net/manual/en/solrutils.escapequerychars.php http://www.php.net/manual/en/class.solrutils.php As per the tokenization, it is not clear exactly what you were referring to. I think it is best for the analysis of any of the tokens to be handled at the server layer. There are tools in the admin interface for analyzing and breaking down the query components into tokens. I also took a look at your blog but I could not immediately find the use case you were referring to. A little more detail on this would be helpful. Thanks Lukas for your input. -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?
On Tue, Oct 12, 2010 at 7:42 AM, Peter Blokland pe...@desk.nl wrote: hi, On Mon, Oct 11, 2010 at 01:03:07AM -0400, Israel Ekpo wrote: If you are using Solr via PHP and would like to see any new features in the extension please feel free to send me a note. I'm currently testing a setup with Solr via PHP, and was wondering if support for the ExtractingRequestHandler is planned? It may be that I missed something in the documentation, but for now it looks like I need to build my own POSTs to the /solr/update/extract handler. -- CUL8R, Peter. www.desk.nl --- Sent from my NetBSD-powered Talkie Toaster™ Peter, That is an excellent idea. I will add that to the wishlist. -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?
On Tue, Oct 12, 2010 at 8:43 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Hi Israel, On Mon, Oct 11, 2010 at 7:03 AM, Israel Ekpo israele...@gmail.com wrote: If you are using Solr via PHP and would like to see any new features in the extension please feel free to send me a note. we actually tried to grab some information from solr's dataimport page, but to do so we had to generate the complete url manually, which means we have to access the solr object to get the hostname, port, etc. and construct the needed url ourselves. perhaps it's an idea to implement something like $solr->executeHttpRequest('GET', 'dataimport', array('command' => 'status')), which could easily reuse all the given information and also, for example, the existing proxy handling. Regards Stefan Stefan, I agree with you. Excellent idea. I am currently working on a feature that will allow you to specify the target path (url) and then be able to send any parameters or an xml request to the server. I think this feature will take care of this. What do you think? -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?
On Tue, Oct 12, 2010 at 12:44 PM, Ken Stanley doh...@gmail.com wrote: If you are using Solr via PHP and would like to see any new features in the extension please feel free to send me a note. I'm new to this list, but in seeing this thread - and using PHP SOLR - I wanted to make a suggestion that - while minor - I think would greatly improve the quality of the extension. (I'm basing this mostly off of SolrQuery since that's where I've encountered the issue, but this might be true elsewhere.) Whenever a method is supposed to return an array (i.e., SolrQuery::getFields(), SolrQuery::getFacets(), etc.), if there is no data to return, a null is returned. I think that this should be normalized across the board to return an empty array. First, the documentation is contradictory (http://us.php.net/manual/en/solrquery.getfields.php) in that the method signature says that it returns an array (not mixed), while the Return Values section says that it returns either an array or null. Secondly, returning an array under any circumstance provides more consistency and less logic; for example, let's say that I am looking for the fields (as-is in its current state):

<?php
// .. assume a proper set up
if ($solrquery->getFields() !== null) {
    foreach ($solrquery->getFields() as $field) {
        // Do something
    }
}
?>

This is a minor request, I know. But I feel that it would go a long way toward polishing the extension up for general consumption. Thank you, Ken Stanley PS. I apologize if this request has come through the pipes already; as I've stated, I am new to this list; I have yet to find any reference to my request. :) Great recommendation Ken. Thanks for catching that! That should be a quick one. -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Solr PHP PECL Extension going to Stable Release - Wishing for Any New Features?
Hi All, I am currently working on a couple of bug fixes for the Solr PECL extension that will be available in the next release 0.9.12 sometime this month. http://pecl.php.net/package/solr Documentation of the current API and features for the PECL extension is available here http://www.php.net/solr A couple of users in the community were asking when the PHP extension will be moving from beta to stable. The API looks stable so far with no serious issues and I am looking to move it from *Beta* to *Stable* on November 20, 2010. If you are using Solr via PHP and would like to see any new features in the extension please feel free to send me a note. I would like to incorporate those changes in 0.9.12 so that users can try them out and send me some feedback before the release of version 1.0 Thanks in advance for your response. -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: [PECL-DEV] Re: PHP Solr API
Scott, You can also use the SolrClient::setServlet() method with SolrClient::TERMS_SERVLET_TYPE as the type http://www.php.net/manual/en/solrclient.setservlet.php On Fri, Oct 1, 2010 at 12:57 AM, Scott Yeadon scott.yea...@anu.edu.au wrote: Hi, Sorry, scrap that, just found that SolrQuery is a subclass of ModifiableParams so I can do this via the add method and it seems to work ok. Apologies for the noise. Scott. On 1/10/10 2:35 PM, Scott Yeadon wrote: Hi, Just wondering if there is a way of setting the qt parameter in the Solr PHP API. I want to use the Term Vector Component but not sure this is supported in the API? Thanks Scott. -- PECL development discussion Mailing List (http://pecl.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Null Pointer Exception while indexing
Try removing the data directory and then restart your Servlet container and see if that helps. On Thu, Sep 16, 2010 at 3:28 AM, Lance Norskog goks...@gmail.com wrote: Which version of Solr? 1.4? 1.4.1? 3.x branch? trunk? If the 3.x branch or the trunk, when did you pull it? andrewdps wrote: What could be the possible cause of this error:

14-Sep-10 4:28:47 PM org.apache.solr.common.SolrException log
SEVERE: java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(libgcj.so.90)
at java.util.concurrent.FutureTask.get(libgcj.so.90)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:439)
at org.apache.solr.update.DirectUpdateHandler2$CommitTracker.run(DirectUpdateHandler2.java:602)
at java.util.concurrent.Executors$RunnableAdapter.call(libgcj.so.90)
at java.util.concurrent.FutureTask$Sync.innerRun(libgcj.so.90)
at java.util.concurrent.FutureTask.run(libgcj.so.90)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$2(libgcj.so.90)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(libgcj.so.90)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(libgcj.so.90)
at java.lang.Thread.run(libgcj.so.90)
Caused by: java.lang.NullPointerException
at org.apache.solr.search.FastLRUCache.getStatistics(FastLRUCache.java:252)
at org.apache.solr.search.FastLRUCache.toString(FastLRUCache.java:280)
at java.lang.StringBuilder.append(libgcj.so.90)
at org.apache.solr.search.SolrIndexSearcher.close(SolrIndexSearcher.java:223)
at org.apache.solr.core.SolrCore$6.close(SolrCore.java:1246)
at org.apache.solr.util.RefCounted.decref(RefCounted.java:57)
at org.apache.solr.core.SolrCore$5.call(SolrCore.java:1192)
at java.util.concurrent.FutureTask$Sync.innerRun(libgcj.so.90)
at java.util.concurrent.FutureTask.run(libgcj.so.90)
...3 more

I get this error when I try to index the MARC records on the server (after indexing a few records I get the above error, indexing starts again, and I get the same error after indexing a few hundred records). It worked fine on the local system. Thanks -- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: ANNOUNCE: Stump Hoss @ Lucene Revolution
Chris, I have a couple of questions I would like to throw your way. Is there a place where one can sign up for this? It sounds very interesting. On Mon, Aug 23, 2010 at 4:49 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Hey everybody, As you (hopefully) have heard by now, Lucid Imagination is sponsoring a Lucene/Solr conference in Boston about 6 weeks from now. We've got a lot of really great speakers lined up to give some really interesting technical talks, so I offered to do something a little bit different. I'm going to be in the hot seat for a Stump The Chump style session, where I'll be answering Solr questions live and unrehearsed... http://bit.ly/stump-hoss The goal is to really make me sweat and work hard to think of creative solutions to non-trivial problems on the spot -- like when I answer questions on the solr-user mailing list, except in a crowded room with hundreds of people staring at me and laughing. But in order to be a success, we need your questions/problems/challenges! If you had a tough situation with Solr that you managed to solve with a creative solution (or haven't solved yet) and are interested to see what type of solution I might come up with under pressure, please email a description of your problem to st...@lucenerevolution.org -- More details online... http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter Even if you won't be able to make it to Boston, please send in any challenging problems you would be interested to see me tackle under the gun. The session will be recorded, and the video will be posted online shortly after the conference has ended. And if you can make it to Boston: all the more fun to watch live and in person (and maybe answer follow up questions). In any case, it should be a very interesting session: folks will either get to learn a lot, or laugh at me a lot, or both. (win/win/win) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump! 
-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Disk usage per-field
Currently, this feature is not available. The amount of space a field consumes varies and depends on whether the field is indexed only, stored only, or indexed and stored. It also depends on how the field is analyzed. On Fri, Jul 2, 2010 at 2:59 PM, Shawn Heisey s...@elyograg.org wrote: On 6/30/2010 5:44 PM, Shawn Heisey wrote: Is it possible for Solr (or Luke/Lucene) to tell me exactly how much of the total index disk space is used by each field? It would also be very nice to know, for each field, how much is used by the index and how much is used for stored data. Still interested in this. It would be perfectly OK if such a thing were completely external to Solr and required a good chunk of time to calculate. I would not need to do it very often. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
[NEWS] New Response Writer for Native PHP Solr Client
Hi Solr users, If you are using Apache Solr via PHP, I have some good news for you. There is a new response writer for the PHP native extension, currently available as a plugin. This new feature adds a new response writer class to the org.apache.solr.request package. This class is used by the PHP Native Solr Client driver to prepare the query response from Solr. This response writer allows you to configure the way the data is serialized for the PHP client. You can use your own class name and you can also control how the properties are serialized. The formatting of the response data is very similar to the way it is currently done by the PECL extension on the client side. The only difference now is that this serialization is happening on the server side instead. You will find this new response writer particularly useful when dealing with responses for:
- highlighting
- admin threads responses
- more like this responses
to mention just a few. You can pass the objectClassName request parameter to specify the class name to be used for serializing objects. Please note that the class must be available on the client side to avoid a PHP_Incomplete_Object error during the unserialization process. You can also pass in the objectPropertiesStorageMode request parameter with either a 0 (independent properties) or a 1 (combined properties). These parameters can also be passed as a named list when loading the response writer in the solrconfig.xml file. Having this control allows you to create custom objects, which gives you the flexibility of implementing custom __get methods, ArrayAccess, Traversable and Iterator interfaces on the PHP client side. 
Until this class is incorporated into Solr, you simply have to copy the jar file containing this plugin into your lib directory under $SOLR_HOME. The jar file is available here https://issues.apache.org/jira/browse/SOLR-1967 Then set up the configuration as shown below and restart your servlet container. Below is an example configuration in solrconfig.xml:

<queryResponseWriter name="phpnative" class="org.apache.solr.request.PHPNativeResponseWriter">
  <!-- You can choose a different class for your objects. Just make sure the class is available in the client -->
  <str name="objectClassName">SolrObject</str>
  <!--
    0 means OBJECT_PROPERTIES_STORAGE_MODE_INDEPENDENT
    1 means OBJECT_PROPERTIES_STORAGE_MODE_COMBINED
    In independent mode, each property is a separate property.
    In combined mode, all the properties are merged into a _properties array.
    The combined mode allows you to create custom __getters and you could also
    implement ArrayAccess, Iterator and Traversable
  -->
  <int name="objectPropertiesStorageMode">0</int>
</queryResponseWriter>

Below is an example implementation on the PHP client side. Support for specifying custom response writers will be available starting from the 0.9.11 version (released today) of the PECL extension for Solr, currently available here http://pecl.php.net/package/solr Here is an example of how to use the new response writer with the PHP client. 
<?php

class SolrClass
{
    public $_properties = array();

    public function __get($property_name)
    {
        if (property_exists($this, $property_name)) {
            return $this->$property_name;
        } else if (isset($this->_properties[$property_name])) {
            return $this->_properties[$property_name];
        }

        return null;
    }
}

$options = array(
    'hostname' => 'localhost',
    'port'     => 8983,
    'path'     => '/solr/'
);

$client = new SolrClient($options);
$client->setResponseWriter('phpnative');

$response = $client->ping();

$query = new SolrQuery();
$query->setQuery('*:*');
$query->set('objectClassName', 'SolrClass');
$query->set('objectPropertiesStorageMode', 1);

$response = $client->query($query);
$resp = $response->getResponse();
?>

Documentation of the changes to the PECL extension is available here http://docs.php.net/manual/en/solrclient.construct.php http://docs.php.net/manual/en/solrclient.setresponsewriter.php Please contact me at ie...@php.net if you have any questions or comments. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
[PECL-DEV] [ANNOUNCEMENT] solr-0.9.11 (beta) Released
The new PECL package solr-0.9.11 (beta) has been released at http://pecl.php.net/.

Release notes
-------------
- Added ability to specify the response writer in a constructor option (wt)
- Added a new method to set the response writer: SolrClient::setResponseWriter()
- Currently, the only supported response writers are 'xml' and 'phpnative'
- Added support for the new native Solr response writer
- The new response writer is available at https://issues.apache.org/jira/browse/SOLR-1967

Package Info
------------
The extension effectively simplifies the process of interacting with Apache Solr using PHP5 and it already comes with built-in readiness for the latest features added in Solr 1.4. It has features such as built-in, serializable query string builder objects which effectively simplify the manipulation of name-value pair request parameters across repeated requests. The response from the Solr server is also automatically parsed into native PHP objects whose properties can be accessed as array keys or object properties without any additional configuration on the client side. Its advanced HTTP client reuses the same connection across multiple requests and provides built-in support for connecting to Solr servers secured behind HTTP Authentication or HTTP proxy servers. It is also able to connect to SSL-enabled containers. Please consult the documentation for more details on features.

Related Links
-------------
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.11.tgz
Documentation: http://docs.php.net/

Authors
-------
Israel Ekpo ie...@php.net (lead)

-- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: SOLR search performance - Linux vs Windows servers
That's a good note. I get this kind of question a lot. Most of the time, the reason is that there are database servers (MySQL), webservers (Apache) and other processes running on the Linux box. Try to verify that the load, the number of processors/cores, as well as other environment settings are similar before drawing a conclusion. On Wed, Jun 16, 2010 at 5:43 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: BB, Could it be that you are comparing apples and oranges? * Is the hardware identical? * Are the indices identical? * Are the JVM versions the same? * Are the JVM arguments identical? * Are the two boxes equally idle when Solr is not running? * etc. In general, no, there is no reason why Windows would automatically be faster than Linux. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: bbarani bbar...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, June 16, 2010 5:06:55 PM Subject: SOLR search performance - Linux vs Windows servers Hi, I have SOLR instances running on both a Linux and a Windows server (same version / same index data). Search performance is good on the Windows box compared to the Linux box. Some queries take more than 10 seconds on the Linux box but take just a second on the Windows box. Has anyone encountered this kind of issue before? Thanks, BB -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-search-performance-Linux-vs-Windows-servers-tp901069p901069.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Need help with document format
I think you need a 1:1 mapping between the consultant and the company, else how are you going to run your queries for, let's say, consultants that worked for Google or AOL between March 1999 and August 2004? If the mapping is 1:1, your life would be easier and you would not need to do extra parsing of the results you retrieved. Unfortunately, it looks like you are going to have a lot of records. With an RDBMS it is easier to do joins, but with Lucene and Solr you have to denormalize all the relationships. Hence in this particular scenario, if you have 5 consultants that worked for 4 distinct companies you will have to send 20 documents to Solr. On Mon, Jun 7, 2010 at 10:15 AM, Moazzam Khan moazz...@gmail.com wrote: Thanks for the replies guys. I am currently storing consultants like this:

<doc>
  <id>123</id>
  <FirstName>tony</FirstName>
  <LastName>marjo</LastName>
  <Company>Google</Company>
  <Company>AOL</Company>
</doc>

I have a few multi-valued fields, so if I do it the way Israel suggested, I will have tons of records. Do you think it would be better if I did this instead?

<doc>
  <id>123</id>
  <FirstName>tony</FirstName>
  <LastName>marjo</LastName>
  <Company>Google_StartDate_EndDate</Company>
  <Company>AOL_StartDate_EndDate</Company>
</doc>

Or is what you guys said better? Thanks for all the help. 
Moazzam On Mon, Jun 7, 2010 at 1:10 AM, Lance Norskog goks...@gmail.com wrote: And for 'present', you would pick some time far in the future: 2100-01-01T00:00:00Z On 6/5/10, Israel Ekpo israele...@gmail.com wrote: You need to make each document added to the index a 1 to 1 mapping for each company and consultant combo:

<schema>
 <fields>
  <!-- Concatenation of company and consultant id -->
  <field name="consultant_id_company_id" type="string" indexed="true" stored="true" required="true"/>
  <field name="consultant_firstname" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="consultant_lastname" type="string" indexed="true" stored="true" multiValued="false"/>
  <!-- The name of the company the consultant worked for -->
  <field name="company" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="start_date" type="tdate" indexed="true" stored="true" multiValued="false"/>
  <field name="end_date" type="tdate" indexed="true" stored="true" multiValued="false"/>
 </fields>
 <defaultSearchField>text</defaultSearchField>
 <copyField source="consultant_firstname" dest="text"/>
 <copyField source="consultant_lastname" dest="text"/>
 <copyField source="company" dest="text"/>
</schema>

So for instance, you have 2 consultants, Michael Davis and Tom Anderson, who worked for AOL, Microsoft, Yahoo, Google and Facebook.

Michael Davis = 1, Tom Anderson = 2
AOL = 1, Microsoft = 2, Yahoo = 3, Google = 4, Facebook = 5

This is how you would add the documents to the index:

<doc>
  <consultant_id_company_id>1_1</consultant_id_company_id>
  <consultant_firstname>Michael</consultant_firstname>
  <consultant_lastname>Davis</consultant_lastname>
  <company>AOL</company>
  <start_date>2006-02-13T15:26:37Z</start_date>
  <end_date>2008-02-13T15:26:37Z</end_date>
</doc>

<doc>
  <consultant_id_company_id>1_4</consultant_id_company_id>
  <consultant_firstname>Michael</consultant_firstname>
  <consultant_lastname>Davis</consultant_lastname>
  <company>Google</company>
  <start_date>2006-02-13T15:26:37Z</start_date>
  <end_date>2009-02-13T15:26:37Z</end_date>
</doc>

<doc>
  <consultant_id_company_id>2_3</consultant_id_company_id>
  <consultant_firstname>Tom</consultant_firstname>
  <consultant_lastname>Anderson</consultant_lastname>
  <company>Yahoo</company>
  <start_date>2001-01-13T15:26:37Z</start_date>
  <end_date>2009-02-13T15:26:37Z</end_date>
</doc>

<doc>
  <consultant_id_company_id>2_4</consultant_id_company_id>
  <consultant_firstname>Tom</consultant_firstname>
  <consultant_lastname>Anderson</consultant_lastname>
  <company>Google</company>
  <start_date>1999-02-13T15:26:37Z</start_date>
  <end_date>2010-02-13T15:26:37Z</end_date>
</doc>

Then you can search as q=company:X AND start_date:[X TO *] AND end_date:[* TO Z] On Fri, Jun 4, 2010 at 4:58 PM, Moazzam Khan moazz...@gmail.com wrote: Hi guys, I have a list of consultants, and the users (people who work for the company) are supposed to be able to search for consultants based on the time frame they worked for a company. For example, I should be able to search for all consultants who worked for Bear Stearns in the month of July. What is the best way of accomplishing this? I was thinking of formatting the document like this:

<company>
  <name>Bear Stearns</name>
  <startDate>2000-01-01</startDate>
  <endDate>present</endDate>
</company>

<company>
  <name>AIG</name>
  <startDate>1999-01-01</startDate>
  <endDate>2000-01-01</endDate>
</company>

Is this possible? Thanks, Moazzam -- Good Enough is not good enough. 
To give anything less than your best is to sacrifice
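The denormalization described in this thread (one document per consultant/company stint) is mechanical enough to sketch. The record layout and helper name below are hypothetical, shown in Python purely for illustration of the flattening step:

```python
def denormalize(consultants):
    """Flatten consultant records into one Solr-style document per
    (consultant, company) stint, since Lucene/Solr cannot do joins."""
    docs = []
    for c in consultants:
        for s in c["stints"]:
            docs.append({
                "consultant_id_company_id": "%s_%s" % (c["id"], s["company_id"]),
                "consultant_firstname": c["first"],
                "consultant_lastname": c["last"],
                "company": s["company"],
                "start_date": s["start"],
                "end_date": s["end"],
            })
    return docs
```

With 5 consultants that each worked for 4 distinct companies this yields the 20 documents mentioned in the thread.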
Re: Need help with document format
You need to make each document added to the index a 1 to 1 mapping for each company and consultant combo:

<schema>
 <fields>
  <!-- Concatenation of company and consultant id -->
  <field name="consultant_id_company_id" type="string" indexed="true" stored="true" required="true"/>
  <field name="consultant_firstname" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="consultant_lastname" type="string" indexed="true" stored="true" multiValued="false"/>
  <!-- The name of the company the consultant worked for -->
  <field name="company" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="start_date" type="tdate" indexed="true" stored="true" multiValued="false"/>
  <field name="end_date" type="tdate" indexed="true" stored="true" multiValued="false"/>
 </fields>
 <defaultSearchField>text</defaultSearchField>
 <copyField source="consultant_firstname" dest="text"/>
 <copyField source="consultant_lastname" dest="text"/>
 <copyField source="company" dest="text"/>
</schema>

So for instance, you have 2 consultants, Michael Davis and Tom Anderson, who worked for AOL, Microsoft, Yahoo, Google and Facebook.

Michael Davis = 1, Tom Anderson = 2
AOL = 1, Microsoft = 2, Yahoo = 3, Google = 4, Facebook = 5

This is how you would add the documents to the index:

<doc>
  <consultant_id_company_id>1_1</consultant_id_company_id>
  <consultant_firstname>Michael</consultant_firstname>
  <consultant_lastname>Davis</consultant_lastname>
  <company>AOL</company>
  <start_date>2006-02-13T15:26:37Z</start_date>
  <end_date>2008-02-13T15:26:37Z</end_date>
</doc>

<doc>
  <consultant_id_company_id>1_4</consultant_id_company_id>
  <consultant_firstname>Michael</consultant_firstname>
  <consultant_lastname>Davis</consultant_lastname>
  <company>Google</company>
  <start_date>2006-02-13T15:26:37Z</start_date>
  <end_date>2009-02-13T15:26:37Z</end_date>
</doc>

<doc>
  <consultant_id_company_id>2_3</consultant_id_company_id>
  <consultant_firstname>Tom</consultant_firstname>
  <consultant_lastname>Anderson</consultant_lastname>
  <company>Yahoo</company>
  <start_date>2001-01-13T15:26:37Z</start_date>
  <end_date>2009-02-13T15:26:37Z</end_date>
</doc>

<doc>
  <consultant_id_company_id>2_4</consultant_id_company_id>
  <consultant_firstname>Tom</consultant_firstname>
  <consultant_lastname>Anderson</consultant_lastname>
  <company>Google</company>
  <start_date>1999-02-13T15:26:37Z</start_date>
  <end_date>2010-02-13T15:26:37Z</end_date>
</doc>

Then you can search as q=company:X AND start_date:[X TO *] AND end_date:[* TO Z] On Fri, Jun 4, 2010 at 4:58 PM, Moazzam Khan moazz...@gmail.com wrote: Hi guys, I have a list of consultants, and the users (people who work for the company) are supposed to be able to search for consultants based on the time frame they worked for a company. For example, I should be able to search for all consultants who worked for Bear Stearns in the month of July. What is the best way of accomplishing this? I was thinking of formatting the document like this:

<company>
  <name>Bear Stearns</name>
  <startDate>2000-01-01</startDate>
  <endDate>present</endDate>
</company>

<company>
  <name>AIG</name>
  <startDate>1999-01-01</startDate>
  <endDate>2000-01-01</endDate>
</company>

Is this possible? Thanks, Moazzam -- Good Enough is not good enough. 
To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
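[Editor's note] Filling in the query template from the reply above with concrete values can be sketched in plain Java. This is only an illustration of building the q parameter string (no Solr client involved); the field names come from the schema in this thread, and the helper names are hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ConsultantQuery {

    // Formats a date the way Solr expects: ISO 8601 in UTC with a trailing 'Z'.
    static String solrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    // Builds the q parameter following the template given in the reply:
    // q=company:X AND start_date:[X TO *] AND end_date:[* TO Z]
    static String buildQuery(String company, String startIso, String endIso) {
        return "company:" + company
             + " AND start_date:[" + startIso + " TO *]"
             + " AND end_date:[* TO " + endIso + "]";
    }

    public static void main(String[] args) {
        String q = buildQuery("Google", "2006-01-01T00:00:00Z", "2009-12-31T23:59:59Z");
        System.out.println(q);
    }
}
```

Remember to URL-encode the resulting string before appending it to a request URL.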
Re: Solr spellchecker field
Dejan, How are you making the calls from PHP to Solr? I am curious to know why the documents could not be parsed. On Fri, May 28, 2010 at 5:00 AM, Dejan Noveski dr.m...@gmail.com wrote: Thank you very much! On Fri, May 28, 2010 at 10:57 AM, Erik Hatcher erik.hatc...@gmail.com wrote: A field used to build a spellcheck index only needs to be indexed, not stored. But your PHP issue could be alleviated anyway by simply customizing the fl parameter and excluding the large stored field. This is often desirable for large fields that are never needed fully in the UI, but used internally for highlighting. Erik On May 28, 2010, at 4:47 AM, Dejan Noveski wrote: Hi, Does the field that is used for spellchecker indexing need to be stored and/or indexed? These fields became fairly large in my index, and PHP won't parse/decode the documents returned. -- Dejan Noveski Web Developer dr.m...@gmail.com Twitter: http://twitter.com/dekomote | LinkedIn: http://mk.linkedin.com/in/dejannoveski -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Bitwise Operations on Integer Fields in Lucene and Solr Index
Hello Lucene and Solr Community, I have a custom org.apache.lucene.search.Filter that I would like to contribute to the Lucene and Solr projects, so I would need some direction as to how to create an issue or submit a patch. It looks like there have been changes to the way this is done since the latest merge of the two projects (Lucene and Solr). Recently, some Solr users have been looking for a way to perform bitwise operations between an integer value and some fields in the index, so I wrote a Solr QParser plugin to do this using a custom Lucene Filter. This package makes it possible to filter results returned from a query based on the results of a bitwise operation on an integer field in the documents returned from the pre-constructed query. You can perform three basic types of operations on these integer fields:

* BitwiseOperation.BITWISE_AND (bitwise AND)
* BitwiseOperation.BITWISE_OR (bitwise inclusive OR)
* BitwiseOperation.BITWISE_XOR (bitwise exclusive OR)

You can also negate the results of these operations. For example, imagine there is an integer field in the index named flags with a value of 8 (1000 in binary). The following results would be expected:

1. A source value of 8 will match during a BitwiseOperation.BITWISE_AND operation, with negate set to false.
2. A source value of 4 will match during a BitwiseOperation.BITWISE_AND operation, with negate set to true.

The BitwiseFilter constructor accepts the following values:

* The name of the integer field (a string)
* The BitwiseOperation object.
Example: BitwiseOperation.BITWISE_XOR
* The source value (an integer)
* A boolean value indicating whether or not to negate the results of the operation
* A pre-constructed org.apache.lucene.search.Query

Here is an example of how you would use it with Solr:

http://localhost:8983/solr/bitwise/select/?q={!bitwise field=user_permissions op=AND source=3 negate=true}state:FL
http://localhost:8983/solr/bitwise/select/?q={!bitwise field=user_permissions op=AND source=3}state:FL

Here is an example of how you would use it with Lucene:

public class BitwiseTestSearch extends BitwiseTestBase {

    public BitwiseTestSearch() {
    }

    public void search() throws IOException, ParseException {
        setupSearch();

        // term
        Term t = new Term(COUNTRY_KEY, "us");
        // term query
        Query q = new TermQuery(t);

        // maximum number of documents to display
        int limit = 1000;
        int sourceValue = 0;
        boolean negate = false;

        BitwiseFilter bitwiseFilter = new BitwiseFilter(USER_PERMS_KEY,
            BitwiseOperation.BITWISE_XOR, sourceValue, negate, q);

        Query fq = new FilteredQuery(q, bitwiseFilter);
        ScoreDoc[] hits = isearcher.search(fq, null, limit).scoreDocs;
        BitwiseResultFilter resultFilter = bitwiseFilter.getResultFilter();

        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(FIRST_NAME_KEY + " field has a value of " + hitDoc.get(FIRST_NAME_KEY));
            System.out.println(LAST_NAME_KEY + " field has a value of " + hitDoc.get(LAST_NAME_KEY));
            System.out.println(ACTIVE_KEY + " field has a value of " + hitDoc.get(ACTIVE_KEY));
            System.out.println(USER_PERMS_KEY + " field has a value of " + hitDoc.get(USER_PERMS_KEY));
            System.out.println("doc ID -- " + hits[i].doc);
            System.out.println("...");
        }

        System.out.println("sourceValue = " + sourceValue
            + ", operation = " + resultFilter.getOperation().getOperationName()
            + ", negate = " + negate);
        System.out.println("A total of " + hits.length + " documents were found from the search\n");

        shutdown();
    }

    public static void main(String[] args) throws IOException, ParseException {
        BitwiseTestSearch search = new BitwiseTestSearch();
        search.search();
    }
}

Any guidance would be highly appreciated. Thanks. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
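[Editor's note] The negate examples in the announcement above can be sanity-checked with a tiny stand-alone predicate. This is only a sketch of the matching rule as described (assuming a BITWISE_AND query matches when every bit of the source value is set in the field), not the actual BitwiseFilter internals:

```java
public class BitwiseMatchDemo {

    // A field value matches a BITWISE_AND query when (field & source) == source,
    // i.e. every bit of the source value is set in the field; negate flips the result.
    static boolean matchesAnd(int fieldValue, int sourceValue, boolean negate) {
        boolean match = (fieldValue & sourceValue) == sourceValue;
        return negate ? !match : match;
    }

    public static void main(String[] args) {
        int flags = 8; // 1000 in binary, as in the example above
        System.out.println(matchesAnd(flags, 8, false)); // source 8, negate=false -> true
        System.out.println(matchesAnd(flags, 4, true));  // source 4, negate=true  -> true
    }
}
```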
Re: Bitwise Operations on Integer Fields in Lucene and Solr Index
I have created two ISSUES as new features https://issues.apache.org/jira/browse/LUCENE-1560 https://issues.apache.org/jira/browse/SOLR-1913 The first one is for the Lucene Filter. The second one is for the Solr QParserPlugin The source code and jar files are attached and the Solr plugin is available for use immediately. On Thu, May 13, 2010 at 6:42 PM, Andrzej Bialecki a...@getopt.org wrote: On 2010-05-13 23:27, Israel Ekpo wrote: Hello Lucene and Solr Community I have a custom org.apache.lucene.search.Filter that I would like to contribute to the Lucene and Solr projects. So I would need some direction as to how to create and ISSUE or submit a patch. It looks like there have been changes to the way this is done since the latest merge of the two projects (Lucene and Solr). Recently, some Solr users have been looking for a way to perform bitwise operations between and integer value and some fields in the Index So, I wrote a Solr QParser plugin to do this using a custom Lucene Filter. This package makes it possible to filter results returned from a query based on the results of a bitwise operation on an integer field in the documents returned from the pre-constructed query. Hi, What a coincidence! :) I'm working on something very similar, only the use case that I need to support is slightly different - I want to support a ranked search based on a bitwise overlap of query value and field value. That is, the number of differing bits would reduce the score. This scenario occurs e.g. during near-duplicate detection that uses fuzzy signatures, on document- or sentence levels. I'm going to submit my code early next week, it still needs some polishing. I have two ways to execute this query, neither of which uses filters at the moment: * method 1: during indexing the bits in the fields are turned into on/off terms on the same field, and during search a BooleanQuery is formed from the int value with the same terms. Scoring is courtesy of BooleanScorer. 
This method supports only a single int value per field. * method 2, incomplete yet - during indexing the bits are turned into terms as before, but this method supports multiple int values per field: terms that correspond to bitmasks on the same value are put at the same positions. Then a specialized Query / Scorer traverses all 32 posting lists in parallel, moving through all matching docs and scoring according to how many terms matched at the same position. I wrapped this in a Solr FieldType, and instead of using a custom QParser plugin I simply implemented FieldType.getFieldQuery(). It would be great to work out a convenient user-level API for this feature, both the scoring and the non-scoring case. -- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Best way to handle bitfields in solr...
William, This QParserPlugin should solve that problem now. Check out https://issues.apache.org/jira/browse/SOLR-1913 BitwiseQueryParserPlugin is a org.apache.solr.search.QParserPlugin that allows users to filter the documents returned from a query by performing bitwise operations between a particular integer field in the index and the specified value. The plugin is available immediately for your use. On Fri, Dec 4, 2009 at 4:03 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Would http://wiki.apache.org/solr/FunctionQuery#fieldvalue help? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: William Pierce evalsi...@hotmail.com To: solr-user@lucene.apache.org Sent: Fri, December 4, 2009 2:43:25 PM Subject: Best way to handle bitfields in solr... Folks: In my db I currently have fields that represent bitmasks. Thus, for example, a value of the mask of 48 might represent an undergraduate (value = 16) and graduate (value = 32). Currently, the corresponding field in solr is a multi-valued string field called EdLevel which will have Undergraduate and Graduate as its two values (for this example). I do the conversion from the int into the list of values as I do the indexing. Ideally, I'd like solr to have bitwise operations so that I could store the int value, and then simply search by using bit operations. However, given that this is not possible, and that there have been recent threads speaking to performance issues with multi-valued fields, is there something better I could do? TIA, - Bill -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
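[Editor's note] The index-time conversion William describes (e.g. a mask of 48 expanding to Undergraduate and Graduate) can be sketched as follows; the bit-to-label mapping is hypothetical, taken from the example in the question:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EdLevelMask {

    // Hypothetical bit-to-label mapping, following the example in the question.
    static final Map<Integer, String> LABELS = new LinkedHashMap<Integer, String>();
    static {
        LABELS.put(16, "Undergraduate");
        LABELS.put(32, "Graduate");
    }

    // Expands an int bitmask into the multi-valued strings to index in EdLevel.
    static List<String> decode(int mask) {
        List<String> values = new ArrayList<String>();
        for (Map.Entry<Integer, String> e : LABELS.entrySet()) {
            if ((mask & e.getKey()) != 0) {
                values.add(e.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(decode(48)); // [Undergraduate, Graduate]
    }
}
```

With SOLR-1913 the raw int can be stored instead, and the expansion step becomes unnecessary.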
Re: Bitwise Operations on Integer Fields in Lucene and Solr Index
Correction, I meant to list https://issues.apache.org/jira/browse/LUCENE-2460 https://issues.apache.org/jira/browse/SOLR-1913 On Thu, May 13, 2010 at 10:13 PM, Israel Ekpo israele...@gmail.com wrote: I have created two ISSUES as new features https://issues.apache.org/jira/browse/LUCENE-1560 https://issues.apache.org/jira/browse/SOLR-1913 The first one is for the Lucene Filter. The second one is for the Solr QParserPlugin The source code and jar files are attached and the Solr plugin is available for use immediately. On Thu, May 13, 2010 at 6:42 PM, Andrzej Bialecki a...@getopt.org wrote: On 2010-05-13 23:27, Israel Ekpo wrote: Hello Lucene and Solr Community I have a custom org.apache.lucene.search.Filter that I would like to contribute to the Lucene and Solr projects. So I would need some direction as to how to create and ISSUE or submit a patch. It looks like there have been changes to the way this is done since the latest merge of the two projects (Lucene and Solr). Recently, some Solr users have been looking for a way to perform bitwise operations between and integer value and some fields in the Index So, I wrote a Solr QParser plugin to do this using a custom Lucene Filter. This package makes it possible to filter results returned from a query based on the results of a bitwise operation on an integer field in the documents returned from the pre-constructed query. Hi, What a coincidence! :) I'm working on something very similar, only the use case that I need to support is slightly different - I want to support a ranked search based on a bitwise overlap of query value and field value. That is, the number of differing bits would reduce the score. This scenario occurs e.g. during near-duplicate detection that uses fuzzy signatures, on document- or sentence levels. I'm going to submit my code early next week, it still needs some polishing. 
I have two ways to execute this query, neither of which uses filters at the moment: * method 1: during indexing the bits in the fields are turned into on/off terms on the same field, and during search a BooleanQuery is formed from the int value with the same terms. Scoring is courtesy of BooleanScorer. This method supports only a single int value per field. * method 2, incomplete yet - during indexing the bits are turned into terms as before, but this method supports multiple int values per field: terms that correspond to bitmasks on the same value are put at the same positions. Then a specialized Query / Scorer traverses all 32 posting lists in parallel, moving through all matching docs and scoring according to how many terms matched at the same position. I wrapped this in a Solr FieldType, and instead of using a custom QParser plugin I simply implemented FieldType.getFieldQuery(). It would be great to work out a convenient user-level API for this feature, both the scoring and the non-scoring case. -- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/ -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
[PECL-DEV] [ANNOUNCEMENT] solr-0.9.10 (beta) Released
The new PECL package solr-0.9.10 (beta) has been released at http://pecl.php.net/.

Release notes:
- Increased compatibility with older systems running CentOS 4 or 5 and RHEL 4 or 5
- Added ability to compile directly without having to build libcurl and libxml2 from source on older systems
- Lowered minimum supported version for libcurl to 7.15.0 (Alex Samorukov)
- Lowered minimum supported version for libxml2 to 2.6.26 (Alex Samorukov)
- Fixed PECL Bug #17172: MoreLikeThis only parses one doc (trevor at blubolt dot com, max at blubolt dot com)
- Declared workaround macros for SSL private key constants due to support for earlier versions of libcurl (Alex Samorukov)
- Changed extension version numbers to start using hexadecimal numbers (Israel Ekpo)
- Added instructions on how to attempt to compile on Windows (Israel Ekpo)
- Fixed PECL Bug #17292: sending UTF-8 encoding in header (giguet at info dot unicaen dot fr)

Package Info:
The extension effectively simplifies the process of interacting with Apache Solr using PHP5, and it already comes with built-in readiness for the latest features added in Solr 1.4. It has features such as built-in, serializable query string builder objects, which effectively simplify the manipulation of name-value pair request parameters across repeated requests. The response from the Solr server is also automatically parsed into native PHP objects whose properties can be accessed as array keys or object properties without any additional configuration on the client side. Its advanced HTTP client reuses the same connection across multiple requests and provides built-in support for connecting to Solr servers secured behind HTTP Authentication or HTTP proxy servers. It is also able to connect to SSL-enabled containers. Please consult the documentation for more details on features.
Related Links:
- Package home: http://pecl.php.net/package/solr
- Changelog: http://pecl.php.net/package-changelog.php?package=solr
- Download: http://pecl.php.net/get/solr-0.9.10.tgz
- Documentation: http://www.php.net/solr

Authors:
Israel Ekpo ie...@php.net (lead)
Re: Evangelism
Check out Lucid Imagination: http://www.lucidimagination.com/About-Search This should convince you. On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman da...@hostworks.com wrote: Hi, I'm new to the list here. I'd like to steer someone in the direction of Solr, and I see the list of companies using Solr, but none have a "powered by Solr" logo or anything. Does anyone have any great links with evidence of majorly successful Solr projects? Thanks in advance, Dan B. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Evangelism
Their main search page has the Powered by Solr logo http://www.lucidimagination.com/search/ On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo israele...@gmail.com wrote: Checkout Lucid Imagination http://www.lucidimagination.com/About-Search This should convince you. On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman da...@hostworks.comwrote: Hi I'm new to the list here, I'd like to steer someone in the direction of Solr, and I see the list of companies using solr, but none have a power by solr logo or anything. Does anyone have any great links with evidence to majorly successful solr projects? Thanks in advance, Dan B. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/ -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Evangelism
A lot of high performing websites use MySQL, Oracle and Microsoft SQL Server for data storage and other RDBMS needs without necessarily putting the powered by logo on the sites. If you need the certified version of Apache Solr, you can contact Lucid Imagination. Just like MySQL, Apache Solr and Apache Lucene also have commercial backing (from Lucid Imagination) if you choose to go that route. On Thu, Apr 29, 2010 at 2:24 PM, Nagelberg, Kallin knagelb...@globeandmail.com wrote: I had a very hard time selling Solr to business folks. Most are of the mind that if you're not paying for something it can't be any good. That might also be why they refrain from posting 'powered by solr' on their website, as if it might show them to be cheap. They are also fearful of lack of support should you get hit by a bus. This might be remedied by recommending professional services from a company such as lucid imagination. I think your best bet is to create a working demo with your data and show them the performance. Cheers, -Kallin Nagelberg -Original Message- From: Israel Ekpo [mailto:israele...@gmail.com] Sent: Thursday, April 29, 2010 2:19 PM To: solr-user@lucene.apache.org Subject: Re: Evangelism Their main search page has the Powered by Solr logo http://www.lucidimagination.com/search/ On Thu, Apr 29, 2010 at 2:18 PM, Israel Ekpo israele...@gmail.com wrote: Checkout Lucid Imagination http://www.lucidimagination.com/About-Search This should convince you. On Thu, Apr 29, 2010 at 2:10 PM, Daniel Baughman da...@hostworks.com wrote: Hi I'm new to the list here, I'd like to steer someone in the direction of Solr, and I see the list of companies using solr, but none have a power by solr logo or anything. Does anyone have any great links with evidence to majorly successful solr projects? Thanks in advance, Dan B. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. 
http://www.israelekpo.com/ -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/ -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Tutorials for developing filter plugins.
He is referring to the org.apache.lucene.search.Filter classes. Michael, I did a search too and I could not really find any useful tutorials on the subject. You can take a look at how this is implemented in the Spatial Solr Plugin by JTeam: http://www.jteam.nl/news/spatialsolr.html Their code, I believe, uses the bits() method, which was deprecated in Lucene 2.9 and removed in 3.0. The getDocIdSet() method returns a DocIdSet, which you can build from an org.apache.lucene.util.OpenBitSet (OpenBitSet implements DocIdSet). I think there is probably some example in the new version (2nd edition) of the *Lucene in Action* book on how to do something similar. You should check it out from the Manning Early Access Program page: http://www.manning.com/hatcher3/ You should also check out the Solr 1.5 source code for how some of the Lucene Filter classes are designed. On Sat, Apr 10, 2010 at 5:23 AM, MitchK mitc...@web.de wrote: Hi Michael, do you mean a TokenFilter like StopWordFilter? If you like, you could post some code, so one can help you. It's really easy to develop some TokenFilters, if you have a look at already implemented ones. Kind regards - Mitch -- View this message in context: http://n3.nabble.com/Tutorials-for-developing-filter-plugins-tp706874p709897.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: add/update document as distinct operations? Is it possible?
Chris, I don't see anything in the headers suggesting that Julian's message was a hijack of another thread On Thu, Apr 1, 2010 at 2:17 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : Subject: add/update document as distinct operations? Is it possible? : References: : dc9f7963609bed43b1ab02f3ce52863103dc35f...@bene-exch-01.benetech.local : In-Reply-To: : dc9f7963609bed43b1ab02f3ce52863103dc35f...@bene-exch-01.benetech.local http://people.apache.org/~hossman/#threadhijackhttp://people.apache.org/%7Ehossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: selecting documents older than 4 hours
I did something similar. The only difference with my setup is that I have two columns: one that stores the date the document was first created and a second that stores the date it was last updated, both as Unix timestamps. So my query to find documents that are older than 4 hours would be very easy. To find documents that were last updated more than four hours ago, you would do something like this: q=last_update_date:[* TO 1270119278] The current timestamp now is 1270133678; 4 hours ago was 1270119278. The column type in the schema is tint. On Wed, Mar 31, 2010 at 11:18 PM, herceg_novi herceg_n...@yahoo.com wrote: Hello, I'd like to select documents older than 4 hours in my Solr 1.4 installation. The query q=last_update_date:[NOW-7DAYS TO NOW-4HOURS] does not return a correct recordset. I would expect to get all documents with last_update_date in the specified range. Instead Solr returns all documents that exist in the index, which is not what I would expect. last_update_date is a Solr date field. This does not work either: q=last_update_date:[NOW/DAY-7DAYS TO NOW/HOUR-4HOURS] This works, but I manually had to calculate the 4 hour difference and insert a Solr date formatted timestamp into my query (I prefer not to do that): q=last_update_date:[NOW/DAY-7DAYS TO 2010-03-31T19:40:34Z] Any ideas if I can get this to work as expected? q=last_update_date:[NOW-7DAYS TO NOW-4HOURS] Thanks! -- View this message in context: http://n3.nabble.com/selecting-documents-older-than-4-hours-tp689975p689975.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
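[Editor's note] The arithmetic in the reply above (current epoch seconds minus four hours) can be wrapped in a small helper. This is a sketch assuming, as in the reply, that last_update_date stores Unix timestamps in a numeric field; the class and method names are hypothetical:

```java
public class StaleDocsQuery {

    // Builds a range query for documents last updated more than maxAgeSeconds ago,
    // assuming last_update_date holds Unix timestamps (seconds) in a tint/tlong field.
    static String olderThan(long nowSeconds, long maxAgeSeconds) {
        long cutoff = nowSeconds - maxAgeSeconds;
        return "last_update_date:[* TO " + cutoff + "]";
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        System.out.println(olderThan(now, 4 * 3600L));
    }
}
```

With the timestamps from the thread (now = 1270133678), this produces exactly the query shown above: last_update_date:[* TO 1270119278].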
Re: Features not present in Solr
On Mon, Mar 22, 2010 at 3:16 PM, Lance Norskog goks...@gmail.com wrote: Web crawling. I don't think Solr was designed with web crawling in mind. Nutch would be better suited for that, I believe. Text analysis. This is a bit vague. Please elaborate further. There is a lot of analysis (stemming, stop-word removal, character transformation, etc.) that already takes place implicitly, based on what fields you define and use in the schema. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Distributed index management. A fanatical devotion to the Pope. There are probably a lot of features already available in Solr out of the box that most of those other enterprise-level applications do not have yet. You would also be surprised to learn that a lot of them use Lucene under the covers and are actually trying to re-implement what is already available in Solr. On Sun, Mar 21, 2010 at 11:19 PM, MitchK mitc...@web.de wrote: Srikanth, I don't know anything about Endeca, so I can't compare Solr to it. However, I know Solr is powerful. Very powerful. So, maybe you should tell us more about your needs to get a good answer. As a response to your second question: You should not expect Solr to be a database. It is an index server. A database keeps your data safe. If something goes wrong - which is always possible - Solr gives no warranties. Maybe someone else can tell you more about this topic. - Mitch Srikanth B wrote: Hello, We are in the process of researching Solr features. I am looking for two things: 1. Features not available in Solr but present in other products like Endeca. 2. What one shouldn't expect from Solr. Any thoughts? Thanks in advance, Srikanth -- View this message in context: http://old.nabble.com/Features-not-present-in-Solr-tp27966315p27982734.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Features not present in Solr
One "feature" you will not find in Solr is licensing fees and fine print. You should not expect to pay anything in order to use Solr. On Fri, Mar 19, 2010 at 11:16 PM, Srikanth B srikanth...@gmail.com wrote: Hello, We are in the process of researching Solr features. I am looking for two things: 1. Features not available in Solr but present in other products like Endeca. 2. What one shouldn't expect from Solr. Any thoughts? Thanks in advance, Srikanth -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Search on dynamic fields which contains spaces /special characters
I do not believe the Solr or Lucene query syntax allows this. You need to get rid of all the spaces in the field name. If not, then you will be searching for "short" in the default field and then "name1" in the "name" field. http://wiki.apache.org/solr/SolrQuerySyntax http://lucene.apache.org/java/2_9_2/queryparsersyntax.html On Mon, Mar 8, 2010 at 2:17 PM, JavaGuy84 bbar...@gmail.com wrote: Hi, We have some dynamic fields getting indexed using SOLR. Some of the dynamic fields contain spaces / special characters (something like: short name, Full Name etc...). Is there a way to search on these fields (which contain the spaces etc..)? Can someone let me know the filter I need to pass to do this type of search? I tried with short name:name1 -- this didn't work. Thanks, Barani -- View this message in context: http://old.nabble.com/Search-on-dynamic-fields-which-contains-spaces--special-characters-tp27826147p27826147.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
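[Editor's note] A common workaround is to normalize dynamic field names before indexing so they never contain spaces or special characters. This is only an illustrative sketch of such a normalization step (it is not a Solr API; the exact mapping is an assumption):

```java
public class FieldNameSanitizer {

    // Lower-cases the name and replaces runs of anything other than
    // letters, digits and underscores with a single underscore,
    // so "short name" becomes "short_name".
    static String sanitize(String rawName) {
        return rawName.trim().toLowerCase().replaceAll("[^a-z0-9_]+", "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("short name")); // short_name
        System.out.println(sanitize("Full Name"));  // full_name
    }
}
```

If the same function is applied when building queries, users can keep typing the human-readable names while the index only ever sees the sanitized ones.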
Re: get Server Status, TotalDocCount .... PHP !
The last time I tried using SolrPHPClient for this, it did not handle the response very well because of how the JSON response is generated on the server side; the JSON could not be parsed properly. I am not sure if anything has changed since then. If you do not want to analyze the XML response each time, and if you are not using the PECL extension, you will need to send a request manually to the Solr server using cURL and specify the response format as phps. On Tue, Mar 2, 2010 at 9:59 AM, stocki st...@shopgate.com wrote: Hey, no, I use the SolrPHPClient http://code.google.com/p/solr-php-client/ I do not really want to use two different PHP libs. ^^ What do you mean with unserialize? XD Guillaume Rossolini-2 wrote: Hi, Have you tried the php_solr extension from PECL? It has a handy SolrPingResponse class. Or you could just call the CORENAME/admin/ping?wt=phps URL and unserialize it. Regards, -- I N S T A N T | L U X E - 44 rue de Montmorency | 75003 Paris | France Tél. : 01 80 50 52 51 | Mob. : 06 09 96 10 29 | web : www.instantluxe.com On Tue, Mar 2, 2010 at 2:50 PM, stocki st...@shopgate.com wrote: Hello, I use Solr in my CakePHP framework. How can I get status information for my Solr cores? I don't want to analyze the response XML every time. Does anybody know a nice way to get status messages from Solr? Thx ;) Jonas -- View this message in context: http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756118.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/get-Server-Status%2C-TotalDocCount--PHP-%21-tp27756118p27756852.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: updating particular field
Unfortunately, because of how Lucene works internally, you will not be able to update just one or two fields. You have to resubmit the entire document. If you send only one or two fields, the updated document will only contain the fields sent in the last update.

On Mon, Mar 1, 2010 at 7:09 AM, Suram reactive...@yahoo.com wrote:

Siddhant wrote: Yes. You can just re-add the document with your changes, and the rest of the fields in the document will remain unchanged.

On Mon, Mar 1, 2010 at 5:09 PM, Suram reactive...@yahoo.com wrote:

Hi,

<doc>
  <field name="id">EN7800GTX/2DHTV/256M</field>
  <field name="manu">ASUS Computer Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">graphics card</field>
  <field name="features">NVIDIA GeForce 7800 GTX GPU/VPU clocked at 486MHz</field>
  <field name="features">256MB GDDR3 Memory clocked at 1.35GHz</field>
  <field name="price">479.95</field>
  <field name="popularity">7</field>
  <field name="inStock">false</field>
  <field name="manufacturedate_dt">2006-02-13T15:26:37Z/DAY</field>
</doc>

Can I possibly update <field name="inStock">true</field> without affecting any other field of my previous document? Thanks in advance

-- Siddhant

Hi, here I do not want to reload the entire data; I just want to update a field I need to change (i.e., one or more fields by id, not the whole document).
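The overwrite behavior described above can be simulated outside Solr. This sketch models the index as a plain dict keyed by the uniqueKey, to show why sending only the changed field loses the rest of the document (the field values are shortened from the example in the thread):

```python
# Model the index as {uniqueKey: document}; an add with the same key
# replaces the stored document wholesale, it does not merge fields.
index = {}

def add(doc):
    index[doc["id"]] = doc  # overwrite, exactly like a Solr re-add

add({"id": "EN7800GTX", "manu": "ASUS", "inStock": "false"})
add({"id": "EN7800GTX", "inStock": "true"})  # a partial "update"

print(index["EN7800GTX"])  # the manu field is gone
```

The only safe pattern at the time was therefore read-modify-write: fetch the full stored document, change the one field, and re-add the whole thing.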
Re: If you could have one feature in Solr...
Grant, one feature that I would like to see is the ability to do a bitwise search. I have had to work around this with a query parser plugin that uses an org.apache.lucene.search.Filter. I think having this feature would be very nice, and I prefer it to searching with multiple OR-type queries, especially when the bits are known ahead of time. I can submit the code as a patch once I get approval to do so.

On Wed, Feb 24, 2010 at 2:20 PM, straup str...@gmail.com wrote:

I actually found the documentation pretty great, especially since (in my experience, anyway) most Java projects seem to default to generic JavaDoc-derived documentation (and that makes me cry). That said, more cookbook-style recipes or stories would be helpful for some of the more esoteric parts of Solr. Also: real-time indexing and geo. Cheers,

On 2/24/10 9:54 AM, Grant Ingersoll wrote:

On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote: Decent documentation.

What parts do you feel are lacking? Or is it just across the board? Wikis are both good and bad for documentation, IMO. -Grant
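The matching logic behind such a bitwise filter can be sketched in a few lines. This is not the actual patch mentioned above, just a hypothetical illustration of the test a Filter would apply per document: keep only documents whose flags field contains every bit of a query mask, instead of spelling the combinations out as OR clauses:

```python
docs = [
    {"id": 1, "flags": 0b0101},
    {"id": 2, "flags": 0b0011},
    {"id": 3, "flags": 0b0111},
]

def bitwise_all(docs, mask):
    # Keep documents where every bit set in mask is also set in flags;
    # this single AND test replaces a chain of OR/AND query clauses.
    return [d["id"] for d in docs if d["flags"] & mask == mask]

print(bitwise_all(docs, 0b0101))  # [1, 3]
```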
Re: Question about custom Lucene filters and Solr
Hi Jon, You will need to write a plugin: a custom query parser, and possibly an Update Handler, depending on what you are doing. Implementing your own Update Handler or Update Request Processor is considered advanced and is generally not recommended. Take a look at the following links for more information:

http://wiki.apache.org/solr/SolrPlugins
http://wiki.apache.org/solr/UpdateRequestProcessor
http://lucene.apache.org/solr/api/org/apache/solr/update/UpdateHandler.html
http://lucene.apache.org/solr/api/org/apache/solr/search/QParserPlugin.html
http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessor.html

On Tue, Feb 16, 2010 at 2:43 PM, Jon Bodner jon.bod...@it.com wrote:

Hello, I'm interested in using Solr with a custom Lucene Filter (like the one described in section 6.4.1 of the Lucene in Action, Second Edition book). I'd like to filter search results from a Lucene index against information stored in a relational database. I don't want to move the relational database information into the search index, because it could change frequently. I looked at writing my own custom Solr SearchComponent, but the documentation for those seems slim. Is this the correct approach? Is there another way? Thanks, Jon
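The effect Jon is after (dropping hits that the relational database says are not allowed) can be sketched outside of Lucene; a custom Filter would do the equivalent of this inside the search itself. The table and column names here are hypothetical, with an in-memory SQLite table standing in for the data that changes too often to index:

```python
import sqlite3

# Hypothetical setup: an in-memory table standing in for the relational
# data that changes too frequently to put in the search index.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE acl (doc_id TEXT, visible INTEGER)")
db.executemany("INSERT INTO acl VALUES (?, ?)",
               [("a", 1), ("b", 0), ("c", 1)])

search_hits = ["a", "b", "c"]  # ids returned by the index

# Post-filter the hits against the current database state.
visible = {row[0] for row in
           db.execute("SELECT doc_id FROM acl WHERE visible = 1")}
filtered = [h for h in search_hits if h in visible]
print(filtered)  # ['a', 'c']
```

Doing this as a Lucene Filter rather than post-filtering matters mostly for paging and facet counts, which should be computed over the already-filtered set.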
Re: Basic questions about Solr cost in programming time
On Tue, Jan 26, 2010 at 3:00 PM, Jeff Crump jcr...@hq.mercycorps.org wrote:

Hi, I hope this message is OK for this list. I'm looking into search solutions for an intranet site built with Drupal. Eventually we'd like to scale to enterprise search, which would include the Drupal site, a document repository, and Jive SBS (collaboration software). I'm interested in Lucene/Solr because of its scalability, faceted search and optimization features, and because it is free. Our problem is that we are a non-profit organization with only three very busy programmers/sysadmins supporting our employees around the world. To help me argue for Solr in terms of total cost, I'm hoping that members of this list can share their insights about the following:

* About how many hours of programming did it take you to set up your instance of Lucene/Solr (not counting time spent on optimization)?

For me, this generally took 30 to 70 hours to create the entire search application, depending on the features of the web application and the complexity of the site.

* Are there any disadvantages of going with a certified distribution rather than the standard distribution?

The people at Lucid Imagination can probably provide a better answer for this. It is not really a disadvantage to go with the certified version, but you may have to pay in order to get the certified distribution. However, you will get dedicated support if you happen to run into any issues or need technical assistance. If you use the standard version, you can always get help from the mailing list if you have any issues.

Thanks and best regards, Jeff

Jeff Crump jcr...@hq.mercycorps.org
Re: What is this error means?
Ellery, A preliminary look at the source code indicates that the error is happening because the Solr server is taking longer than expected to respond to the client.

http://code.google.com/p/solr-php-client/source/browse/trunk/Apache/Solr/Service.php

The default timeout handed down to Apache_Solr_Service::_sendRawPost() is 60 seconds, since you were calling the addDocument() method. So if the request took longer than that (1 minute), it will exit with that error message. You will have to increase the default value to something much higher, like 10 minutes or so, on line 252 of the source code, since there is no way to specify it in the constructor or in the addDocument() method. Another alternative is to update the default_socket_timeout setting in the php.ini file, or in the code using ini_set(). I hope that helps.

On Tue, Jan 12, 2010 at 9:33 PM, Ellery Leung elleryle...@be-o.com wrote:

Hi, here is the stack trace:

Fatal error: Uncaught exception 'Exception' with message '"0" Status: Communication Error' in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php:385
Stack trace:
#0 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(652): Apache_Solr_Service->_sendRawPost('http://127.0.0', '<add allowDups=...')
#1 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(676): Apache_Solr_Service->add('<add allowDups=...')
#2 C:\nginx\html\apps\milio\lib\System\classes\SolrSearchEngine.class.php(221): Apache_Solr_Service->addDocument(Object(Apache_Solr_Document))
#3 C:\nginx\html\apps\milio\lib\System\classes\SolrSearchEngine.class.php(262): SolrSearchEngine->buildIndex(Array, 'key')
#4 C:\nginx\html\apps\milio\lib\System\classes\Indexer\Indexer.class.php(51): SolrSearchEngine->createFullIndex('contacts', Array, 'key', 'www')
#5 C:\nginx\html\apps\milio\lib\System\functions\createIndex.php(64): Indexer->create('www')
#6 {main}
  thrown in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php on line 385
C:\nginx\html\apps\milio\htdocs\Contacts>pause
Press any key to continue . . .

Thanks for helping me.

Grant Ingersoll-6 wrote: Do you have a stack trace?

On Jan 12, 2010, at 2:54 AM, Ellery Leung wrote:

When I am building the index for around 2 ~ 25000 records, sometimes I come across this error: Uncaught exception 'Exception' with message '"0" Status: Communication Error'. I searched Google and Yahoo but found no answer. I am now committing documents to Solr for every 10 records fetched from an SQLite database with PHP 5.3.

Platform: Windows 7 Home
Web server: Nginx
Solr Specification Version: 1.4.0
Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:33:40
Lucene Specification Version: 2.9.1
Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25
Solr hosted in Jetty 6.1.3

All of the above are on one single test machine. The situation is that sometimes when I build the index, it can be created successfully, but sometimes it will just stop with the above error. Any clue? Please help. Thank you in advance.
[PECL-DEV] [ANNOUNCEMENT] solr-0.9.9 (beta) Released
The new PECL package solr-0.9.9 (beta) has been released at http://pecl.php.net/.

Release notes
-------------
- Fixed Bug #17009: Creating two SolrQuery objects leads to wrong query value
- Reset the buffer for the request data from the previous request in SolrClient
- Added new internal static function solr_set_initial_curl_handle_options()
- Moved the initialization of CURL handle options to the solr_set_initial_curl_handle_options() function
- Resetting the CURL options on the (CURL *) handle after each request is completed
- Added a more explicit error message to indicate that cloning SolrParams objects and their descendants is currently not yet supported

Package Info
------------
This extension effectively simplifies the process of interacting with Apache Solr using PHP5, and it already comes with built-in readiness for the latest features available in Solr 1.4. The extension has features such as built-in, serializable query string builder objects which effectively simplify the manipulation of name-value pair request parameters across repeated requests. The response from the Solr server is also automatically parsed into native PHP objects whose properties can be accessed as array keys or object properties without any additional configuration on the client side. Its advanced HTTP client reuses the same connection across multiple requests and provides built-in support for connecting to Solr servers secured behind HTTP Authentication or HTTP proxy servers. It is also able to connect to SSL-enabled containers. Please consult the documentation for more details on features.

Related Links
-------------
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.9.tgz
Documentation: http://us.php.net/solr

Authors
-------
Israel Ekpo ie...@php.net (lead)
Re: Help with creating a solr schema
On Thu, Dec 31, 2009 at 10:26 AM, JaredM emru...@gmail.com wrote:

Hi, I'm new to Solr but so far I think it's great. I've spent 2 weeks reading through the wiki and mailing list info. I have a use case and I'm not sure what the best way is to implement it. I am keeping track of people's calendar schedules in a really simple way: each user can log in and input a number of date ranges where they are available (so, for example, user Alice might be available between 1-Jan-2010 - 15-Jan-2010, 20-Feb-2010 - 22-Feb-2010, and 1-Mar-2010 - 5-Mar-2010). In my data model I have this modelled as a one-to-many with a User table (consisting of username and some metadata) and an Availability table (consisting of start date and end date). Now I need to search for which users are available within a given date range. The bit I'm having trouble with is how to store multiple start/end date pairs. Can someone provide some guidance?

I have done something similar to this before. You will have to store the username, firstname and lastname as single-valued fields:

<field name="username" type="string" indexed="true" stored="true" required="true"/>
<field name="firstname" type="string" indexed="true" stored="true"/>
<field name="lastname" type="string" indexed="true" stored="true"/>
<field name="start_date" type="tint" indexed="true" stored="true" multiValued="true"/>
<field name="end_date" type="tint" indexed="true" stored="true" multiValued="true"/>

However, the start and end dates should be multi-valued tint types. I decided to store the dates as UNIX timestamps. The start dates are stored as the UNIX timestamps at 12 midnight of the start date (00:00:00), and the end dates are stored as the UNIX timestamps at 11:59:59 PM on the end date (23:59:59). Storing the dates as Trie integers gave me faster range query results.
When searching, you will also have to convert the dates to UNIX timestamps using similar logic before using them in the Solr search query. You should use the username of the user as the uniqueKey. If a user has multiple dates of availability, you will enter them like so:

<add>
  <doc>
    <field name="username">exampleun</field>
    <field name="firstname">examplefnf</field>
    <field name="lastname">exampleln</field>
    <field name="start_date">137865661</field>
    <field name="start_date">137865662</field>
    <field name="start_date">137865663</field>
    <field name="end_date">137865681</field>
    <field name="end_date">137865682</field>
    <field name="end_date">137865683</field>
  </doc>
</add>
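The midnight/23:59:59 conversion described above can be sketched as follows. This assumes UTC (the original post does not say which timezone was used), and the helper names are mine:

```python
from datetime import datetime, timezone

def start_ts(y, m, d):
    # UNIX timestamp at 00:00:00 on the start date
    return int(datetime(y, m, d, 0, 0, 0, tzinfo=timezone.utc).timestamp())

def end_ts(y, m, d):
    # UNIX timestamp at 23:59:59 on the end date
    return int(datetime(y, m, d, 23, 59, 59, tzinfo=timezone.utc).timestamp())

# User available 1-Jan-2010 through 15-Jan-2010
print(start_ts(2010, 1, 1))   # 1262304000
print(end_ts(2010, 1, 15))    # 1263599999
```

The same two functions would be applied to the user-supplied search range before building the range query, so both sides of the comparison use the same convention.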
Re: solr 1.4 csv import -- Document missing required field: id
On Fri, Jan 1, 2010 at 9:13 PM, evana evre...@ymail.com wrote:

Hi, I am trying to import a CSV file (without an id field) on Solr 1.4. In schema.xml the id field is set with required="false", but I am getting org.apache.solr.common.SolrException: Document missing required field: id.

The following is the schema.xml fields section:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="false"/>
  <field name="name" type="textgen" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
  <dynamicField name="random_*" type="random"/>
  <dynamicField name="*" type="string" indexed="true"/>
</fields>
<uniqueKey>id</uniqueKey>

The following is the CSV file:

company_id,customer_name,active
58,Apache,Y
58,Solr,Y
58,Lucene,Y
60,IBM,Y

The following is the SolrJ import client:

SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
req.addFile(new File(filename));
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList result = server.request(req);
System.out.println("Result: " + result);

Could any of you help out please? Thanks

The presence of the uniqueKey definition implies that the id field is a required field in the document, even though the required attribute is set to false on the field definition. Try removing the uniqueKey definition for the id field in the schema.xml file, and then try again to run the update script or application. The uniqueKey definition is not needed if you are going to build the index from scratch each time you do the import.
However, if you are doing incremental updates, this field is required and the uniqueKey definition is also needed to specify what the primary key for the document is.

http://wiki.apache.org/solr/UniqueKey
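If the uniqueKey has to stay in the schema (for the incremental-update case above), another option is to synthesize an id per CSV row before posting. A sketch using the column names from the example file; the id format is my own assumption:

```python
import csv
import io

# The example CSV from the thread, with no id column.
src = """company_id,customer_name,active
58,Apache,Y
58,Solr,Y
60,IBM,Y"""

rows = list(csv.DictReader(io.StringIO(src)))
for n, row in enumerate(rows):
    # company_id alone is not unique (58 repeats), so combine it
    # with a row counter to build a synthetic uniqueKey value.
    row["id"] = "%s-%d" % (row["company_id"], n)

print([r["id"] for r in rows])  # ['58-0', '58-1', '60-2']
```

Note that a row counter only stays stable if the file is re-imported in the same order; a hash of the row's natural key would be more robust for repeated incremental imports.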
Re: Help with creating a solr schema
On Fri, Jan 1, 2010 at 9:47 PM, JaredM emru...@gmail.com wrote:

Thanks Ahmet and Israel. I prefer Israel's approach, since the amount of metadata for the user is quite high, but I'm not clear how to get around one problem. If I had 2 availabilities (I've left them in human-readable form instead of as UNIX timestamps only for ease of understanding):

<field name="start_date">10-Jan-2010</field>
<field name="start_date">20-Jan-2010</field>
<field name="end_date">25-Jan-2010</field>
<field name="end_date">28-Jan-2010</field>

and I wanted to query for availability between 12-Jan-2010 and 26-Jan-2010, then wouldn't the above document be returned (even though the user would not be available 20-25 Jan)?

Unfortunately, for this particular use case, if you are using the out-of-the-box features available in Solr 1.4, without a custom Solr plugin using a custom Lucene filter and some special value storage arrangement for the fields, you will have to store each start and end date pair as a separate document. So there will be N separate documents for each username if that user has N distinct periods of availability. The start date and end date fields would also have to be single-valued instead of multi-valued as I specified in the earlier post. Sorry.
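Flattening one user with N availability periods into N single-valued documents, as suggested above, can be sketched like this (the id scheme and the illustrative dates are my own assumptions):

```python
user = {"username": "alice",
        "periods": [("10-Jan-2010", "15-Jan-2010"),
                    ("20-Jan-2010", "28-Jan-2010")]}

def flatten(user):
    # One Solr document per availability period; the uniqueKey now has
    # to include the period index, not just the username.
    return [{"id": "%s-%d" % (user["username"], i),
             "username": user["username"],
             "start_date": s,
             "end_date": e}
            for i, (s, e) in enumerate(user["periods"])]

docs = flatten(user)
print(len(docs))      # 2
print(docs[0]["id"])  # alice-0
```

Because each document now holds exactly one start/end pair, a range query can no longer accidentally pair the start of one period with the end of another.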
Re: How use implement Lucene for perl.
I think you need to send a message to the Lucene mailing list instead if you want to use Lucene directly: java-u...@lucene.apache.org

The core API Javadoc page has a very simple example which you can use to get started with a few modifications:

http://lucene.apache.org/java/3_0_0/api/core/index.html

Use the documentation to select the appropriate constructor and method signatures. On the other hand, I think Solr can do everything that you need, without the need to interact directly with the Lucene API.

On Mon, Dec 28, 2009 at 11:42 PM, Maheshwar maheshwar2...@gmail.com wrote:

I am new to Lucene and don't have any idea about it yet. I want to implement Lucene in my search script. Please guide me on what I need to do for a Lucene implementation. Actually, I want to integrate Lucene search with a message board system where people come to post new topics, edit topics, and delete them as needed. I want to update the search index on every action. So I need some valuable help.
Re: NOT highlighting synonym
I think what Erik was referring to was for you to create a separate copy field with different analyzers: just copy the original value to that copy field and index it differently. That way you can use one field for searching and another one to display the highlighting results.

On Mon, Dec 28, 2009 at 1:00 PM, darniz rnizamud...@edmunds.com wrote:

Thanks. Unfortunately that's not the case. We are using the same field to do the search on and to display that text. So it looks like in this case this is not possible. Am I correct? We have a custom field type with synonyms defined at query time.

Erik Hatcher-4 wrote:

On Dec 23, 2009, at 2:26 PM, darniz wrote: I have a requirement where we don't want to highlight synonym matches. For example, I search for "caddy" and I don't want to highlight a matched synonym like "cadillac". Looking at the highlighting parameters, I didn't find any support for this. Can anyone offer any advice?

You can control what gets highlighted by which analyzer is used. You may need a different field for highlighting than you use for searching in this case - but you can just create another field type without the synonym filter in it and use that for highlighting. Erik
Re: solr php client vs file_get_contents?
On Tue, Dec 15, 2009 at 8:49 AM, Faire Mii faire@gmail.com wrote:

I am using PHP to access Solr and I wonder one thing: why should I use the Solr PHP client when I can use

$serializedResult = file_get_contents('http://localhost:8983/solr/select?q=niklas&wt=phps');

to get the result in arrays and then print them out? I don't really get the difference. Are there any richer features with the PHP client? Regards, fayer

Hi Faire, Have you actually used this library before? I think the library is pretty well thought out. From a simple glance at the source code, you can see that one can use it for the following purposes:

1. Adding documents to the index (which you cannot do with file_get_contents alone), so that is one difference.
2. Updating existing documents.
3. Deleting existing documents.
4. Balancing requests across multiple backend servers.

There are other operations with the Solr server that the library can also perform. Some examples of what I am referring to are illustrated here:

http://code.google.com/p/solr-php-client/wiki/FAQ
http://code.google.com/p/solr-php-client/wiki/ExampleUsage

IBM also has an interesting article illustrating how to add documents to the Solr index and issue commit and optimize calls using this library:

http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/

The author of the library can probably give you more details on what the library has to offer. I think you should download the source code and spend some time looking at all the features it has to offer. In my opinion, it is not fair to compare a well-thought-out library like that with a simple PHP function.
Re: Can solr web site have multiple versions of online API doc?
2009/12/15 Teruhiko Kurosaka k...@basistech.com:

Lucene keeps multiple versions of its API doc online at http://lucene.apache.org/java/X_Y_Z/api/all/index.html for version X.Y.Z. I am finding this very useful when comparing different versions. This is also good because the javadoc comments that I write for my software can reference the API comments of the exact version of Lucene that I am using. At the Solr site, I can only find the API doc of the trunk build. I cannot find the 1.3.0 API doc, for example. Can the Solr site also maintain the API docs for the past stable versions? -kuro

Hi Teruhiko, If you downloaded the 1.3.0 release, you should find a docs folder inside the zip file. This contains the javadoc for that particular release. You may also re-download the 1.3.0 release to get the docs for Solr 1.3. I hope this helps.
Re: apache-solr-common.jar
2009/12/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com:

There is no Solr common jar anymore. You may use the solrj jar, which contains all the classes that were in the common jar.

On Mon, Dec 14, 2009 at 9:22 PM, gudumba l gudumba.sm...@gmail.com wrote:

Hello All, I have been using apache-solr-common-1.3.0.jar in my module. I am planning to shift to the latest version because, of course, it has more flexibility. But it is really strange that I don't find any corresponding jar in the latest version. I have searched the entire Apache Solr 1.4 folder (which is downloaded from the site) but have not found any. I am sorry, it's really silly to request a jar, but I have no option. Thanks.

-- Noble Paul | Systems Architect | AOL | http://aol.com

I had the same question earlier last week, and I found out after some research where the packages are bundled. The specific jar is in the dist folder, as apache-solr-1.4.0/dist/apache-solr-solrj-1.4.0.jar. This was where I found the classes in the org.apache.solr.common.* packages.
Re: Sol server is not set up ??
On Fri, Dec 11, 2009 at 7:54 AM, regany re...@newzealand.co.nz wrote:

Hello! I'm trying to successfully build/install the PHP Solr extension, but I am running into an error when doing a "make test" - the following 4 tests fail, the other 17 pass. The Solr server is definitely running because I can access it via the admin URL. Does anyone know what else may be causing "make test" to think the Solr server is not set up? regan

=====================================================================
Running selected tests.
TEST 1/21 [tests/solrclient_001.phpt]
SKIP SolrClient::addDocument() - Sending a single document to the Solr server [tests/solrclient_001.phpt] reason: Solr server is not set up
TEST 2/21 [tests/solrclient_002.phpt]
SKIP SolrClient::addDocuments() - sending multiple documents to the Solr server [tests/solrclient_002.phpt] reason: Solr server is not set up
TEST 3/21 [tests/solrclient_003.phpt]
SKIP SolrClient::addDocuments() - sending a cloned document [tests/solrclient_003.phpt] reason: Solr server is not set up
TEST 4/21 [tests/solrclient_004.phpt]
SKIP SolrClient::query() - Sending a chained query request [tests/solrclient_004.phpt] reason: Solr server is not set up

Hi Regan, This is Israel, the author of the PHP extension. There is nothing wrong with your Solr server; it is just a configuration value that you have to change in the test_config.php file before running the "make test" command. In the tests/test_config.php file, you will have to change the value of SOLR_SERVER_CONFIGURED from false to true. You can see the contents of the file here in the repository:

http://svn.php.net/viewvc/pecl/solr/trunk/tests/test.config.php?revision=290120&view=markup

You also have to specify the correct values for the host name and port numbers.
I am going to make some changes to the README files, the test scripts, and other documentation to make sure that this part (why some tests may be skipped) is clear. These changes should be available in the next update release early next week. So, please make these changes and try again. The tests should not be skipped this time. Also, I would like to know the version of the Solr extension, the PHP version, and the operating system you are using. Please let me know if you need any help. Sincerely, Israel Ekpo
Re: SolrClient::query(): Solr HTTP Error : 'Couldn't connect to server'
On Fri, Dec 11, 2009 at 6:49 AM, regany re...@newzealand.co.nz wrote:

Hi, I've (hopefully correctly) installed the Solr PHP extension, but I'm receiving the following error when trying to run my test script:

SolrClient::query(): Solr HTTP Error: 'Couldn't connect to server'

Any ideas how to figure out why it's giving the error? regan

<?php

/* Domain name of the Solr server */
define('SOLR_SERVER_HOSTNAME', 'localhost');

define('SOLR_SERVER_PATH', '/solr/core0');

/* Whether or not to run in secure mode */
define('SOLR_SECURE', false);

/* HTTP port for the connection */
define('SOLR_SERVER_PORT', ((SOLR_SECURE) ? 8443 : 8983));

$options = array(
    'hostname' => SOLR_SERVER_HOSTNAME,
    'port'     => SOLR_SERVER_PORT,
    'path'     => SOLR_SERVER_PATH
);

$client = new SolrClient($options);

$query = new SolrQuery();
$query->setQuery('apple');
$query->setStart(0);
$query->setRows(50);

$query_response = $client->query($query);
print_r($query_response);

$response = $query_response->getResponse();
print_r($response);

?>

Hi Regan, I have the following questions:

0. What version of Apache Solr are you using? 1.3, 1.4, nightly builds?
1. What version of PHP are you using, and on what operating system?
2. What version of the Solr extension are you using?
3. Which servlet container are you using for Solr? (Jetty, Tomcat, GlassFish, etc.)
4. What are the hostname, port number, and path to Solr? Is your port number 8080 or 8983?

Also, please let me know what the output of $client->getDebug() is. This usually contains very detailed errors about what is happening during the connection. I would be happy to help you troubleshoot any errors you are having.
Re: Indexing content on Windows file shares?
If you are looking to index websites, Nutch would be a better alternative. However, Solr could be useful for indexing text files. There is documentation here on how to add data to the index:

http://lucene.apache.org/solr/tutorial.html#Indexing+Data
http://wiki.apache.org/solr/#Search_and_Indexing

There are some clients here to add data to the index programmatically:

http://wiki.apache.org/solr/IntegratingSolr

On Thu, Dec 10, 2009 at 3:06 PM, Matt Wilkie matt.wil...@gov.yk.ca wrote:

Hello, I'm new to Solr; I know nothing about it other than that it's been touted in a couple of places as a possible competitor to the Google Search Appliance, which is what brought me here. I'm looking for a search engine which can index files on Windows shares and websites and, hopefully, integrate with Active Directory to ensure results are not returned to users who don't have access to those files. Can Solr do this? If so, where is the documentation for it? Reconnaissance searches of the mailing list and wiki have not turned up anything so far. Thanks,

--
matt wilkie
Geomatics Analyst
Information Management and Technology
Yukon Department of Environment
10 Burns Road * Whitehorse, Yukon * Y1A 4Y9
867-667-8133 Tel * 867-393-7003 Fax
http://environmentyukon.gov.yk.ca/geomatics/
SolrQuerySyntax : Types of Range Queries in Solr 1.4
Hi guys, In Lucene 2.9 and Solr 1.4, it is possible to perform inclusive and exclusive range searches with square and curly brackets respectively. However, when I looked at the SolrQuerySyntax page, only the inclusive range search is illustrated. It seems like the examples only talk about inclusive range searches.

http://wiki.apache.org/solr/SolrQuerySyntax

Illustrative example: there is a field in the index named 'year' and it contains the following values: 2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010.

year:[2005 TO 2009] will match 2005, 2006, 2007, 2008, 2009 (inclusive, with square brackets).

year:{2005 TO 2009} will only match 2006, 2007, 2008 (exclusive, with curly brackets). The bounds are not included.

Is there any other page on the wiki where there are examples of exclusive range searches with curly brackets? If not, I would like to know so that I can add some examples to the wiki. Thanks.
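The two bracket styles behave as follows; this is a plain-Python model of the semantics, using the 'year' values from the example above:

```python
years = [2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010]

def range_query(values, lower, upper, inclusive):
    # Models [lower TO upper] when inclusive, {lower TO upper} when not.
    if inclusive:
        return [v for v in values if lower <= v <= upper]
    return [v for v in values if lower < v < upper]

print(range_query(years, 2005, 2009, True))   # [2005, 2006, 2007, 2008, 2009]
print(range_query(years, 2005, 2009, False))  # [2006, 2007, 2008]
```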
Re: SolrQuerySyntax : Types of Range Queries in Solr 1.4
On Wed, Dec 9, 2009 at 1:13 PM, Yonik Seeley yo...@lucidimagination.com wrote: Solr standard query syntax is an extension of Lucene query syntax, and we reference that on the page: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html -Yonik http://www.lucidimagination.com On Wed, Dec 9, 2009 at 1:08 PM, Israel Ekpo israele...@gmail.com wrote: Hi Guys, In Lucene 2.9 and Solr 1.4, it is possible to perform inclusive and exclusive range searches with square and curly brackets respectively. However, when I looked at the SolrQuerySyntax page, only the inclusive range search is illustrated. It seems like the examples only talk about inclusive range searches. http://wiki.apache.org/solr/SolrQuerySyntax Illustrative example: There is a field in the index named 'year' and it contains the following values: 2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010 year:[2005 TO 2009] will match 2005, 2006, 2007, 2008, 2009 [inclusive with square brackets] year:{2005 TO 2009} will only match 2006, 2007, 2008 {exclusive with curly brackets}. The bounds are not included. Is there any other page on the wiki where there are examples of exclusive range searches with curly brackets? If not, I would like to know so that I can add some examples to the wiki. Thanks. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/ Hi Yonik, I saw that. I posted the question because someone asked me how to do the exclusive search where the bounds are excluded. Initially they started with field:[lower-1 TO upper-1] and then I just told them to use curly brackets, so when I came to the Solr wiki to do a search I did not see any examples with the curly brackets. For me this was very obvious, but I think it would be nice to add a few examples with curly brackets to the SolrQuerySyntax examples because most people that are using Solr for the very first time may not have heard of or used Lucene before. 
Just a thought. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: parsing the raw query string?
Hi, If you are planning to use Solr via PHP, you can take a look at the Solr PECL extension http://www.php.net/manual/en/book.solr.php which you can download from here http://pecl.php.net/package/solr There is a SolrQuery class that allows you to build and manage the name-value pair parameters, which you can then pass on to the SolrClient object for onward transmission to the Solr server. It is also serializable, so you can cache it in the $_SESSION variable to propagate the parameters from page to page across requests. The SolrQuery class has built-in methods to add, update, remove and manage the Facets, Highlighting, MoreLikeThis, Stats, TermsComponent etc. I hope this helps. On Sun, Dec 6, 2009 at 1:25 AM, regany re...@newzealand.co.nz wrote: I've just found Solr and am looking at what's involved to work with it. All the examples I've seen only ever use 1-word search terms, which doesn't help me see how multiple-word queries work. It also looks like a hell of a lot of processing needs to be done on the raw query string even before you can pass it to Solr (in PHP) - is everyone processing the query string first and building a custom call to Solr, or is there a query string parser I've missed somewhere? I can't even find what operators (if any) are able to be used in the raw query string in the online docs (maybe there aren't any??). Any help or points in the right direction would be appreciated. -- View this message in context: http://old.nabble.com/parsing-the-raw-query-string--tp26662578p26662578.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
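The idea behind a query-builder object like SolrQuery — an ordered set of name-value pairs serialized into the request URL — can be sketched as follows. This is a language-neutral illustration in Python; the host, port, core layout and parameter values are hypothetical:

```python
from urllib.parse import urlencode

# Name-value pairs a builder like SolrQuery would manage for one request.
params = {
    "q": "solr search",   # the raw user query
    "wt": "json",         # response writer
    "rows": 10,           # page size
    "start": 0,           # offset for pagination
    "facet": "true",
    "facet.field": "category",
}

# Serializing the pairs yields the query string sent to the select handler.
query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

Because the builder is just a bag of parameters, it can be stored (e.g. serialized into the session, as described above) and replayed or tweaked on a later page without rebuilding the whole query.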
Re: Stopping Starting
On Thu, Dec 3, 2009 at 5:01 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Dec 3, 2009 at 4:57 PM, Lee Smith l...@weblee.co.uk wrote: Hello All I am just starting out today with Solr and looking for some advice, but I first have a problem. I ran the start command i.e. user:~/solr/example$ java -jar start.jar which worked perfectly, and I started to explore the interface. But my terminal window dropped and it has stopped working. If I try to restart it I'm getting errors and it's still not working, errors like: 2009-12-03 21:55:41.785::WARN: EXCEPTION java.net.BindException: Address already in use So how can I stop and restart the service? Try and find the java process and kill it? ps -elf | grep java kill pid If no other Java processes are running under user, then killall java is a quick way to do it (Linux has killall... not sure about other systems). -Yonik http://www.lucidimagination.com On Ubuntu, CentOS and some other Linux distros, you can run this command: pkill -f start.jar OR pkill -f java if there are no other java processes running -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
[PECL-DEV] [ANNOUNCEMENT] solr-0.9.8 (beta) Released
The new PECL package solr-0.9.8 (beta) has been released at http://pecl.php.net/.

Release notes
- Fixed config.w32 for Windows build support. (Pierre, Pierrick)
- Windows .dll now available at http://downloads.php.net/pierre (Pierre)
- Fixed Bug #16943 Segmentation Fault from solr_encode_string() during attempt to retrieve solrXmlNode->children->content when solrXmlNode->children is NULL (Israel)
- Disabled Expect header in libcurl (Israel)
- Disabled Memory Debugging when normal debug is enabled (Israel)
- Added list of contributors to the project (README.CONTRIBUTORS)

Package Info
It effectively simplifies the process of interacting with Apache Solr using PHP5 and it already comes with built-in readiness for the latest features available in Solr 1.4. The extension has features such as built-in, serializable query string builder objects which effectively simplify the manipulation of name-value pair request parameters across repeated requests. The response from the Solr server is also automatically parsed into native PHP objects whose properties can be accessed as array keys or object properties without any additional configuration on the client side. Its advanced HTTP client reuses the same connection across multiple requests and provides built-in support for connecting to Solr servers secured behind HTTP Authentication or HTTP proxy servers. It is also able to connect to SSL-enabled containers. Please consult the documentation for more details on features.

Related Links
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.8.tgz
Documentation: http://www.php.net/manual/en/book.solr.php

Authors
Israel Ekpo ie...@php.net (lead)

-- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Announcing the Apache Solr extension in PHP - 0.9.0
Hi Mike, Thanks to Pierre, the Windows version of the extension is available here, compiled from trunk r291135 http://downloads.php.net/pierre/ I am planning to have 0.9.8 compiled for Windows as soon as it is out sometime later this week. The 1.0 release should be out sometime before mid-December after the API is finalized and tested. You can always check the project home page for news about upcoming releases http://pecl.php.net/package/solr The documentation is available here http://www.php.net/manual/en/book.solr.php Cheers On Mon, Nov 23, 2009 at 3:28 PM, Michael Lugassy mlu...@gmail.com wrote: Thanks Israel, exactly what I was looking for, but how would one get a pre-compiled dll for Windows? using PHP 5.3 VS9 TS. On Mon, Oct 5, 2009 at 7:03 AM, Israel Ekpo israele...@gmail.com wrote: Fellow Apache Solr users, I have been working on a PHP extension for Apache Solr in C for quite some time now. I just finished testing it and I have completed the initial user-level documentation of the API. Version 0.9.0-beta has just been released. It already has built-in readiness for Solr 1.4 If you are using Solr 1.3 or later in PHP, I would appreciate it if you could check it out and give me some feedback. It is very easy to install on UNIX systems. I am still working on the build for Windows. It should be available for Windows soon. http://solr.israelekpo.com/manual/en/solr.installation.php A quick list of some of the features of the API includes: - Built-in serialization of Solr Parameter objects. - Reuse of HTTP connections across repeated requests. - Ability to obtain input documents for possible resubmission from query responses. 
- Simplified interface to access server response data (SolrObject) - Ability to connect to Solr server instances secured behind HTTP Authentication and proxy servers The following components are also supported - Facets - MoreLikeThis - TermsComponent - Stats - Highlighting Solr PECL Extension Homepage http://pecl.php.net/package/solr Some examples are available here http://solr.israelekpo.com/manual/en/solr.examples.php Interim Documentation Page until refresh of official PHP documentation http://solr.israelekpo.com/manual/en/book.solr.php The C source is available here http://svn.php.net/viewvc/pecl/solr/ -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
[ANNOUNCEMENT] solr-0.9.7 (beta) Released
The new PECL package solr-0.9.7 (beta) has been released at http://pecl.php.net/.

Release notes
- Fixed bug 16924 AC_MSG_NOTICE() is undefined in autoconf 2.13
- Added new method SolrClient::getDebug()
- Modified SolrClient::__construct() so that port numbers and other integer values for the options can be passed as strings.
- Changed internal string handling mechanism to allow for tracking of memory allocation in debug mode.
- Lowered minimum php version to 5.2.3. Unfortunately, this is the lowest PHP version that will be supported. PHP versions lower than 5.2.3 are not compatible or are causing tests to FAIL.
- Added php stubs for code-completion assists in IDEs and editors.
- Added more examples

Package Info
It effectively simplifies the process of interacting with Apache Solr using PHP5 and it already comes with built-in readiness for the latest features available in Solr 1.4. The extension has features such as built-in, serializable query string builder objects which effectively simplify the manipulation of name-value pair request parameters across repeated requests. The response from the Solr server is also automatically parsed into native PHP objects whose properties can be accessed as array keys or object properties without any additional configuration on the client side. Its advanced HTTP client reuses the same connection across multiple requests and provides built-in support for connecting to Solr servers secured behind HTTP Authentication or HTTP proxy servers. It is also able to connect to SSL-enabled containers. Please consult the documentation for more details on features.

Related Links
Package home: http://pecl.php.net/package/solr
Changelog: http://pecl.php.net/package-changelog.php?package=solr
Download: http://pecl.php.net/get/solr-0.9.7.tgz

Authors
Israel Ekpo ie...@php.net (lead)
Re: Solr 1.3 query and index perf tank during optimize
On Tue, Nov 17, 2009 at 2:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Basically, search entries are keyed to other documents. We have finite : storage, : so we purge old documents. My understanding was that deleted documents : still : take space until an optimize is done. Therefore, if I don't optimize, the : index : size on disk will grow without bound. : : Am I mistaken? If I don't ever have to optimize, it would make my life : easier. deletions are purged as segments get merged. if you want to force deleted documents to be purged, the only way to do that at the moment is to optimize (which merges all segments). but if you are continually deleting/adding documents, the deletions will eventually get purged even if you never optimize. -Hoss Chris, Since the mergeFactor controls the segment merge frequency and size, and the number of segments is limited to mergeFactor - 1, would one be correct to state that if some documents have been deleted from the index and the changes finalized with a call to commit, then as more documents are added to the index, eventually the index will be implicitly *optimized* and the deleted documents will be purged even without explicitly issuing an optimize statement? -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: PhP, Solr and Delta Imports
On Mon, Nov 16, 2009 at 2:49 PM, Pablo Ferrari pabs.ferr...@gmail.com wrote: Hello, I have an already working Solr service based on full imports, connected via PHP to a Zend Framework MVC (I connect it directly to the Controller). I use the SolrClient class for PHP which is great: http://www.php.net/manual/en/class.solrclient.php From now on, every time I want to edit a document I have to do a full import again, or I can delete the document by its id and add it again with the updated info... Can anyone guide me a bit in how to do delta imports? If it's via PHP, better! Thanks in advance, Pablo Ferrari Tinkerlabs.net Hello Pablo, You have a couple of options and you do not have to do a full data re-import for the entire index. My example below uses 'doc_id' as the uniqueKey field in your schema. It also assumes that it is an integer type. 1. You can remove the document from the index by query or by id (assuming you have its id or uniqueKey field) if you want to just take it out of the active index. $client = new SolrClient($options); $client->deleteById(400); // I recommend this one OR $client->deleteByQuery('doc_id:400'); // This should work too. 2. If all you want to do is to replace/update an existing document in the Solr index and you still want the document to remain active in the index, then you can just update it by building a SolrInputDocument object and then submitting just that document using the SolrClient. $client = new SolrClient($options); $doc = new SolrInputDocument(); $doc->addField('doc_id', 334455); $doc->addField('other_field', 'Other Field Value'); $doc->addField('another_field', 'Another Field Value'); $updateResponse = $client->addDocument($doc); If your changes are coming from the db it would be helpful to have a time stamp column that changes each time the record is modified. 
Then you can keep track of when the last index process was done, and the next time you can retrieve only 'active' documents that have been modified or created after this last re-index process. You can send the SolrInputDocuments to the Solr index using the SolrClient object as shown above for each document. Do not forget to save the changes to the index with a call to SolrClient::commit() If you are updating a lot of records, I would recommend waiting till the end to do the commit (and optimize call if needed). More examples are available here http://us2.php.net/manual/en/solr.examples.php -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
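The timestamp-based delta approach described above can be sketched like this. The table layout, field names and dates are hypothetical, and the sketch is in Python for brevity rather than PHP:

```python
from datetime import datetime

# Rows from the database, each with a 'modified' timestamp column that is
# updated whenever the record changes.
rows = [
    {"doc_id": 100, "modified": datetime(2009, 11, 10, 9, 0)},
    {"doc_id": 200, "modified": datetime(2009, 11, 17, 14, 30)},
    {"doc_id": 300, "modified": datetime(2009, 11, 18, 8, 15)},
]

# Recorded when the previous index run finished.
last_index_time = datetime(2009, 11, 15, 0, 0)

# A delta import re-sends only the rows changed since the last run,
# instead of re-importing the whole table.
to_reindex = [r["doc_id"] for r in rows if r["modified"] > last_index_time]
print(to_reindex)  # [200, 300]
```

Each selected row would then be turned into a SolrInputDocument and submitted as shown in the PHP snippet above, followed by one commit at the end.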
Re: Solr - Load Increasing.
On Mon, Nov 16, 2009 at 5:22 PM, Walter Underwood wun...@wunderwood.org wrote: Probably lakh: 100,000. So, 900k qpd and 3M docs. http://en.wikipedia.org/wiki/Lakh wunder On Nov 16, 2009, at 2:17 PM, Otis Gospodnetic wrote: Hi, Your autoCommit settings are very aggressive. I'm guessing that's what's causing the CPU load. btw. what is laks? Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: kalidoss kalidoss.muthuramalin...@sifycorp.com To: solr-user@lucene.apache.org Sent: Mon, November 16, 2009 9:11:21 AM Subject: Solr - Load Increasing. Hi All. My Solr server box CPU utilization is increasing to between 60 and 90%, and sometimes Solr goes down and we restart it manually. No of documents in Solr: 30 laks. No of add/update requests to Solr: 30 thousand / day. Avg of every 30 minutes around 500 writes. No of search requests: 9 laks / day. Size of the data directory: 4gb. My system RAM is 8gb. System available space 12gb. Processor Family: Pentium Pro Our Solr data size can increase to around 90 laks, and writes per day will be around 1 laks. - Hope it's possible with Solr. For write commit i have configured like 1 10 Is all of the above possible? 90 laks of data, 1 laks writes per day and 30 laks reads per day?? - if yes, what type of system configuration would be required? Please suggest. thanks, Kalidoss.m, Thanks Walter for clarifying that. I too was wondering what laks meant. It was a bit distracting when I read the original post. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Newbie tips: migrating from mysql fulltext search / PHP integration
On Mon, Nov 16, 2009 at 12:34 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: WOW, +1!! Great job, PHP! Cheers, Chris On 11/15/09 10:13 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, I'm not sure if you have a specific question there. But regarding the PHP integration part, I just learned PHP now has native Solr (1.3 and 1.4) support: http://twitter.com/otisg/status/5757184282 Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: mbneto mbn...@gmail.com To: solr-user@lucene.apache.org Sent: Sun, November 15, 2009 4:56:15 PM Subject: Newbie tips: migrating from mysql fulltext search / PHP integration Hi, I am looking for alternatives to MySQL fulltext searches. The combo Lucene/Solr is one of my options and I'd like to gather as much information as I can before choosing, and even build a prototype. My current needs do not seem to be unusual: - fast response time (currently some searches can take more than 11 sec) - API to add/update/delete documents in the collection - a way to add synonyms or similar words for misspelled ones (e.g. Sony = Soni) - a way to define relevance of results (e.g. if I search for LCD, return products that belong to the LCD category, contain LCD in the product definition or are marked as special offer) I know that I may have to add external code, for example, to take the results and apply some business logic to re-sort the results, but I'd like to know, besides the wiki and the Solr 1.4 Enterprise Search Server book (which I am considering buying), the tips for Solr usage. ++ Chris Mattmann, Ph.D. 
Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ Hi, There is native support for Solr in PHP but currently you have to build it as a PECL extension. It is not bundled with the PHP source yet, but it is downloadable from the PECL project homepage http://pecl.php.net/package/solr If you currently have pecl support built into your php installation, you can install it by running the following command pecl install solr-beta Some usage examples are available here http://us3.php.net/manual/en/solr.examples.php More details are available here http://www.php.net/manual/en/book.solr.php I use Solr with PHP 5.2 - In PHP, the SolrClient class has methods to add, update, delete and rollback changes to the index made since the last commit. - There are also built-in tools in Solr that allow you to analyze and modify the data before indexing it and when searching for it. - With Solr you can define synonyms (check the wiki for more details) - Solr also allows you to sort by score (relevance) - You can specify the fields that you want either as (optional, required or prohibited) My last two points could take care of your last requirement. Solr is awesome and most of the searches I perform return sub-second response times. It's several hundredfold easier and more efficient than MySQL fulltext, believe me. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Are subqueries possible in Solr? If so, are they performant?
On Thu, Nov 12, 2009 at 3:39 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am getting results from one query and I just need 2 index attribute values : . These index attribute values are used to form a new query to Solr. can you elaborate on what exactly you mean by These index attribute values are used to form a new query to Solr ... are you saying that you want to take the values from *every* document matching query#1 and use them to construct query#2 this sounds like you aren't denormalizing your data enough when building your index. : Since Solr gives result only for GET request, hence there is restriction on : : forming query with all values. that's false ... you can post a query if you want, and there are no hard constraints on how big a query can be (just practical constraints on what your physical hardware can handle in a reasonable amount of time) : SELECT id, first_name : FROM student_details : WHERE first_name IN (SELECT first_name : FROM student_details : WHERE subject= 'Science'); : : If so, how performant are these kinds of queries? even as a sql query this doesn't really make much sense to me (at least not w/o a better understanding of the table+data) why wouldn't you just say: SELECT id, first_name FROM ... WHERE subject='Science' ..or in Solr... q=subject:Science&fl=id,first_name -Hoss It's also important to note that the Solr schema contains only one table, so to speak; whereas in a traditional database schema you can have more than one table in the same schema, where you can do JOINs and sub queries across multiple tables to retrieve the target data. If you are bringing data from multiple database tables into the Solr index, they have to be denormalized to fit into just one table in Solr. So you will have to use a BOOLEAN AND or a filter query to simulate the sub query you are trying to make. I hope this clears things up a bit. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. 
Measure Twice. Cut Once.
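The denormalization point made in the reply above can be made concrete with a small sketch. The table and field names follow the student example in the thread; the data itself is invented, and the sketch is in Python rather than SQL:

```python
# Two relational tables, as in the original question.
students = [
    {"id": 1, "first_name": "Ada"},
    {"id": 2, "first_name": "Alan"},
]
subjects = {1: "Science", 2: "History"}  # student id -> subject

# Denormalized: one flat Solr-style document per student, subject folded in.
docs = [dict(s, subject=subjects[s["id"]]) for s in students]

# The SQL subquery then collapses into one filter over the flat documents,
# equivalent to the single Solr request q=subject:Science with fl=id,first_name.
science = [(d["id"], d["first_name"]) for d in docs if d["subject"] == "Science"]
print(science)  # [(1, 'Ada')]
```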
Re: Commit error
2009/11/11 Licinio Fernández Maurelo licinio.fernan...@gmail.com Hi folks, I'm getting this error while committing after a dataimport of only 12 docs !!! Exception while solr commit. java.io.IOException: background merge hit exception: _3kta:C2329239 _3ktb:c11-_3ktb into _3ktc [optimize] [mergeDocStores] at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2829) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2750) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:401) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:138) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:66) at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:170) at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:208) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:185) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:393) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) Caused by: java.io.IOException: No hay espacio libre en el dispositivo [No space left on device] at java.io.RandomAccessFile.writeBytes(Native Method) at java.io.RandomAccessFile.write(RandomAccessFile.java:499) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:191) at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96) at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85) at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:75) at org.apache.lucene.store.IndexOutput.writeBytes(IndexOutput.java:45) at 
org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:229) at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:184) at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:217) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5089) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4589) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291) Index info: 2.600.000 docs | 11G size System info: 15GB free disk space When attempting to commit, the disk usage increases until Solr breaks ... it looks like 15 GB is not enough space to do the merge | optimize Any advice? -- Lici Hi Licinio, During the optimization process, the index size will be approximately double what it was originally, and the remaining space on disk may not be enough for the task. What you are describing is exactly what could be going on. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: adding and updating a lot of document to Solr, metadata extraction etc
On Tue, Nov 10, 2009 at 8:26 AM, Eugene Dzhurinsky b...@redwerk.com wrote: On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote: The DIH has improved a great deal from Solr 1.3 to 1.4. You will be much better off using the DIH from this. This is the current Solr release candidate binary: http://people.apache.org/~gsingers/solr/1.4.0/ In fact we are prohibited from using release candidates/nightly builds; we are forced to use only releases of Solr :( -- Eugene N Dzhurinsky Well, the official release is out and you can pick it up from your closest mirror here http://www.apache.org/dyn/closer.cgi/lucene/solr/ -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Representing a complex schema in solr
On Sat, Nov 7, 2009 at 11:37 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I have a complex schema as shown below: Book - Title - Category - Publication - Edition - Publish Date - Author (multivalued) = Author is a multivalued field containing the following attributes: - Name - Age - Location - Gender - Qualification I want to store the above information in Solr so that I can query on every aspect. Small query examples would be: 1. search for all the books written by females. 2. search for all books written by young authors, for example between the ages of 22 and 30. I wouldn't want to use an RDBMS because I have more than one million documents like this. I also tried saving the author as a JSON string, but then I cannot use wildcard and range queries on it. Any suggestions on how I would represent something like this in Solr?? Regards, Raakhi Hi Rakhi, I think you should do this to simplify your storage and retrieval process: Instead of having one multi-valued author field, store each attribute as a separate multi-valued field. So name, age, location, gender and qualification will be separate fields in the schema. This will allow you to query the way you are asking: q=gender:female or, by age, q=age:[22 TO 30] Use tint (solr.TrieIntField) for the age field (if you are using Solr 1.4) -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
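The flat schema suggested in the reply — each author attribute as its own multi-valued field — can be sketched like this. The field names and data are hypothetical, and the matching logic is written in Python only to illustrate the query semantics:

```python
# One document per book; author attributes stored as parallel multi-valued fields.
book = {
    "title": "Example Book",
    "author_name": ["Jane", "John"],
    "author_age": [25, 41],
    "author_gender": ["female", "male"],
}

# q=gender:female -> matches any book with at least one female author.
matches_female = "female" in book["author_gender"]

# q=age:[22 TO 30] -> matches any book with an author age in the range.
matches_young = any(22 <= age <= 30 for age in book["author_age"])
print(matches_female, matches_young)  # True True
```

One caveat worth noting: parallel multi-valued fields do not correlate positions, so a combined query like gender:female AND age:[22 TO 30] can match a book whose female author is outside the age range while a different author falls inside it.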
Re: how to use ajax-solr - example?
On Wed, Nov 4, 2009 at 10:48 AM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I looked at the documentation and I have no idea how to get started. Can someone point me to or show me an example of how to send a query to a Solr server and paginate through the results using ajax-solr? I would gladly write a blog tutorial on how to do this if someone can get me started. I don't know jQuery but have used Prototype/Scriptaculous. thanks Joel Joel, It will be best if you use a scripting language between Solr and JavaScript. This is because sending data directly between JavaScript and Solr will limit you to only one domain name. However, if you are using a scripting language between JavaScript and Solr, you can use the scripting language to retrieve the request parameters from JavaScript and then send them to Solr with the response writer set to json. This will cause Solr to send the response in JSON format, which the scripting language can pass on to JavaScript. This example here will cause Solr to return the response in JSON. http://example.com:8443/solr/select?q=searchkeyword&wt=json -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
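Once the response writer is set to json as in the URL above, the intermediate scripting layer only needs to parse the body and hand the documents to the page. A sketch with a trimmed, hypothetical response body (real responses carry more fields):

```python
import json

# A trimmed, hypothetical wt=json response from Solr's /select handler.
raw = """{
  "responseHeader": {"status": 0, "QTime": 4},
  "response": {
    "numFound": 2, "start": 0,
    "docs": [
      {"id": "1", "title": "First match"},
      {"id": "2", "title": "Second match"}
    ]
  }
}"""

data = json.loads(raw)
titles = [doc["title"] for doc in data["response"]["docs"]]
print(data["response"]["numFound"], titles)
```

numFound together with start and rows is what a pagination widget needs to render page links, while docs holds the current page of results.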
Re: How to integrate Solr into my project
2009/11/3 Licinio Fernández Maurelo licinio.fernan...@gmail.com Hi Caroline, I think that you should take an overview tour ;-) , solrj is just a Solr java client ... Some clues: - Define your own index schema http://wiki.apache.org/solr/SchemaXml (it's just like a SQL DDL). - There are different ways to put docs in your index: - SolrJ (Solr client for java env) - DIH http://wiki.apache.org/solr/DataImportHandler (Data Import Handler) this one is preferred when doing a huge data import from DBs; many source formats are supported. - Try to perform queries over your fancy new index ;-). Learn about searching syntax and faceting http://wiki.apache.org/solr/SolrFacetingOverview . 2009/11/3 Caroline Tan caroline@gmail.com Ya, it's a Java project. I just browsed the site you suggested... http://wiki.apache.org/solr/Solrj Which means, if I declare the dependency on the solr-solrj and solr-core jars, have those jars added to my project lib and follow the Solrj tutorial, I will be able to even index a DB table into Solr as well? thanks ~caroLine 2009/11/3 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com is it a java project ? did you see this page http://wiki.apache.org/solr/Solrj ? On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan caroline@gmail.com wrote: Hi, I wish to integrate Solr into my current working project. I've played around with the Solr example and got it started in my Tomcat. But the next step is HOW do I integrate that into my working project? You see, Lucene provides an API and a tutorial on what classes I need to instantiate in order to index and search. But Solr seems to be pretty vague on this... as it is a working Solr search server. Can anybody help me by stating, step by step, what classes I should look into in order to assimilate Solr into my project? Thanks. regards ~caroLine -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- Lici I would also recommend buying the Solr 1.4 Enterprise Search Server. 
It will give you some tips http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1847195881/ref=sr_1_1?ie=UTF8&s=books&qid=1257247932&sr=1-1 -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: tracking solr response time
On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using Solr for many of our products and it is doing quite well. But since the number of hits is becoming high we are experiencing latency in certain requests; about 15% of our requests are suffering a latency. How much of a latency compared to normal, and what version of Solr are you using? We are trying to identify the problem. It may be a network issue, or the Solr server may be taking time to process the request. Other than QTime, which is returned along with the response, is there any other way to track the Solr server's performance? How is QTime calculated; is it the total time from when the Solr server got the request till it gave the response? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. Can we do some extra logging to track the Solr server's performance? Ideally I would want to pass some log id along with the request (query) to the Solr server and have the Solr server log the response time along with that log id. Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com If you are not using Java then you may have to track the elapsed time manually. 
If you are using the SolrJ Java client you may have the following options: There is a method called getElapsedTime() in org.apache.solr.client.solrj.response.SolrResponseBase which is available to all its subclasses. I have not used it personally but I think this should return the time spent on the client side for that request. The QTime is not the time on the client side but the time spent internally at the Solr server to process the request. http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/SolrResponseBase.html http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/QueryResponse.html Most likely it could be the result of an internal network issue between the two servers, or the Solr server is competing with other applications for resources. What operating system is the Solr server running on? Is your client application connecting to a Solr server on the same network or over the internet? Are there other applications like database servers running on the same machine? If so, then the DB server (or any other application) and the Solr server could be competing for resources like CPU, memory etc. If you are using Tomcat, you can take a look in $CATALINA_HOME/logs/catalina.out; there are timestamps there that can also guide you. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
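As a rough illustration of the QTime distinction (the response body below is a canned example, not real server output): subtracting QTime from the total client-side elapsed time approximates how much of the latency happened outside Solr, i.e. in the network and in serialization.

```python
import json

def network_overhead_ms(elapsed_ms, solr_response_body):
    """Compare total client-side elapsed time against Solr's internal QTime.

    QTime (reported in the responseHeader) only covers the time Solr
    spent building the response, so the difference approximates the
    time spent outside Solr for this request.
    """
    qtime = json.loads(solr_response_body)["responseHeader"]["QTime"]
    return elapsed_ms - qtime

# Canned wt=json response header; a real client would time the HTTP call.
body = '{"responseHeader": {"status": 0, "QTime": 12}}'
overhead = network_overhead_ms(250, body)  # 238 ms spent outside Solr
```

If this gap stays large while QTime stays small, the latency is likely in the network or the container rather than in query processing.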
Re: tracking solr response time
On Mon, Nov 2, 2009 at 9:52 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: Thanks for the quick response. @yonik How much of a latency compared to normal, and what version of Solr are you using? Latency is usually around 2-4 secs (sometimes it goes over that), which happens to only 15-20% of the requests; the other 80-85% of requests are very fast, in milliseconds (around 200,000 requests happen every day). @Israel we are not using the Java client; we are using Python at the client with the response formatted in JSON. @yonik @Israel does QTime measure the total time taken at the Solr server? I am already measuring the time to get the response at the client end. I would want a means to know how much time the Solr server is taking to respond (process) once it gets the request, so that I could identify whether it is a Solr server issue or an internal network issue. It is the time spent at the Solr server. I think Yonik already answered this part in his response to your thread. This is what he said: QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). @Israel we are using RHEL Server 5 on both client and server. We have six Solr servers; one is acting as master. Both client and Solr servers are on the same network. They are dedicated Solr servers except two servers which have the DB and memcache running; we have adjusted the load accordingly. On 11/2/09, Israel Ekpo israele...@gmail.com wrote: On Mon, Nov 2, 2009 at 8:41 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, Nov 2, 2009 at 8:13 AM, bharath venkatesh bharathv6.proj...@gmail.com wrote: We are using Solr for many of our products and it is doing quite well. 
But since the number of hits is becoming high we are experiencing latency in certain requests; about 15% of our requests are suffering a latency. How much of a latency compared to normal, and what version of Solr are you using? We are trying to identify the problem. It may be a network issue, or the Solr server may be taking time to process the request. Other than QTime, which is returned along with the response, is there any other way to track the Solr server's performance? How is QTime calculated; is it the total time from when the Solr server got the request till it gave the response? QTime is the time spent in generating the in-memory representation for the response before the response writer starts streaming it back in whatever format was requested. The stored fields of returned documents are also loaded at this point (to enable handling of huge response lists w/o storing all in memory). There are normally servlet container logs that can be configured to spit out the real total request time. Can we do some extra logging to track the Solr server's performance? Ideally I would want to pass some log id along with the request (query) to the Solr server and have the Solr server log the response time along with that log id. Yep - Solr isn't bothered by params it doesn't know about, so just put logid=xxx and it should also be logged with the other request params. -Yonik http://www.lucidimagination.com If you are not using Java then you may have to track the elapsed time manually. If you are using the SolrJ Java client you may have the following options: There is a method called getElapsedTime() in org.apache.solr.client.solrj.response.SolrResponseBase which is available to all its subclasses. I have not used it personally but I think this should return the time spent on the client side for that request. The QTime is not the time on the client side but the time spent internally at the Solr server to process the request. 
http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/SolrResponseBase.html http://lucene.apache.org/solr//api/solrj/org/apache/solr/client/solrj/response/QueryResponse.html Most likely it could be the result of an internal network issue between the two servers, or the Solr server is competing with other applications for resources. What operating system is the Solr server running on? Is your client application connecting to a Solr server on the same network or over the internet? Are there other applications like database servers running on the same machine? If so, then the DB server (or any other application) and the Solr server could be competing for resources like CPU, memory etc. If you are using Tomcat, you can take a look in $CATALINA_HOME/logs/catalina.out; there are timestamps there that can also guide you. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: adding and updating a lot of document to Solr, metadata extraction etc
On Fri, Oct 30, 2009 at 11:23 AM, Eugene Dzhurinsky b...@redwerk.com wrote: Hi there! We are trying to evaluate Apache Solr for our custom search implementation, which includes the following requirements: - ability to add/update/delete a lot of documents at once - ability to iterate over all documents returned in a search, as Lucene provides within a HitCollector instance. We would need to extract and aggregate various fields stored in the index, to group results and aggregate them in some way. After reading the tutorial I've realized that adding and removal of documents is performed by passing an XML file to the controller in a POST request. However our XML files may be very, very large, so I hope there is another option that avoids interaction through the HTTP protocol. Also I did not find any way in the tutorial to access the search results with all fields to be processed by our application. I think I simply did not read the documentation well or missed some point, so can somebody please point me to the articles which may explain the basics of how to achieve my goals? Thank you very much in advance! -- Eugene N Dzhurinsky Hi Eugene, Solr has an embedded version but you are encouraged to use the standard web service interfaces. Also, the recently released Solr 1.4 white paper talks about the StreamingUpdateSolrServer, which according to the white paper can index documents at a lightning speed of up to 25K documents per second. 
The white paper can be downloaded here http://www.lucidimagination.com/whitepaper/whats-new-in-solr-1-4 Info about the StreamingUpdateSolrServer is available here http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html If you are still interested in the embedded version to avoid HTTP you can check out the following links http://wiki.apache.org/solr/EmbeddedSolr http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html I hope this helps. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
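To illustrate why batching helps, independent of any particular client, here is a sketch in Python that renders several documents as one Solr add message, so a whole batch travels in a single HTTP POST instead of one request per document. The field names are made up for the example.

```python
from xml.sax.saxutils import escape

def docs_to_add_xml(docs):
    """Render a batch of documents as a single Solr <add> update message.

    Each dict becomes one <doc>; posting many docs per request is the
    core idea behind the streaming/batching update clients.
    """
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(f'<field name="{escape(name)}">{escape(str(value))}</field>')
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

xml = docs_to_add_xml([{"id": 1, "title": "First"}, {"id": 2, "title": "Second"}])
```

A client would POST this body to /solr/update once per batch, then commit, rather than paying HTTP round-trip overhead per document.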
Re: Indexing multiple entities
On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola penyask...@gmail.com wrote: Hi, my name is Christian and I'm a newbie getting started with Solr (and SolrJ). I'm working on a website where I want to index multiple entities, like Book or Magazine. The issue I'm facing is that both of them have an attribute ID, which I want to use as the uniqueKey in my schema, so I cannot identify a document uniquely (because the ID is saved in a database too, and it's auto-incremented). I'm sure this is a common pattern, but I can't find a way to solve it. How do you usually solve this? Thanks in advance. -- Cheers, Christian López Espínola penyaskito Hi Christian, It looks like you are bringing data into Solr from a database with two separate tables, one for Books and another for Magazines. If this is the case, you could define the field behind your uniqueKey element in the Solr schema as a string instead of an integer; then you can still load documents from both the books and magazines tables, but prefix the uniqueKey field with B for books and M for magazines, like so:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

Then when loading the books or magazines into Solr you can create the documents with id fields like this:

<add>
  <doc><field name="id">B14000</field></doc>
  <doc><field name="id">M14000</field></doc>
  <doc><field name="id">B14001</field></doc>
  <doc><field name="id">M14001</field></doc>
</add>

I hope this helps -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
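The prefixing convention can be sketched as a tiny helper on the indexing side (the 'B'/'M' prefixes follow the book/magazine example in the thread; the function name is just illustrative):

```python
def composite_id(entity_type, db_id):
    """Build a collection-wide unique key by prefixing the database id.

    Any scheme works as long as prefixes from different entity types
    cannot collide, which is why the uniqueKey field must be a string.
    """
    prefixes = {"book": "B", "magazine": "M"}
    return f"{prefixes[entity_type]}{db_id}"

composite_id("book", 14000)      # 'B14000'
composite_id("magazine", 14000)  # 'M14000'
```

The same helper can be reused at query time to reconstruct the original table and row from a Solr hit.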
Version 0.9.3 of the PECL extension for solr has just been released
Version 0.9.3 of the PECL extension for solr has just been released. Some of the methods have been updated and more get* methods have been added to the Query builder classes. The user level documentation was also updated to make the installation instructions a lot clearer. The latest documentation and source code are available from the project home page http://pecl.php.net/package/solr -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Solr 1.4 Release Party
It is my email signature. It is a sort of hybrid/mashup from different sources. On Mon, Oct 12, 2009 at 6:49 PM, Michael Masters mmast...@gmail.com wrote: Where does the quote come from :) On Sat, Oct 10, 2009 at 6:38 AM, Israel Ekpo israele...@gmail.com wrote: I can't wait... -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Solr 1.4 Release Party
I can't wait... -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Announcing the Apache Solr extension in PHP - 0.9.0
Fellow Apache Solr users, I have been working on a PHP extension for Apache Solr in C for quite some time now. I just finished testing it and I have completed the initial user-level documentation of the API. Version 0.9.0-beta has just been released. It already has built-in readiness for Solr 1.4. If you are using Solr 1.3 or later with PHP, I would appreciate it if you could check it out and give me some feedback. It is very easy to install on UNIX systems. I am still working on the build for Windows; it should be available for Windows soon. http://solr.israelekpo.com/manual/en/solr.installation.php A quick list of some of the features of the API: - Built-in serialization of Solr Parameter objects. - Reuse of HTTP connections across repeated requests. - Ability to obtain input documents for possible resubmission from query responses. - Simplified interface to access server response data (SolrObject) - Ability to connect to Solr server instances secured behind HTTP Authentication and proxy servers The following components are also supported - Facets - MoreLikeThis - TermsComponent - Stats - Highlighting Solr PECL Extension Homepage http://pecl.php.net/package/solr Some examples are available here http://solr.israelekpo.com/manual/en/solr.examples.php Interim Documentation Page until refresh of official PHP documentation http://solr.israelekpo.com/manual/en/book.solr.php The C source is available here http://svn.php.net/viewvc/pecl/solr/ -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Quotes in query string cause NullPointerException
Don't be too hard on yourself. Sometimes, mistakes like that can happen even to the most brilliant and most experienced. On Thu, Oct 1, 2009 at 2:15 PM, Andrew Clegg andrew.cl...@gmail.com wrote: Sorry! I'm officially a complete idiot. Personally I'd try to catch things like that and rethrow a 'QueryParseException' or something -- but don't feel under any obligation to listen to me because, well, I'm an idiot. Thanks :-) Andrew. Erik Hatcher-4 wrote: don't forget q=... :) Erik On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote: Hi folks, I'm using the 2009-09-30 build, and any single or double quotes in the query string cause an NPE. Is this normal behaviour? I never tried it with my previous installation. Example: http://myserver:8080/solr/select/?title:%22Creatine+kinase%22 (I've also tried without the URL encoding, no difference) Response: HTTP Status 500 - null

java.lang.NullPointerException
    at java.io.StringReader.<init>(StringReader.java:33)
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173)
    at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
    at org.apache.solr.search.QParser.getQuery(QParser.java:131)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:269)
    at org.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:81)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)

Single quotes have the same effect. Is there another way to specify exact phrases? Thanks, Andrew. -- View this message in context: http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25702207.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25704050.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
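As Erik points out, the root cause is a request that never included the q parameter, so the query parser received a null query string. A small client-side guard, sketched here with Python's standard URL parsing, can catch this before the request is ever sent:

```python
from urllib.parse import urlparse, parse_qs

def has_query_param(url):
    """Check that a Solr select URL actually carries a q parameter.

    The failing URL in the thread put title:"Creatine kinase" straight
    into the query string without q=, which is what triggered the 500.
    """
    return "q" in parse_qs(urlparse(url).query)

has_query_param('http://myserver:8080/solr/select/?title:%22Creatine+kinase%22')    # False
has_query_param('http://myserver:8080/solr/select/?q=title:%22Creatine+kinase%22')  # True
```

The same check works for any required parameter; parse_qs silently drops fragments without an equals sign, which is exactly the malformed case here.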
XSD for Solr Response Format Version 2.2
I am working on an XSD document for all the types in version 2.2 of the XML response format. Do you think there is a need for this? -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: Limit number of docs that can be indexed (security)
Valdir, I think you are making it more complicated than it needs to be. As the administrator, if you don't want users to modify the contents of the solrconfig.xml file then you should not give them access to do so. If they already have access to change the contents of the file, you can revoke those privileges. That should do it. The users should only work on the client side (adding documents, sending queries). On Mon, Sep 21, 2009 at 6:14 PM, Valdir Salgueiro sombraex...@gmail.com wrote: Hello, I need a way to limit the number of documents that can be indexed in my Solr-based application. Here is what I have come up with: create an UpdateRequestProcessor and register it in solrconfig.xml. When the user tries to add a document, check if the docs limit has been reached. The problem is, the user can modify solrconfig.xml and remove the UpdateRequestProcessor so he can index as much as he wants. Any ideas how to implement such a restriction in a safer manner? Thanks in advance, Valdir PS: Of course, I also need to make sure the user cannot modify how many files he can index, but I think some encryption on the properties file which holds that information will do for now. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
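The quota check itself is simple; the hard part, as the thread notes, is keeping the limit somewhere the user cannot edit. A sketch of the logic such an update hook would run (the function name and quota numbers are illustrative, not from any Solr API):

```python
def should_accept_batch(current_doc_count, batch_size, max_docs):
    """Quota check an update hook would run before indexing a batch.

    Returns True only if indexing the batch keeps the index at or
    below the quota. For this to be meaningful, max_docs must come
    from server-controlled configuration, not a user-editable file.
    """
    return current_doc_count + batch_size <= max_docs

should_accept_batch(990, 10, 1000)  # True: exactly reaches the quota
should_accept_batch(995, 10, 1000)  # False: would exceed it
```

Enforcing this server-side (with file permissions protecting the configuration, as suggested above) is what makes the limit tamper-resistant.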
Re: When to use Solr over Lucene
Comparing Solr to Lucene is not exactly an apples-to-apples comparison. Solr is a superset of Lucene. It uses the Lucene engine to index documents and process requests for data retrieval. Start here first: http://lucene.apache.org/solr/features.html#Solr+Uses+the+Lucene+Search+Library+and+Extends+it It would be unfair to compare the Apache web server to a CGI scripting interface; the Apache web server is just the container through which the web browser interacts with the CGI scripts. This is very similar to how Solr is related to Lucene. On Wed, Sep 16, 2009 at 9:26 AM, balaji.a reachbalaj...@gmail.com wrote: Hi All, I am aware that Solr internally uses Lucene for search and indexing. But it would be helpful if anybody could explain the Solr features that are not provided by Lucene. Thanks, Balaji. -- View this message in context: http://www.nabble.com/When-to-use-Solr-over-Lucene-tp25472354p25472354.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.