Still having indexing problems
Hello, I have tried indexing the example files using the Jetty method rather than Tomcat, which still didn't work. I would prefer to use my Tomcat URL. After starting Jetty, I issued java -jar post.jar http://localhost:8983/solr/update solr.xml monitor.xml as in the examples in the tutorial, but post.jar cannot be found... Where is it? Is there a path variable I need to set up somewhere? Any help greatly appreciated. Regards, Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946
RE: Solr concurrent commit not updated
I have kept the id field unique. Actually, I found that the problem is due to the following Python code: P = subprocess.Popen(arguments, ) It seems that when the program ends, the sub-process started by that call has not finished yet. I guess that's why the statistics page shows the commits but not the added documents. Has anyone had a similar issue? -Original Message- From: James liu [mailto:[EMAIL PROTECTED] Sent: Friday, May 11, 2007 11:32 AM To: solr-user@lucene.apache.org Subject: Re: Solr concurrent commit not updated You should know id is a unique number. 2007/5/11, David Xiao [EMAIL PROTECTED]: Hello all, I have tested using post.sh in the example directory to add XML documents to Solr. It works when I add them one by one. But when I have a lot of .xml files to post (say about 500-1000 files), I use a shell script that calls post.sh once per file, and I found that those XML files are not searchable after posting. The Solr admin page / statistics shows that the commits were recorded, but numDocs is not updated. So why does post.sh work fine when I post one XML file, but behave differently when I call it 500 times with one XML file each time? Regards, David -- regards jl
Crawler for solr
Hello, I am using a crawler to index and search some intranet web pages that require authorization. I wrote my own crawler for this need. But as the requirements evolve, I need another crawler for external web pages (on the internet) too, so I am looking for a generic crawler that can integrate with Solr. The crawler should be easy to configure and able to customize its XML output according to schema.xml. Does anyone have a good idea? Regards, David
Re: Requests per second/minute monitor?
On 5/10/07, Ian Holsman [EMAIL PROTECTED] wrote:
> What I would like to know is (and excuse the newbieness of the question) how to enable solr to log a file with the following data.
> - time spent (ms) in the request.
currently logged
> - IP# of the incoming request
normally in the container access log?
> - what the request was (and what handler executed it)
currently logged
> - a status code to signal if the request failed for some reason
currently logged
> - number of rows fetched
The number of documents that matched? That's a higher level concept rather specific to a request handler. That info is returned in most responses though.
> and - the number of rows actually returned
That's also in the response, but would be largely meaningless in general. One could also determine this number from the input parameters and the number of docs that matched. A better number might be size of the response (which is normally in the container access log). Fields could be very small, or very large, and faceting, highlighting, or other data could dwarf the size/speed due to the main response documents.
-Yonik
Re: Crawler for solr
On May 11, 2007, at 7:32 AM, David Xiao wrote: Hello, I am using a crawler to index and search some intranet web pages that require authorization. I wrote my own crawler for this need. But as the requirements evolve, I need another crawler for external web pages (on the internet) too, so I am looking for a generic crawler that can integrate with Solr. The crawler should be easy to configure and able to customize its XML output according to schema.xml. Nutch with the SolrIndexer and the solrj client is wonderful for this.
Re: Index Concurrency
On 5/10/07, joestelmach [EMAIL PROTECTED] wrote: Yes, coordination between the main index searcher, the index writer, and the index reader needed to delete other documents. Can you point me to any documentation/code that describes this implementation? Look at SolrCore.getSearcher() and DirectUpdateHandler2. -Yonik
Re: Still having indexing problems
On 5/11/07, Gary Browne [EMAIL PROTECTED] wrote: Hello, I have tried indexing the example files using the Jetty method rather than Tomcat, which still didn't work. I would prefer to use my Tomcat URL. After starting Jetty, I issued java -jar post.jar http://localhost:8983/solr/update solr.xml monitor.xml as in the examples in the tutorial, but post.jar cannot be found... Try using the latest nightly build? If you are using 1.1, just use post.sh -Yonik
Re: New user - indexing problems
Hey Gary, leave out the URL; just use ./post.sh *.xml You're causing curl to attempt to make a GET request. P Gary Browne wrote: Hi, I'll probably be posting a bunch of stupid questions in the near future, so bear with me. I'm finding the documentation a little confusing. For starters, I've got Solr up and running under Tomcat on port 8080, and I can pull up the admin page, no problems. I'm running on RHEL AS 4, with curl installed. I'm not sure how to get indexing started - I tried the following: ./post.sh http://localhost:8080/solr/update solr.xml monitor.xml (from the exampledocs directory) and received this error message: The specified HTTP method is not allowed for the requested resource (HTTP method GET is not supported by this URL). Any help with this would be much appreciated. Regards Gary Gary Browne Development Programmer Library IT Services University of Sydney Australia ph: 61-2-9351 5946 -- Patrick O'Leary AOL Syndication Technologies Phone: + 1 703 265 8763 Honesty is the best policy, but insanity is a better defense !
delete for multiple documents at once
Hi, I'm trying to delete multiple documents at once, but it doesn't work. I am sending this:
<?xml version="1.0" encoding="UTF-8"?>
<delete>
<id>1_3223_po_opc_2</id>
<id>1_2454_po_opc_4</id>
</delete>
and I get this back:
<result status="0"></result><result status="1">org.xmlpull.v1.XmlPullParserException: expected START_TAG or END_TAG not TEXT (position: TEXT seen ...po_opc_2&lt;/id&gt;\n&lt;id&gt;1_2454_po_opc_4&lt;/... @4:50)
at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1083)
at org.apache.solr.core.SolrCore.update(SolrCore.java:832)
at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:498)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:185)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:715)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:401)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:458)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:790)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:628)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:209)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:358)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:217)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
</result>
Isn't it possible to do deletes like that? Thanks, Max -- Maximilian Hütter blue elephant systems GmbH Wollgrasweg 49 D-70599 Stuttgart Tel: (+49) 0711 - 45 10 17 578 Fax: (+49) 0711 - 45 10 17 573 e-mail : [EMAIL PROTECTED] Sitz : Stuttgart, Amtsgericht Stuttgart, HRB 24106 Geschäftsführer: Joachim Hörnle, Thomas Gentsch, Holger Dietrich
Re: Alphabetical Facets
I don't have any pointers, but I would love to have this feature. - Original Message From: Ryan McKinley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Friday, May 11, 2007 9:23:02 AM Subject: Alphabetical Facets Has anyone given any thought to alphabetical faceting? I'd like to be able to display facets sorted alphabetically rather than by count or index order. For example, all the subjects for something of type=a and in collection=b, sorted alphabetically. Any pointers before I delve into it? ryan
Re: Alphabetical Facets
: Has anyone given any thought to alphabetical faceting? If by alphabetical you mean the natural unicode ordering of terms for facet.field type facets -- that's already supported. It's the default sort if there is no facet limit (ie: facet.limit=-1) but even with a limit it can be explicitly turned on with facet.sort=false http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1 http://localhost:8983/solr/select/?q=*%3A*&facet=true&facet.field=cat&rows=0&facet.limit=5&facet.sort=false -Hoss
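A rough Python sketch of building the request Hoss describes, for anyone assembling it in code rather than by hand. The parameter names and values are the ones from his URL; the `facet_url` helper and the `SOLR_SELECT` constant are made up for illustration, and the base URL assumes the example Jetty instance on port 8983.

```python
from urllib.parse import urlencode

# Assumption: the example Jetty instance from the tutorial.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def facet_url(field, limit=5, base=SOLR_SELECT):
    """Build a select URL that returns only facet counts for `field`,
    sorted in natural index (term) order instead of by count."""
    params = [
        ("q", "*:*"),              # match every document
        ("facet", "true"),
        ("facet.field", field),
        ("rows", "0"),             # no documents, facet counts only
        ("facet.limit", str(limit)),
        ("facet.sort", "false"),   # false = index order, per the message above
    ]
    # urlencode percent-encodes the values and joins the pairs with "&".
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(facet_url("cat"))
```

Letting `urlencode` do the escaping avoids hand-encoding `*:*` and keeps the `&` separators from getting lost when the URL is pasted around.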
Re: Still having indexing problems
: java -jar post.jar http://localhost:8983/solr/update solr.xml : monitor.xml : as in the examples on the tutorial, but post.jar cannot be found... The tutorial on the website is the most current tutorial for the most current development builds ... please refer to the tutorial included with the release of Solr you are using for the most accurate information. -Hoss
Re: can i modifie date format
James, there is already an active thread discussing the various issues with Solr's date format, with a lot of detail about the places where formatting might differ and the issues involved in allowing more configuration. You may want to catch up with that thread and reply there... http://www.nabble.com/dates---times-tf3722932.html
The short answer is: at the moment, no, there is no mechanism for customizing the Solr format, but the format is a very universal one, and I would be extremely surprised if it were not possible to get MySQL to format dates in that way.
: The MS SQL database has one date format
: Solr has one date format
: the web page display has one date format
: why not let the user configure the date format, so that Solr reads a date format rule,
: maybe like this, http://cn2.php.net/manual/en/function.date.php
: now solr 1.1 date format is /MM/DD H:I:S?
: --
: regards
: jl
-Hoss
RE: Alphabetical Facets
Would it be difficult to add support for other unicode collations, for i18n purposes? peter -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Friday, May 11, 2007 11:38 AM To: solr-user@lucene.apache.org Subject: Re: Alphabetical Facets Chris Hostetter wrote: : Has anyone given any thought to alphabetical faceting? If by alphabetical you mean the natural unicode ordering of terms for facet.field type facets -- that's already supported. It's the default sort if there is no facet limit (ie: facet.limit=-1) but even with a limit it can be explicitly turned on with facet.sort=false http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1 http://localhost:8983/solr/select/?q=*%3A*&facet=true&facet.field=cat&rows=0&facet.limit=5&facet.sort=false Perfect! I read that, but did not realize natural index order is alphabetical in the ASCII range. thanks ryan
RE: Alphabetical Facets
: Would it be difficult to add support for other unicode collations, for : i18n purposes? Difficult? ... probably not, but it would require code. :) The existing natural order sorting on the other hand is there because it was free and easy ... it's the order terms are enumerated in the index. -Hoss
Re: New user - indexing problems
: Leave out the URL : : just use ./post.sh *.xml except that post.sh assumes you are using the example jetty install on port 8983, so you'll need to edit it to use port 8080 -Hoss
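Since the error in this thread came from a GET reaching the update URL, here is a minimal Python sketch of posting an XML file with an explicit POST to a non-default port. It is an illustration only, not what post.sh does internally: the helper names are made up, the URL assumes Gary's Tomcat on port 8080, and the `text/xml` content type is an assumption about what the update handler expects.

```python
import urllib.request

# Assumption: Solr deployed under Tomcat on port 8080, as in Gary's setup.
UPDATE_URL = "http://localhost:8080/solr/update"


def build_update_request(xml_bytes, url=UPDATE_URL):
    """Return a POST request carrying one XML update message."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",  # explicit, so the request can never go out as GET
    )


def post_file(path, url=UPDATE_URL):
    """Read one XML file, POST it, and return the response body as text."""
    with open(path, "rb") as f:
        request = build_update_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def commit(url=UPDATE_URL):
    """Send a <commit/> so that previously posted documents become searchable."""
    with urllib.request.urlopen(build_update_request(b"<commit/>", url)) as response:
        return response.read().decode("utf-8")
```

Typical use would be `post_file("solr.xml")`, `post_file("monitor.xml")`, then a single `commit()` at the end.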
Re: delete for multiple documents at once
On 11-May-07, at 9:43 AM, Maximilian Hütter wrote: Hi, I'm trying to delete multiple documents at once, but it doesn't work. I am sending this: <?xml version="1.0" encoding="UTF-8"?> <delete> <id>1_3223_po_opc_2</id> <id>1_2454_po_opc_4</id> </delete> Isn't it possible to do deletes like that? No it isn't, but you can do multi deletes using delete by query: <query>docId:XXX OR docId:YYY OR docId:ZZZ ... -Mike
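A small Python sketch of Mike's suggestion: build one delete-by-query message that ORs the ids together. The function name is made up, and the default field name `id` is an assumption (use whatever your schema's unique key field is called). Ids containing query-syntax characters such as `:` or spaces would need additional escaping that this sketch does not attempt.

```python
from xml.sax.saxutils import escape


def delete_by_query_xml(ids, field="id"):
    """Return a <delete><query>...</query></delete> message that matches
    every id in `ids` by OR-ing one field:value clause per id."""
    if not ids:
        raise ValueError("need at least one id to delete")
    clauses = " OR ".join("%s:%s" % (field, doc_id) for doc_id in ids)
    # escape() protects XML special characters (&, <, >) in the query text.
    return "<delete><query>%s</query></delete>" % escape(clauses)


if __name__ == "__main__":
    print(delete_by_query_xml(["1_3223_po_opc_2", "1_2454_po_opc_4"]))
```

The resulting string is posted to the update URL like any other update message, followed by a commit.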
[acts_as_solr] Release v.0.8 is out
The new release v.0.8 of acts_as_solr is out and includes:

NEW - New video tutorial
NEW - Faceted search has been implemented and it's possible to 'drill-down' on the facets
NEW - New rake tasks you can use to start/stop the solr server in test, development and production environments: (thanks Matt Clark)
      rake solr:start|stop RAILS_ENV=test|development|production (defaults to development if none given)
NEW - Changes to the plugin's test framework and it now supports Sqlite as well (thanks Matt Clark)
FIX - Patch applied (thanks Micah) that allows one to have multiple solr instances in the same servlet
FIX - Patch applied (thanks Micah) that allows indexing of STIs
FIX - Patch applied (thanks Gordon) that allows the plugin to use a table's primary key different than 'id'
FIX - Returning empty array instead of empty strings when no records are found
FIX - Problem with unit tests failing due to order of the tests and speed of the commits

== About ==
This plugin adds full text search capabilities and many other nifty features from Apache's Solr to any Rails model

== Installation ==
On your Rails' root directory, just type
script/plugin install http://opensvn.csie.org/acts_as_solr/trunk

== Very Basic Usage ==
Just include the line below to any of your ActiveRecord models:
acts_as_solr
Or if you want, you can specify only the fields that should be indexed:
acts_as_solr :fields => [:name, :author]
Then to find instances of your model, just do:
Model.find_by_solr(query) or Model.find_id_by_solr(query)
Or if you want to specify the starting row and the number of rows per page:
Model.find_by_solr(query, :start => 0, :rows => 10)

Get it while it's hot => http://acts-as-solr.rubyforge.org
--
Thiago Jackiw
acts_as_solr => http://acts-as-solr.rubyforge.org
Sitealizer => http://sitealizer.rubyforge.org
Re: Solr concurrent commit not updated
On 11-May-07, at 2:45 AM, David Xiao wrote: I have kept the id field unique. Actually, I found that the problem is due to the following Python code: P = subprocess.Popen(arguments, ) It seems that when the program ends, the sub-process started by that call has not finished yet. I guess that's why the statistics page shows the commits but not the added documents. Has anyone had a similar issue? When a unix process terminates, its child processes are also terminated (well, it depends on exactly how you created them). Actually, I'm not sure about that on further thought. However, it is best to wait for your processes to complete. After spawning them all, you can use P.wait() to wait for the processes individually, or os.wait() to wait for any of them to complete. Of course, since you are using Python anyway, it would be best to open() the xml file and post it yourself (in threads, if you want some concurrency). regards, -Mike
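Mike's advice to wait on each child could look roughly like the sketch below. The `run_all` helper is made up for illustration, and the demo commands are stand-ins (tiny Python children); in David's case each command would be a post.sh invocation with one XML file.

```python
import subprocess
import sys


def run_all(commands):
    """Start every command, then block until each one has exited.
    Returns the exit codes in the same order as `commands`."""
    processes = [subprocess.Popen(command) for command in commands]
    # Popen.wait() blocks until that child exits and returns its exit code,
    # so the parent cannot finish before its children do.
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Stand-in children; replace with e.g. ["./post.sh", "doc1.xml"].
    demo = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print(run_all(demo))
```

With 500-1000 files it is probably kinder to the machine to call `run_all` on small batches rather than starting every child at once, and a non-zero exit code in the returned list points at the file that failed.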
Re: delete for multiple documents at once
On 5/11/07, Mike Klaas [EMAIL PROTECTED] wrote: On 11-May-07, at 9:43 AM, Maximilian Hütter wrote: I'm trying to delete multiple documents at once, but it doesn't work. I am sending this: <?xml version="1.0" encoding="UTF-8"?> <delete> <id>1_3223_po_opc_2</id> <id>1_2454_po_opc_4</id> </delete> Isn't it possible to do deletes like that? No it isn't, but you can do multi deletes using delete by query: Sounds like it should be added though... -Yonik
Re: Alphabetical Facets
On 5/11/07, Binkley, Peter [EMAIL PROTECTED] wrote: Would it be difficult to add support for other unicode collations, for i18n purposes? It would require collecting *all* of the facet terms/counts, which is potentially very large, and then re-sorting. Definitely much more expensive to do. -Yonik
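To make the cost Yonik describes concrete, here is a client-side Python sketch of the same idea: fetch all facet terms and counts (facet.limit=-1), then re-sort them with a different key. Everything in it is hypothetical illustration, and `rough_key` is only a crude stand-in (strip accents, ignore case), not a real Unicode collation; a proper collator would take its place.

```python
import unicodedata


def rough_key(term):
    """Crude sort key: drop combining accents and ignore case.
    This is NOT a real Unicode collation, only a placeholder for one."""
    decomposed = unicodedata.normalize("NFKD", term)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


def resort_facets(facet_counts, key=rough_key):
    """Re-sort a complete {term: count} mapping by `key` of the term.
    Every term has to be in memory before sorting can start."""
    return sorted(facet_counts.items(), key=lambda item: key(item[0]))


if __name__ == "__main__":
    counts = {"Zebra": 2, "apple": 7, "Éclair": 1}
    for term, count in resort_facets(counts):
        print(term, count)
```

In plain code point order those three terms come out as Zebra, apple, Éclair; getting anything else means holding the full term list and sorting it again, which is the extra expense mentioned above.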