Embedded SOLR using the SOLR collection distribution
Hello, I would like to know if I can implement embedded Solr using the Solr collection distribution? Regards, Dilip

-----Original Message-----
From: mike topper [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 22, 2007 8:29 PM
To: solr-user@lucene.apache.org
Subject: almost realtime updates with replication

Hello,

Currently in our application we are using the master/slave setup and have a batch update/commit about every 5 minutes. There are a couple of queries that we would like to run almost realtime, so I would like to have our client send an update on every new document and then have Solr configured to do an autocommit every 5-10 seconds.

Reading the wiki, it seems this isn't possible because of the strain of snapshotting and pulling to the slaves at such a high rate.

What I was thinking was to have these few queries just query the master, and the rest can query the slave with the non-realtime data, although I'm assuming this wouldn't work either: since a snapshot is created on every commit, we would still impact performance too much?

Does anyone have any suggestions? If I set autowarmingCount=0, would I be able to pull to the slave faster than every couple of minutes (say, every 10 seconds)? What if I take out the postcommit hook on the master and just have snapshooter run on a cron every 5 minutes?

-Mike
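For context, the autocommit the poster describes lives in solrconfig.xml's update handler section. A hedged sketch follows — the values are illustrative, and time-based autocommit (maxTime) was not available in every early 1.x release, so verify the elements against the example config that ships with your version:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Commit pending documents automatically. Values are examples only. -->
  <autoCommit>
    <maxDocs>1000</maxDocs>   <!-- commit after this many queued documents -->
    <maxTime>10000</maxTime>  <!-- or after this many milliseconds, if supported -->
  </autoCommit>
</updateHandler>
```

If your release lacks maxTime, an external cron job issuing commits at a fixed interval is a workable substitute.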
The mechanism of data replication in Solr?
Hello, everybody :-) I'm interested in the mechanism of data replication in Solr. In the introduction to the Solr enterprise search server, replication is listed as one of Solr's features, but I can't find anything about replication issues on the web site or in the documents, including how to split the index, how to distribute the chunks of the index, how to place the replicas, eager replication or lazy replication, etc. I think these are different from the problem in HDFS. Can anybody help me? Thank you in advance. Best Wishes.
Re: The mechanism of data replication in Solr?
On Wed, 2007-09-05 at 15:56 +0800, Dong Wang wrote: I'm interested in the mechanism of data replication in Solr [...] Can anybody help me?

http://wiki.apache.org/solr/CollectionDistribution

HTH

-- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
Re: Embedded SOLR using the SOLR collection distribution
On Sep 5, 2007, at 3:30 AM, Dilip.TS wrote: I would like to know if I can implement embedded Solr using the Solr collection distribution?

Partly... the rsync method of getting a master index to the slaves would work, but you'd need a way to send a <commit/> to the slaves so that they reload their IndexSearchers.

Erik
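As a minimal sketch of the "send a commit to the slaves" step Erik mentions: a commit is just a `<commit/>` body POSTed to the slave's update URL. The host, port, and webapp path below are hypothetical examples, not values from this thread:

```python
# Sketch: after snappuller/snapinstaller copy a new index to a slave,
# POST <commit/> to that slave so it reopens its IndexSearcher.
from urllib import request

COMMIT_BODY = b"<commit/>"

def commit_url(host: str, port: int, webapp: str = "solr") -> str:
    """Build the update URL a commit should be POSTed to."""
    return f"http://{host}:{port}/{webapp}/update"

def send_commit(host: str, port: int) -> None:
    """Issue the commit; raises urllib.error.HTTPError on failure."""
    req = request.Request(
        commit_url(host, port),
        data=COMMIT_BODY,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with request.urlopen(req) as resp:
        resp.read()

if __name__ == "__main__":
    # Example target only; no request is made here.
    print(commit_url("slave1", 8983))
```

The same request works from curl or a shell post-pull hook; the point is only that something must hit each slave's update handler after the files land.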
Tomcat logging
Hi - Here are the lines to add to the end of Tomcat's conf/logging.properties file to get rid of query/update logging noise:

org.apache.solr.core.SolrCore.level = WARNING
org.apache.solr.handler.XmlUpdateRequestHandler.level = WARNING
org.apache.solr.search.SolrIndexSearcher.level = WARNING

I would prefer not to get involved in editing the wiki; it's generally better to have a few editors. Also, it crosses the line into company property. Also, I'm lazy. Will somebody please add this to the Tomcat page? Thanks, Lance
Re: Distribution Information?
Not that I've noticed. I'll do a more careful grep soon here - I just got back from a long weekend. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Aug 31, 2007, at 6:12 PM, Bill Au wrote: Are there any error message in your appserver log files? Bill On 8/31/07, Matthew Runo [EMAIL PROTECTED] wrote: Hello! /solr/admin/distributiondump.jsp This server is set up as a master server, and other servers use the replication scripts to pull updates from it every few minutes. My distribution information screen is blank.. and I couldn't find any information on fixing this in the wiki. Any chance someone would be able to explain how to get this page working, or what I'm doing wrong? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Indexing a URL
Hello, I am trying to post the following to my index:

<field name="url">http://www.nytimes.com/2007/08/25/business/worldbusiness/25yuan.html?ex=1345694400&en=499af384a9ebd18f&ei=5088&partner=rssnyt&emc=rss</field>

The url field is defined as:

<field name="url" type="string" indexed="false" stored="true" />

However, when posting file docstor/ffc110ee5c9a2ed28c8f35aa243bb53b.xml to http://localhost:8983/news_feed/update, I get the following error:

HTTP ERROR: 500
ParseError at [row,col]:[3,104]
Message: The reference to entity "en" must end with the ';' delimiter.

It is apparently attempting to parse en=499af384a9ebd18f in the URL. I am not clear why it would do this, as I specified indexed="false". I need to store this because that is how the user gets to the original article. Is there any data type that simply ignores the characters in the field? I don't care that it can't be a search field. I've tried the "ignored" field type and it still gives me the same error. Thanks, Bill
Re: Indexing a URL
It is apparently attempting to parse en=499af384a9ebd18f in the URL. I am not clear why it would do this, as I specified indexed="false". I need to store this because that is how the user gets to the original article.

The ampersand is an XML-reserved character. You have to escape it (turn it into &amp;) whether you are indexing the data or not. This has nothing to do with Solr — it applies to XML files in general. Whatever you're using to generate the XML should be able to handle this for you.
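As a concrete sketch of the advice above: if the update XML is built by hand, the standard library's xml.sax.saxutils.escape handles the reserved characters. The URL below is a shortened, hypothetical stand-in for the one in the question:

```python
# Escape XML-reserved characters (&, <, >) in a field value before
# embedding it in an <add> document posted to Solr.
from xml.sax.saxutils import escape

url = "http://example.com/25yuan.html?ex=1345694400&en=499af384a9ebd18f"
field = '<field name="url">%s</field>' % escape(url)
print(field)
# <field name="url">http://example.com/25yuan.html?ex=1345694400&amp;en=499af384a9ebd18f</field>
```

Any real XML library (rather than string concatenation) does this escaping automatically when serializing, which is the safer route.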
Re: Replication broken.. no helpful errors?
It seems that the scripts cannot open new searchers at the end of the process, for some reason. Here's a message from cron, but I'm not sure what to make of it... It looks like the files properly copied over, but failed the install. I removed the temp* directory, but still SOLR could not launch a new searcher. I don't see any activity in catalina.out though... started by tomcat5 command: /opt/solr/bin/snappuller -M search1 -P 18080 -D /opt/solr/ data -S /opt/solr/logs -d /opt/solr/data -v pulling snapshot temp-snapshot.20070905150504 receiving file list ... done deleting segments_1ine deleting _164h_1.del deleting _164h.tis deleting _164h.tii deleting _164h.prx deleting _164h.nrm deleting _164h.frq deleting _164h.fnm deleting _164h.fdx deleting _164h.fdt deleting _164g_1.del deleting _164g.tis deleting _164g.tii deleting _164g.prx deleting _164g.nrm deleting _164g.frq deleting _164g.fnm deleting _164g.fdx deleting _164g.fdt deleting _164f_1.del deleting _164f.tis deleting _164f.tii deleting _164f.prx deleting _164f.nrm deleting _164f.frq deleting _164f.fnm deleting _164f.fdx deleting _164f.fdt deleting _164e_1.del deleting _164e.tis deleting _164e.tii deleting _164e.prx deleting _164e.nrm deleting _164e.frq deleting _164e.fnm deleting _164e.fdx deleting _164e.fdt deleting _164d_1.del deleting _164d.tis deleting _164d.tii deleting _164d.prx deleting _164d.nrm deleting _164d.frq deleting _164d.fnm deleting _164d.fdx deleting _164d.fdt deleting _164c_1.del deleting _164c.tis deleting _164c.tii deleting _164c.prx deleting _164c.nrm deleting _164c.frq deleting _164c.fnm deleting _164c.fdx deleting _164c.fdt deleting _164b_1.del deleting _164b.tis deleting _164b.tii deleting _164b.prx deleting _164b.nrm deleting _164b.frq deleting _164b.fnm deleting _164b.fdx deleting _164b.fdt deleting _164a_1.del deleting _164a.tis deleting _164a.tii deleting _164a.prx deleting _164a.nrm deleting _164a.frq deleting _164a.fnm deleting _164a.fdx deleting _164a.fdt deleting _163z_3.del 
deleting _163z.tis deleting _163z.tii deleting _163z.prx deleting _163z.nrm deleting _163z.frq deleting _163z.fnm deleting _163z.fdx deleting _163z.fdt deleting _163o_3.del deleting _163o.tis deleting _163o.tii deleting _163o.prx deleting _163o.nrm deleting _163o.frq deleting _163o.fnm deleting _163o.fdx deleting _163o.fdt deleting _163d_4.del deleting _163d.tis deleting _163d.tii deleting _163d.prx deleting _163d.nrm deleting _163d.frq deleting _163d.fnm deleting _163d.fdx deleting _163d.fdt deleting _1632_6.del deleting _1632.tis deleting _1632.tii deleting _1632.prx deleting _1632.nrm deleting _1632.frq deleting _1632.fnm deleting _1632.fdx deleting _1632.fdt deleting _162r_7.del deleting _162r.tis deleting _162r.tii deleting _162r.prx deleting _162r.nrm deleting _162r.frq deleting _162r.fnm deleting _162r.fdx deleting _162r.fdt deleting _162g_d.del deleting _162g.tis deleting _162g.tii deleting _162g.prx deleting _162g.nrm deleting _162g.frq deleting _162g.fnm deleting _162g.fdx deleting _162g.fdt deleting _1625_m.del deleting _1625.tis deleting _1625.tii deleting _1625.prx deleting _1625.nrm deleting _1625.frq deleting _1625.fnm deleting _1625.fdx deleting _1625.fdt deleting _161u_w.del deleting _161u.tis deleting _161u.tii deleting _161u.prx deleting _161u.nrm deleting _161u.frq deleting _161u.fnm deleting _161u.fdx deleting _161u.fdt deleting _161j_16.del ./ _161j_17.del _164m.fdt _164m.fdx _164m.fnm _164m.frq _164m.nrm _164m.prx _164m.tii _164m.tis _164m_1.del _164x.fdt _164x.fdx _164x.fnm _164x.frq _164x.nrm _164x.prx _164x.tii _164x.tis _164x_1.del segments.gen segments_1inv sent 516 bytes received 105864302 bytes 30247090.86 bytes/sec total size is 966107226 speedup is 9.13 + [[ -z search1 ]] + [[ -z /opt/solr/logs ]] + fixUser -M search1 -S /opt/solr/logs -d /opt/solr/data -V + [[ -z tomcat5 ]] ++ whoami + [[ tomcat5 != tomcat5 ]] ++ who -m ++ cut '-d ' -f1 ++ sed '-es/^.*!//' + oldwhoami= + [[ '' == '' ]] +++ pgrep -g0 snapinstaller ++ tail -1 ++ cut 
-f1 '-d ' ++ ps h -Hfp 3621 3629 3630 3631 + oldwhoami=tomcat5 + [[ -z /opt/solr/data ]] ++ echo /opt/solr/data ++ cut -c1 + [[ / != \/ ]] ++ echo /opt/solr/logs ++ cut -c1 + [[ / != \/ ]] ++ date +%s + start=1189030205 + logMessage started by tomcat5 ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2007/09/05 15:10:05 started by tomcat5 + [[ -n '' ]] + logMessage command: /opt/solr/bin/snapinstaller -M search1 -S /opt/ solr/logs -d /opt/solr/data -V ++ timeStamp ++ date '+%Y/%m/%d %H:%M:%S' + echo 2007/09/05 15:10:05 command: /opt/solr/bin/snapinstaller -M search1 -S /opt/solr/logs -d /opt/solr/data -V + [[ -n '' ]] ++ ls /opt/solr/data ++ grep 'snapshot\.' ++ grep -v wip ++ sort -r ++ head -1 + name=temp-snapshot.20070905150504 + trap 'echo caught INT/TERM, exiting now but partial installation may have already occured;/bin/rm -rf ${data_dir/index.tmp$$;logExit aborted 13' INT TERM + [[ temp-snapshot.20070905150504 == '' ]] +
Re: Replication broken.. no helpful errors?
If it helps anyone, this index is around a gig in size. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++

On Sep 5, 2007, at 3:14 PM, Matthew Runo wrote: It seems that the scripts cannot open new searchers at the end of the process, for some reason. [...]
Re: Can't get 1.2 running under Tomcat 5.5
: Care needs to be taken when upgrading Solr but leaving solrconfig.xml
: untouched, because additional config may be necessary. Comparing your
: solrconfig.xml with the one that ships with the example app of the version of
: Solr you're upgrading to is recommended.

Hmmm... that's kind of a scary statement, and it may mislead people into thinking that they need to throw away their configs when updating and start over with the newest examples -- that's certainly not true. I think it's safe to say that if you are using official releases of Solr and not trunk builds, then either:

* any old config files will continue to work as is

OR:

* any known config syntax which no longer works exactly the same way will be called out loudly in the CHANGES.txt file for the release.

If, however, you are using a nightly snapshot, items that work in your config may not continue to work in future versions as functionality is tweaked and revised.

However: Erik's point about comparing your configs with the examples is still a good idea -- because there may be cool new features that you'd like to take advantage of that don't immediately jump out at you when looking at the CHANGES.txt file, but do when looking at sample configs.

-Hoss
Re: Can't get 1.2 running under Tomcat 5.5
I guess my warning is more because I play on the edge and have several times ended up tweaking various apps' solrconfig.xml files as I upgraded them, to keep things working. Anyway, we'll all agree that diffing your config files with the example app can be useful.

Erik

On Sep 5, 2007, at 9:26 PM, Chris Hostetter wrote: [...]
Re: Can't get 1.2 running under Tomcat 5.5
Not really. It is a very poor substitute for reading the release notes, and sufficiently inadequate that it might not be worth the time. Diffing the example with the previous release is probably more instructive, but might or might not help for your application. A config file checker would be useful. wunder On 9/5/07 6:55 PM, Erik Hatcher [EMAIL PROTECTED] wrote: Anyway, we'll all agree that diff'ing your config files with the example app can be useful.
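A rough sketch of the "config file checker" Walter suggests could start as nothing more than a unified diff between your current solrconfig.xml and the example shipped with the new release. The sample XML fragments below are hypothetical:

```python
# Show line-by-line differences between two config file contents,
# in the familiar unified-diff format.
import difflib

def config_diff(old_text: str, new_text: str) -> str:
    """Unified diff of two config files, given their contents as strings."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="solrconfig.xml (current)",
        tofile="solrconfig.xml (new example)",
    ))

if __name__ == "__main__":
    old = "<config>\n  <autoCommit/>\n</config>\n"
    new = "<config>\n  <autoCommit><maxDocs>1000</maxDocs></autoCommit>\n</config>\n"
    print(config_diff(old, new))
```

A real checker would go further (parse the XML, flag removed or renamed elements against a per-release list), but even a plain diff surfaces new options worth reading about.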
Re: Indexing very large files.
On Wed, 05 Sep 2007 17:18:09 +0200 Brian Carmalt [EMAIL PROTECTED] wrote: I've been trying to index a 300MB file to Solr 1.2. I keep getting out-of-memory heap errors. Even on an empty index with one gig of VM memory it still won't work.

Hi Brian,

VM != heap memory. VM = OS memory; heap memory = memory made available by the Java VM to the Java process. Heap memory errors are hardly ever an issue of the app itself (other than, of course, with bad programming... but that doesn't seem to be the issue here so far).

[EMAIL PROTECTED] [Thu Sep 6 14:59:21 2007] /usr/home/betom $ java -X
[...]
-Xms<size>  set initial Java heap size
-Xmx<size>  set maximum Java heap size
-Xss<size>  set java thread stack size
[...]

For example, start Solr as:

java -Xms64m -Xmx512m -jar start.jar

YMMV with respect to the actual values you use. Good luck, B

_ {Beto|Norberto|Numard} Meijome

Windows caters to everyone as though they are idiots. UNIX makes no such assumption. It assumes you know what you are doing, and presents the challenge of figuring it out for yourself if you don't.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
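Raising -Xmx on the server, as above, is the main fix. On the client side, one can also avoid buffering the whole 300MB update file in memory by streaming it in fixed-size chunks; this does not change the server's heap needs, but keeps the posting client small. The helper below is an illustrative sketch, not part of Solr:

```python
# Read a large file in fixed-size chunks instead of loading it whole.
# An HTTP client that accepts an iterable body can consume this generator.
import io

def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks of up to chunk_size bytes from fileobj."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return
        yield chunk

if __name__ == "__main__":
    # Demonstrate with an in-memory stand-in for a large file.
    data = io.BytesIO(b"x" * 150_000)
    sizes = [len(c) for c in iter_chunks(data, 64 * 1024)]
    print(sizes)  # [65536, 65536, 18928]
```

With this approach, client memory use stays bounded by chunk_size regardless of the file size.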