Facet with large number of unigram entries
Dear list,

I have an index with 2,000,000 articles. All of those texts get tokenized during indexing. On this data I run a faceted query like the following (to retrieve associated words):

    select?q=a_spell:{some word}&facet.method=enum&facet=true&facet.field=Paragraph&facet.limit=10&facet.prefix={some prefix}&facet.mincount=1500&indent=1&fl=_id&wt=json&rows=0

I have more than 5,000,000 unique tokens in the index and the facet query is quite slow. I also tried different FastLRUCache settings for the filterCache. Does anybody have a hint on how I could improve performance with this setup?

Thank you all

-- Andreas Niekler, Dipl. Ing. (FH) NLP Group | Department of Computer Science University of Leipzig Johannisgasse 26 | 04103 Leipzig mail: aniek...@informatik.uni-leipzig.de
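For reference, facet.method=enum walks the candidate terms and intersects each against the filterCache, so with millions of unique tokens the cache configuration matters a lot. A minimal sketch of the kind of filterCache tuning being described (the sizes here are illustrative assumptions, not recommendations):

    <filterCache class="solr.FastLRUCache"
                 size="16384"
                 initialSize="4096"
                 autowarmCount="1024"/>

With this many unique terms it may also be worth comparing facet.method=fc, which uses field-cache counting instead of per-term set intersections.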
Re: Backing up SolR 4.0
On 03/12/12 18:04, Shawn Heisey wrote:

Serious production Solr installs require at least two copies of your index. Failures *will* happen, and sometimes they'll be the kind of failures that take down an entire machine. You can plan for some failures -- redundant power supply and RAID are important for this. Some failures will cause downtime, though -- multiple disk failures, motherboard, CPU, memory, software problems wiping out your index, user error, etc. If you have at least one other copy of your index, you'll be able to keep the system operational while you fix the down machine.

Replication is a very good way to get two or more copies of your index. I would expect that most production Solr installations use either plain replication or SolrCloud. I do my redundancy a different way that gives me a lot more flexibility, but replication is a VERY solid way to go.

If you are running on a UNIX/Linux platform (just about anything *other* than Windows), and backups via replication are not enough for you, you can use the hardlink capability in the OS to avoid taking Solr down while you make backups. Here's the basic sequence:

1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target directory.
4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.

Making hardlinks is a near-instantaneous operation. The way that Solr/Lucene works guarantees that your hardlink copy will remain a valid index snapshot no matter what happens to the live index. If you can make the backup and get the hardlinks deleted before your index undergoes a merge, the hardlinks will use very little extra disk space. If you leave the hardlink copies around, eventually your live index will diverge to the point where the copy has different files and therefore takes up disk space. If you have a *LOT* of extra disk space on the Solr server, you can keep multiple hardlink copies around as snapshots.

Recent versions of Windows do have features similar to UNIX links, so there may in fact be a way to do this on Windows. I will leave that for someone else to pursue.

Thanks, Shawn

Thanks Shawn, that's very informative. I get twitchy with anything where you can't back it up (memcached excepted). As an administrator, it's my job to recover from failures, and backups are kind of my comfort blanket.

I'm running on Linux (Debian Squeeze) in a fully virtual environment. Initially, I think I'll have to just schedule the backup for the early hours (local time), but as we grow, I can see I'll have to use replication to do it seamlessly. The system is necessarily small right now, as we haven't yet gone live, but we are anticipating rapid growth, so replication has always been on the cards.

Is there an easy way to tell (say, from a shell script) when all commits and merges are complete? If I keep a replica solely for backup purposes, I assume I can do what I like with it - presumably replication will resume/catch up when I resume it (I admit, I have a bit of reading to do wrt replication - I just skimmed that because it wasn't in my initial brief).

I'm assuming that because you're using hardlinks, that means that Solr writes a new file when it updates (sort of copy-on-write style)?
So we are relying on the principle that as long as you have at least one remaining reference to the data, it's not deleted... Thanks once again! -Andy -- Andy D'Arcy Jewell SysMicro Limited Linux Support E: andy.jew...@sysmicro.co.uk W: www.sysmicro.co.uk
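A minimal shell sketch of Shawn's hardlink sequence (the paths and single-core layout are assumptions; cp -l creates hardlinks rather than copying data):

    # 1) pause indexing in your feeder, then force a hard commit
    curl 'http://localhost:8983/solr/update?commit=true'

    # 2+3) hardlink the index into a snapshot dir on the same filesystem
    mkdir /var/solr/data/snapshot
    cp -l /var/solr/data/index/* /var/solr/data/snapshot/

    # 4) resume indexing, then 5) copy the snapshot off-box at leisure
    rsync -a /var/solr/data/snapshot/ backuphost:/backups/solr/

    # 6) drop the hardlinks
    rm -rf /var/solr/data/snapshot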
Re: Whole Phrase search in Solr
Hello Jack, You are the man! Indeed, this was the problem. We tried several combinations and we thought that we did that too, but somehow we failed to see that your proposal was working! Don't know why, maybe we had something else changed in parallel, don't know. So, THANK YOU, you have been a great support!
replication of files when index is stable/static (SOLR-1304?)
I have a static index whose config files change frequently. Until now I've distributed these files to all Solr hosts in my current setup manually, but I'm wondering if I can get Solr to do this using config replication. Searching Google I've come across https://issues.apache.org/jira/browse/SOLR-1304. Does anyone know if there is any work done on this issue, or if there are other workarounds? Adding (and then deleting) a dummy document seems to trigger the replication, but this is hardly the best solution.

Regards, Fredrik Rodland

-- Fredrik Rødland
Mail: fred...@rodland.no
Cell: +47 99 21 98 17
Twitter: @fredrikr
Flickr: http://www.flickr.com/fmmr/
Web: http://about.me/fmr
Maisen Pedersens vei 1, NO-1363 Høvik, NORWAY
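For reference, the dummy-document workaround described above might look like the following against the master (host, core URL, and field names are assumptions); the resulting index-version change is what makes slaves pull the config files listed in the replication handler's confFiles:

    curl 'http://master:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
         -d '<add><doc><field name="id">replication-dummy</field></doc></add>'
    curl 'http://master:8983/solr/update?commit=true' -H 'Content-Type: text/xml' \
         -d '<delete><id>replication-dummy</id></delete>'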
SQL DIH - Can I have some guidance please?
Hi. I am having a bit of trouble figuring out the DIH for SQL data. I have asked around a few different places but haven't got any replies, so I was hoping you could help me.

*I have a database schema like this:*

    CREATE TABLE company (
      id SERIAL PRIMARY KEY,
      name varchar(60) NOT NULL
    );
    CREATE TABLE country (
      id SERIAL PRIMARY KEY,
      name varchar(255) NOT NULL
    );
    CREATE TABLE location (
      id SERIAL PRIMARY KEY,
      name varchar(255) NOT NULL,
      coordinate varchar(255) NOT NULL,
      location_id integer NOT NULL REFERENCES country (id)
    );
    CREATE TABLE source (
      id SERIAL PRIMARY KEY,
      name varchar(60) NOT NULL
    );
    CREATE TABLE item (
      id SERIAL PRIMARY KEY,
      title varchar(60) NOT NULL,
      description varchar(900) NOT NULL,
      company_id integer NOT NULL REFERENCES company (id),
      date timestamp NOT NULL,
      source_id integer NOT NULL REFERENCES source (id),
      link varchar(255) NOT NULL,
      location_id integer NOT NULL REFERENCES location (id)
    );

*What I want to put into my Solr schema is this information (named as they are in my schema):*

    id, title, description, date, source, link, location_name, location_coordinates

*I made my DIH config like this:*

    <dataConfig>
      <dataSource name="app-ds" driver="org.postgresql.Driver"
                  url="jdbc:postgresql://localhost:5432/wikipedia"
                  user="wikipedia" password="secret"/>
      <document>
        <entity dataSource="app-ds" name="application" query="SELECT id, page_title from page">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
          <field column="description" name="description"/>
          <field column="date" name="date"/>
          <field column="name" name="source"/>
          <field column="link" name="link"/>
          <field column="name" name="location_name"/>
          <field column="coordinate" name="location_coordinates"/>
        </entity>
      </document>
    </dataConfig>

My main questions relate to the entity query and also what to do for field columns when the data comes from a linked table. For example, the column name isn't unique, since it appears in several different tables. I would really appreciate any help on this; it's taken me a while to get to this stage and now I am truly stuck.
Re: How to change Solr UI
It's a shame wt=velocity gets a bad rap because /update isn't out of the box strict with the HTTP/RESTful scene. A delete should be a DELETE of some sort.

There are 3rd-party standalone apps. There was even a standalone Ruby app (Flare) that was once upon a time in Solr's svn, but really the Solr committers can't be expected to maintain all those various examples and keep them up to date and working, so best to keep them 3rd party IMO. We've got Blacklight, VuFind, and all sorts of other front-ends out there with their own vibrant communities.

I'm -1 for removing VW (it's a contrib plugin as it is already, just like /update/extract). /browse certainly could use a cleaning up / revamping, but it's good stuff if I do say so myself and very handy to have available for several reasons*. Let's try not to conflate wt=velocity with /update being more easily dangerous than it probably should be. But let's also be clear always that Solr is meant to be behind the firewall as its primary and default place in the world.

Erik

* One I'll share: there is a real-world use case of a (relatively big) company using wt=velocity to generate e-mail texts (for saved searches) very conveniently in a backend environment at very high speed, with no other technologies/complexities needed in the mix but Solr and a little custom templating.

On Dec 3, 2012, at 20:58, Jack Krupansky wrote:

It is annoying to have to repeat these explanations so much. Any serious objection to removing the VW UI from Solr proper and replacing it with a standalone app? I mean, Solr should have PHP, Python, Java, and Ruby example apps, right?

-- Jack Krupansky

-----Original Message----- From: Iwan Hanjoyo Sent: Monday, December 03, 2012 8:28 PM To: solr-user@lucene.apache.org Subject: Re: How to change Solr UI

Note that Velocity _can_ be used for user-facing code, but be very sure you secure your Solr. If you allow direct access, a user can easily enter something like:

    http://<host>/solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>

And all your documents will be gone.

Hi Erickson, Thank you for the input. I'll notice and filter out this URL:

    http://<host>/solr/update?commit=true&stream.body=<delete><query>*:*</query></delete>

Kind regards, Hanjoyo
Re: SOLR4 cluster - strange CPU spike on slave
Success! I tried adding -XX:+UseConcMarkSweepGC to the Java options to make it GC earlier. We haven't seen any spikes since. I'm cautiously optimistic though, and will be monitoring the servers for a week or so before declaring final victory.

The post about MMapDirectory is really interesting. We switched to using that from NRTCachingDirectory and I am monitoring performance as well. Initially performance doesn't look stellar, but I suspect that we lack memory in the server to really make it shine.

Med venlig hilsen / Best regards

*John Nielsen* Programmer

*MCB A/S* Enghaven 15 DK-7500 Holstebro Kundeservice: +45 9610 2824 p...@mcb.dk www.mcb.dk

On Fri, Nov 30, 2012 at 3:13 PM, Erick Erickson erickerick...@gmail.com wrote:

Right, so here's what I'd check for. Your logs should show a replication pretty coincident with the spike, and that should be in the log. Note: the replication should complete just before the spike. Or you can just turn replication off and fire it manually to try to force the situation at will; see http://wiki.apache.org/solr/SolrReplication#HTTP_API (but note that you'll have to wait until the index has changed on the master to see any action). So you should be able to create your spike at will.

And this will be pretty normal. When replication happens, a new searcher is opened, caches are filled, autowarming is done, all kinds of stuff like that. During this period, the _old_ searcher is still open, which will both cause the CPU to be busier and require additional memory. Once the new searcher is warmed, new queries go to it, and when the old searcher has finished serving all its queries it shuts down and all the resources are freed. Which is why commits are expensive operations.

All of which means that so far I don't think there's a problem; this is just normal Solr operation. If you're seeing responsiveness problems when serving queries, you probably want to throw more hardware (particularly memory) at the problem. But when thinking about memory allocated to the JVM, _really_ read Uwe's post here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best, Erick

On Thu, Nov 29, 2012 at 2:39 AM, John Nielsen j...@mcb.dk wrote:

Yup, you read it right. We originally intended to do all our indexing on varnish02, replicate to varnish01, and then search from varnish01 (through a fail-over IP which would switch the reader to varnish02 in case of trouble). When I saw the spikes, I tried to eliminate possibilities by starting to search from varnish02, leaving varnish01 with nothing to do but receive replication data. This did not remove the spikes. As soon as this spike is fixed, I will start searching from varnish01 again. These sorts of debugging antics are only possible because, although we do have customers using this, we are still in our beta phase.

Varnish01 never receives any manual commit orders. Varnish02 does from time to time. Oh, and I accidentally misinformed you before (damn secondary language): we are actually seeing the spikes on both servers. I was just focusing on varnish01 because I use it to eliminate possibilities.

It just occurred to me now: we tried switching off our feeder/index tool for 24 hours, and we didn't see any spikes during this period, so receiving replication data certainly has something to do with it.
Med venlig hilsen / Best regards

*John Nielsen* Programmer

*MCB A/S* Enghaven 15 DK-7500 Holstebro Kundeservice: +45 9610 2824 p...@mcb.dk www.mcb.dk

On Thu, Nov 29, 2012 at 3:20 AM, Erick Erickson erickerick...@gmail.com wrote:

Am I reading this right? All you're doing on varnish1 is replicating to it? You're not searching or indexing? I'm sure I'm misreading this.

"The spike, which only lasts for a couple of minutes, sends the disks racing"

This _sounds_ suspiciously like segment merging, especially the disks racing bit. Or possibly replication. Neither of which makes much sense. But is there any chance that somehow multiple commits are being issued? Of course if varnish1 is a slave, that shouldn't be happening either. And the whole bit about nothing going to the logs is just bizarre. I'm tempted to claim hardware gremlins, especially if you see nothing similar on varnish2. Or some other process is pegging the machine. All of which is a way of saying I have no idea.

Yours in bewilderment, Erick

On Wed, Nov 28, 2012 at 6:15 AM, John Nielsen j...@mcb.dk wrote:

I apologize for the late reply. The query load is more or less stable during the spikes. There are always fluctuations, but nothing on the order of magnitude that could explain this spike. In fact, the latest spike occurred last night when there was almost no one using it. To test a hunch of mine, I tried to deactivate all caches by commenting
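For reference, the GC change John describes is just an extra JVM flag on the Solr start command; a minimal sketch assuming the stock Jetty example layout (the heap size is an illustrative assumption):

    java -Xmx4g -XX:+UseConcMarkSweepGC -jar start.jar

UseConcMarkSweepGC enables the concurrent mark-sweep collector, which collects the old generation concurrently instead of in long stop-the-world pauses.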
Two databases merge into SOLR - How to keep unique ID?
I have two databases (unfortunately they do have to be separate) which get imported into Solr. Each database has a primary key for each item, but I am concerned that when it comes to importing the two into Solr there will be more than one item with the same ID (one from each DB). Therefore, in order to keep two separate databases with two different unique IDs, I was wondering what kind of options I have. Should I append a letter to the primary key of one DB? Is this even possible?
Re: SQL DIH - Can I have some guidance please?
On 04/12/2012, Spadez james_will...@hotmail.com wrote:

Hi. I am having a bit of trouble figuring out the DIH for SQL data. [schema and desired Solr fields quoted in full in the original message above]

It is not entirely clear: (a) how the tables are related -- one can guess that item is related to source through source_id, and to location through location_id; (b) which of the Solr fields are to be derived from which table -- in particular, what the tables company and country are used for.

*I made my DIH like this:*

The way to deal with related tables is by using nested entities; e.g., some of your fields can be populated by:

    <dataConfig>
      <dataSource name="app-ds" driver="org.postgresql.Driver"
                  url="jdbc:postgresql://localhost:5432/wikipedia"
                  user="wikipedia" password="secret"/>
      <document>
        <entity dataSource="app-ds" name="item" query="SELECT id, title, location_id from item">
          <entity dataSource="app-ds" name="location"
                  query="SELECT name, coordinate from location where location_id=${item.location_id}">
            <field column="id" name="id"/>
            <field column="title" name="title"/>
            <field column="name" name="location_name"/>
            <field column="coordinate" name="location_coordinates"/>
          </entity>
        </entity>
      </document>
    </dataConfig>

You can add more second-level entities and/or fields as needed. The ${item.location_id} refers to the location_id in the select of the top-level entity.

My main questions relate to the entity datastore query and also what to do for field columns when it is a linked table. For example the word name isn't unique since it appears in several different tables.

You could change the name in the select, e.g., for the top-level entity have:

    query="select name as top_level_name ..."

and for the inner entity have:

    query="select name as second_level_name ..."

Regards, Gora
Re: Two databases merge into SOLR - How to keep unique ID?
On 04/12/2012, Spadez james_will...@hotmail.com wrote:

I have two databases (unfortunately they do have to be separate) which get imported into Solr. [...] Should I append a letter to the primary key of one DB? Is this even possible?

Yes, that sounds like a good plan. You can add this unique string in the select, or use a simple ScriptTransformer for that.

Regards, Gora
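A minimal sketch of the in-select approach for the two DIH entities (PostgreSQL-style string concatenation; the 'db1-'/'db2-' prefixes and column list are assumptions):

    -- entity for the first database
    SELECT 'db1-' || id AS id, title, description FROM item;
    -- entity for the second database
    SELECT 'db2-' || id AS id, title, description FROM item;

Because every Solr uniqueKey now carries its source prefix, documents from the two databases can no longer collide.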
Sorting by multi-valued field
Hey all!

In our system users can create recurring events and search for events starting on or after a given date. Searching and filtering of events works perfectly, but users expect the result set to be ordered by the next start time. For each event, we index a multi-valued date field containing all its start times. The relevant parts of my schema look like this:

- event_id
- start_times_dts

In SQL I would do something like:

    WHERE start_times >= %SELECTED_DATE%
    GROUP BY event_id
    HAVING min(start_times) >= %SELECTED_DATE%
    ORDER BY start_times ASC

The only thing I could think of so far is to index every single start time of an event as a separate document and group on the event. This would solve the sorting problem but would drastically increase our index size. Is there a more elegant way to do this in Solr? A function query or subquery maybe? I thought about it for quite a while and couldn't come up with a viable solution.

Cheers, Thomas
Cannot run Solr4 from Intellij Idea
After 2 days I have figured out how to open Solr 4 in IntelliJ IDEA 11.1.4 on Tomcat 7. IntelliJ IDEA finds webapp/web/WEB-INF/web.xml, offers to make a facet from it, and adds this facet to the parent module, from which an artifact can be created. The problem is that Solr cannot run properly. I get this message:

    SEVERE: Unable to create core: mycore
    org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text: Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.StandardTokenizerFactory'
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
        at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:846)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:534)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
        at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4650)
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5306)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
        at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.StandardTokenizerFactory'
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
        at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:344)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
        ... 25 more
    Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.StandardTokenizerFactory'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:436)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:457)
        at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:89)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
        ... 29 more
    Caused by: java.lang.ClassNotFoundException: solr.StandardTokenizerFactory
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:420)
        ... 32 more

I tried to debug it and found it cannot resolve 'solr.StandardTokenizerFactory' because it searches for this class inside Solr, when it is inside Lucene. I can update my schema.xml and replace all solr. short names with full class names, but I don't think that is correct, because Solr runs properly with this schema on Tomcat 7 when not run from IntelliJ.
Re: SQL DIH - Can I have some guidance please?
Thank you so much for your help. Based on the same schema as in my first post and your help I created this; have I implemented it correctly based on your suggestion? I tried to comment it:

    <dataConfig>
      <dataSource name="app-ds" driver="org.postgresql.Driver"
                  url="jdbc:postgresql://localhost:5432/wikipedia"
                  user="wikipedia" password="secret"/>
      <document>
        <entity dataSource="app-ds" name="item"
                query="SELECT id, title, description, date, link, location_id, source_id, company_id from item">
          <entity dataSource="app-ds" name="location"
                  query="SELECT name, coordinate from location where location_id=${item.location_id}">
            <entity dataSource="app-ds" name="source"
                    query="SELECT name from source where source_id=${item.source_id}">
              <entity dataSource="app-ds" name="company"
                      query="SELECT name from company where company_id=${item.company_id}">
                <field column="id" name="id"/>
                <field column="title" name="title"/>
                <field column="description" name="description"/>
                <field column="date" name="date"/>
                <field column="link" name="link"/>
                <field column="name" name="company_name"/>
                <field column="name" name="source_name"/>
                <field column="name" name="location_name"/>
                <field column="coordinate" name="location_coordinates"/>
              </entity>
            </entity>
          </entity>
        </entity>
      </document>
    </dataConfig>
How to SWAP cores (or collections) with SolrCloud (SOLR-3866)
Hello,

With solr-4.0.0, the useful SWAP command (http://wiki.apache.org/solr/CoreAdmin#SWAP) that allows you to have a main core serving searches while a temp core is re-indexed from scratch no longer works on SolrCloud, as was discussed here: Solr Swap Function doesn't work when using Solr Cloud Beta, https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201209.mbox/%3ccalb4qrmi_fjpes8onyoeusbk1e51sqscxg6aaeulmvdl0f2...@mail.gmail.com%3E

As we really need the ability to rebuild a complete index behind the scenes without interrupting the search service, we tried some workarounds, without success. The typical setup is:

solr.xml:

    <?xml version="1.0" encoding="UTF-8" ?>
    <solr persistent="true">
      <cores defaultCoreName="core_main" adminPath="/admin/cores"
             zkClientTimeout="${zkClientTimeout:15000}" hostPort="8983" hostContext="solr">
        <core shard="shard1" instanceDir="instance_main" name="core_main" collection="collection_main" dataDir="data/"/>
        <core shard="shard1" instanceDir="instance_temp" name="core_temp" collection="collection_temp" dataDir="data/"/>
      </cores>
    </solr>

solrconfig.xml etc. are all the same for all cores/collections. We start 1 leader and 1 replica.

Excerpt of tests/workarounds that were tried (please ask if details are needed):

1.a. Switch instanceDir paths in solr.xml, set /clusterstate.json to {} (to reset it), restart the leader.
1.b. Switch instanceDir paths and core names in solr.xml, set /clusterstate.json to {}, restart the leader.
-> Solr leader and replica don't respond anymore when clusterstate.json is empty;
   leader: KeeperErrorCode = NoNode for /collections/collection_temp/leaders/shard1;
   replica: Could not find collection in zk: collection_main

2.a. Switch instanceDir paths in solr.xml, RELOAD cores/collections.
2.b. Switch instanceDir paths and core names in solr.xml, RELOAD cores/collections.
-> No visible effect.

3. RENAME core_temp to core_main.
-> A query on collection_main returns data initially indexed to core_temp (what we want), but collection_main is no longer in solr.xml and a 404 error is returned for any further document updates to collection_main.

4. Switch instanceDir directories on the filesystem, using a sequence of directory moves, RELOAD cores.
-> A 500 error is returned for the replica core's RELOAD, with a SEVERE error in the log; it seems Solr did not like our messing with the Lucene files. (Surprisingly, if there is no replica, this seems to work.)

We also tried to use the SYNCSHARD collection command and tried to manually set clusterstate.json with a new configuration, but these workarounds don't work. Now we are running short of ideas to progress further... It seems that we really need to patch Solr to properly implement SWAP for cores (or collections, by the way) in the context of SolrCloud.

So:

* What can we do to help progress on SOLR-3866? Maybe use-case scenarios detailing the desired behavior? Constraints on which cores or collections are allowed to SWAP, i.e. same config, same doc-shard assignments?
* Our naive idea would be to update the /clusterstate.json config in ZK and propagate this configuration change to all Solr instances. Is that even remotely realistic?

-- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/

Kelkoo SAS Société par Actions Simplifiée Au capital de € 4.168.964,30 Siège social : 8, rue du Sentier 75002 Paris 425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for their addressees. If you are not the intended recipient, please delete this message and notify the sender.
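For context, the CoreAdmin SWAP being discussed is a single HTTP call in non-cloud mode; a minimal sketch against the setup above (host assumed):

    curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=core_main&other=core_temp'

In SolrCloud it is the ZooKeeper clusterstate, not just the local core names, that has to change consistently, which is why the solr.xml-level workarounds in this thread fall short.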
Re: How to change Solr UI
let's also be clear always that Solr is meant to be behind the firewall

Absolutely, but we are NOT doing that when we provide the Velocity-based /browse UI.

Erik, your email example sounds reasonable, so if you want to substitute something like that for the /browse handler, fine. As you point out, it is not Velocity per se, but the /browse UI, that results in a lack of clarity about Solr being meant to be behind the firewall.

-- Jack Krupansky

-----Original Message----- From: Erik Hatcher Sent: Tuesday, December 04, 2012 5:23 AM To: solr-user@lucene.apache.org Subject: Re: How to change Solr UI

[Erik's message of Dec 4, quoted in full in the original, appears earlier in this digest.]
Re: SQL DIH - Can I have some guidance please?
On 04/12/2012, Spadez james_will...@hotmail.com wrote:

Thank you so much for your help. Based on the same schema as in my first post and your help I created this, have I implemented it correctly based on your suggestion? I tried to comment it:

Looks almost correct. You only need two levels of nesting, and can use proper nesting to differentiate between the different name attributes:

    <dataConfig>
      <dataSource name="app-ds" driver="org.postgresql.Driver"
                  url="jdbc:postgresql://localhost:5432/wikipedia"
                  user="wikipedia" password="secret"/>
      <document>
        <entity dataSource="app-ds" name="item"
                query="SELECT id, title, description, date, link, location_id, source_id, company_id from item">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
          <field column="description" name="description"/>
          <field column="date" name="date"/>
          <field column="link" name="link"/>
          <entity dataSource="app-ds" name="location"
                  query="SELECT name, coordinate from location where location_id=${item.location_id}">
            <field column="name" name="location_name"/>
            <field column="coordinate" name="location_coordinates"/>
          </entity>
          <entity dataSource="app-ds" name="source"
                  query="SELECT name from source where source_id=${item.source_id}">
            <field column="name" name="source_name"/>
          </entity>
          <entity dataSource="app-ds" name="company"
                  query="SELECT name from company where company_id=${item.company_id}">
            <field column="name" name="company_name"/>
          </entity>
        </entity>
      </document>
    </dataConfig>
Re: SQL DIH - Can I have some guidance please?
Thank you so much for the help, I really appreciate it.
Re: How to change Solr UI
On Dec 4, 2012, at 08:21, Jack Krupansky wrote:

let's also be clear always that Solr is meant to be behind the firewall

Absolutely, but we are NOT doing that when we provide the Velocity-based /browse UI. Erik, your email example sounds reasonable, so if you want to substitute something like that for the /browse handler, fine. As you point out, it is not Velocity per se, but the /browse UI, that results in a lack of clarity about Solr being meant to be behind the firewall.

Point taken about being clear on this. But I disagree about removing /browse. It's useful, even if misunderstood/abused by some. If there are spots where we need to be clearer about what it is that is being rendered, how it's rendered, and the pros/cons of it, then let's see about getting it mentioned more clearly.

But do keep in mind an example like this one: having Solr return suggestion lists as plain text suitable for suggest interfaces, rather than having it return JSON or XML and having a middle tier process it, when all you need is a plain list or some CSV. It's quite fine and sensible to use wt=velocity behind the firewall too, even /browse as-is. Same as with the XSL transformation writing capability.

Erik
Re: How to change Solr UI
I have been mulling on this. The browse UI is getting a little out of date, and has interesting 'features' such as only showing a map for a document if the document has a 'name' field, which makes no real sense at all.

Apart from renovating the UI of browse, or possibly replacing it with something more modern based upon the new admin UI technology, it would make sense to add a 'disclaimer' somewhere prominent on the browse interface - title it 'Solr Demo' or 'Solr Prototype', and add a link to a wiki page that explains *why* you shouldn't use this in production. Apart from the security issues already mentioned there's the MVC side - you have a model and a view, but no controller, thus it becomes hard to build anything useful very quickly.

I'd happily hack disclaimers into place if considered useful.

Upayavira

On Tue, Dec 4, 2012, at 01:21 PM, Jack Krupansky wrote:

[Jack's, Erik's, and Iwan's messages, quoted in full in the original, appear earlier in this digest.]
Re: Range Queries performing differently on SortableIntField vs TrieField of type integer
One small question - did you re-index in between? The index structure will be different for each.

Upayavira

On Tue, Dec 4, 2012, at 02:30 PM, Aaron Daubman wrote:

Greetings,

I'm finally updating an old instance, and in testing discovered that using the recommended TrieField instead of SortableIntField for range queries returns unexpected and seemingly incorrect results. A query with:

    q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}

should, and does under 1.4.1 with SortableIntField, only return docs that have some i_yearStopSort value and an i_yearStartSort value less than 1995. Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this query returns docs that have neither an i_yearStopSort nor an i_yearStartSort value.

Here are the two schemas:

Solr 1.4.1 relevant schema parts - working as desired:

    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    ...
    <field name="i_yearStartSort" type="sint" indexed="true" stored="false" required="false" multiValued="true"/>
    <field name="i_yearStopSort" type="sint" indexed="true" stored="false" required="false" multiValued="true"/>

Solr 3.6.1 relevant schema parts - not working as expected:

    <fieldType name="tint" class="solr.TrieField" type="integer" precisionStep="4" sortMissingLast="true" positionIncrementGap="0" omitNorms="true"/>
    ...
    <field name="i_yearStartSort" type="tint" indexed="true" stored="false" required="false" multiValued="false"/>
    <field name="i_yearStopSort" type="tint" indexed="true" stored="false" required="false" multiValued="false"/>

1) What is the best way to return to the desired/expected behavior?
2) Can you explain to me why this happens?
3) I have a sneaking suspicion (but could be totally wrong) that this relates to sortMissingLast="true" - if it does, can you explain the seeming discrepancies between SOLR-2881 and SOLR-2134? If I am reading these correctly, SOLR-2134 says this was fixed for Trie in 4.0, but not in 3.x... SOLR-2881 has a fix version of 3.5 listed, but some of the comments also seem to indicate this was not actually fixed in 3.5+.

Thanks, Aaron
Re: How to change Solr UI
Or, maybe integrate /browse with the Solr Admin UI and give it a graphic treatment that screams that it is a development tool and not designed to be a model for an app UI. And, I still think it would be good to include SOME example of a prototype app UI with Solr, to drive home the point of here is [an example of] how you need to separate UI from Solr.

-- Jack Krupansky

-----Original Message----- From: Erik Hatcher Sent: Tuesday, December 04, 2012 9:29 AM To: solr-user@lucene.apache.org Subject: Re: How to change Solr UI

[Erik's reply of Dec 4, quoted in full in the original, appears earlier in this digest.]
Re: Problems while Searching Plural Form of Verb
Use a stemmer, such as the English plural-only stemmer, EnglishMinimalStemFilterFactory. See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemFilterFactory.html
and
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemmer.html

-- Jack Krupansky

-----Original Message----- From: Pratyul Kapoor Sent: Tuesday, December 04, 2012 9:25 AM To: solr-user@lucene.apache.org Subject: Problems while Searching Plural Form of Verb

Hello,

I am facing a problem with plural forms of words. My index has words in their singular form, and if I search for the plural form of a word, I get zero results. Is there any way in Solr by which I can fix this?

Pratyul
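A minimal sketch of a field type wired up with this filter (the type name is an assumption; since the analyzer applies at both index and query time, singular and plural forms reduce to the same term):

    <fieldType name="text_en_splural" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
      </analyzer>
    </fieldType>

With this analysis chain, a query for 'verbs' matches documents indexed with 'verb'.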
[Solrj 4.0] How to use JOIN
Hi,

I can't find any good example of how to use the join function with the SolrJ 4.0 API. Let's take this example data:

    <doc>
      <field name="id">1</field>
      <field name="name">Thomas</field>
      <field name="age">40</field>
    </doc>
    <doc>
      <field name="id">2</field>
      <field name="name">John</field>
      <field name="age">17</field>
      <field name="parent">1</field>
    </doc>

And code:

    String stringQuery = "(name:Thomas) AND (age:40)";
    SolrQuery query = new SolrQuery();
    query.setQuery(stringQuery);
    QueryResponse response = solrServer.query(query);

The result is the doc with id=1. Now I want to extend the query above with a JOIN similar to this:

    {!join from=parent to=id}(name:John AND age:17)

and expect the same result - the doc with id=1. But I have no idea how to add this join condition to the SolrQuery object. Is there any method I missed? Or must I extend my base stringQuery somehow?

Thanks for any advice!
Roman
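As far as I know, SolrJ has no dedicated join setter; a minimal sketch of the usual approach, embedding the join local params directly in the query string (field names taken from the example above):

    // match the child docs, then join to the parents they
    // reference via the parent -> id relation
    SolrQuery query = new SolrQuery();
    query.setQuery("{!join from=parent to=id}(name:John AND age:17)");
    QueryResponse response = solrServer.query(query);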
Re: Range Queries performing differently on SortableIntField vs TrieField of type integer
Hi Upayavira,

One small question - did you re-index in between? The index structure will be different for each.

Yes, the Solr 1.4.1 (working) instance was built using the original schema and that Solr version. The Solr 3.6.1 (not working) instance was rebuilt using the new schema and Solr 3.6.1...

Thanks, Aaron
Re: Range Queries performing differently on SortableIntField vs TrieField of type integer
I forgot a possibly important piece... Given the different Solr versions, the schema version (and its related changes in defaults) also differs:

Solr 1.4.1 has: <schema name="ourSchema" version="1.1">
Solr 3.6.1 has: <schema name="ourSchema" version="1.5">

Solr 1.4.1 relevant schema parts - working as desired:

    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    ...
    <field name="i_yearStartSort" type="sint" indexed="true" stored="false" required="false" multiValued="true"/>
    <field name="i_yearStopSort" type="sint" indexed="true" stored="false" required="false" multiValued="true"/>

Solr 3.6.1 relevant schema parts - not working as expected:

    <fieldType name="tint" class="solr.TrieField" type="integer" precisionStep="4" sortMissingLast="true" positionIncrementGap="0" omitNorms="true"/>
    ...
    <field name="i_yearStartSort" type="tint" indexed="true" stored="false" required="false" multiValued="false"/>
    <field name="i_yearStopSort" type="tint" indexed="true" stored="false" required="false" multiValued="false"/>
Re: Cannot run Solr4 from Intellij Idea
Interestingly, I have run into this same (or very similar) issue when attempting to run embedded Solr. All of the solr.* classes that were recently moved to Lucene would not work with the solr.* shorthand - I had to replace them with the full classpath. As you found, these shorthands in the same schema worked fine from within Solr proper (webapp). Is there a workaround for this? (It would be great to have a unified schema between embedded and webapp Solr instances.)

Thanks,
Aaron

On Tue, Dec 4, 2012 at 7:37 AM, Artyom ice...@mail.ru wrote:

[Artyom's message and stack trace, quoted in full in the original, appear earlier in this digest.]
Re: Luke and SOLR search giving different results
Thanks Shawn and Jack,

I changed solrconfig to set the default query field (qf) to the field content. It works fine now.

Erol Akarsu

On Mon, Dec 3, 2012 at 5:03 PM, Shawn Heisey s...@elyograg.org wrote:

On 12/3/2012 1:44 PM, Erol Akarsu wrote:

I tried as the search query not "baş" but "features:baş" in the q field of the Solr GUI, and I got results! In the one document, I had some fields of type text_eng and text_general, and one field, features, of type text_tr. If I don't specify a field name, Solr uses the EnglishAnalyzer. If I do, it uses the analyzer specific to the field specified in the query string.

Your config is set up to search against a field named text by default - either by a setting in schema.xml or a df parameter in your search handler definition in solrconfig.xml. If you are using (e)dismax, it might be qf/pf parameters instead of df. The field named text is not properly set up for this search. Your attachment at the beginning of this thread indicates that either you do not have a text field for this document at all, or that field is not stored. If the text field is a copyField as Jack has mentioned, note that it doesn't matter what analysis you are doing on features -- the copy is done before analysis, so it is completely separate.

Thanks, Shawn
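For reference, a minimal sketch of the kind of solrconfig.xml change being described (handler name is an assumption; whether df or qf applies depends on the query parser, as Shawn notes -- use one or the other):

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <!-- standard/lucene parser: default field -->
        <str name="df">content</str>
        <!-- or, for (e)dismax: query fields -->
        <str name="qf">content</str>
      </lst>
    </requestHandler>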
Re: Backing up SolR 4.0
On 12/4/2012 1:55 AM, Andy D'Arcy Jewell wrote:

Is there an easy way to tell (say from a shell script) when all commits and merges [are] complete?

One important bit of information I just thought of: a default Solr 4 config uses a new directory implementation called NRTCachingDirectory, which in some circumstances may keep part of the newest segment(s) in RAM. I *hope* that issuing an explicit hard commit will flush that to disk, but I am not sure. You *might* need to switch to the old directory implementation to be sure a hardlink backup is complete. Can one of the committers please comment on this? Assuming that we work out this detail, the rest of what I've said will be valid.

Detecting when commits are done has to be coordinated with your indexing program, so depending on how your system works, kicking off the make-hardlinks process might need to be part of your indexing program. As for merges, that's a bit tougher, because Solr 4 and later will do merges in the background after informing your indexing program that the commit is complete.

If you grab a hardlink copy while a merge is happening, I do not believe it will be corrupt in any way, but it may be larger than expected because it will contain the new segments from the merge. Those segments would not be referenced by the segments.nnn file, so I *think* that if you then load that index into Solr, it would ignore the other segments. I am not sure about that, though. You might be able to use a command like the following to delete the newer segments from the copy, but I would not do it without experimentation to be sure it's actually required, and that it never wipes anything out that you actually need:

    find -type f -newer segments.gen | xargs rm -f

If I keep a replica solely for backup purposes, I assume I can do what I like with it - presumably replication will resume/catch up when I resume it (I admit, I have a bit of reading to do wrt replication - I just skimmed that because it wasn't in my initial brief).

As long as the replica server isn't being actively updated or used for queries and you temporarily turn off replication, you should be able to do whatever you want with its index.

I'm assuming that because you're using hardlinks, that means that Solr writes a new file when it updates (sort of copy-on-write style)? So we are relying on the principle that as long as you have at least one remaining reference to the data, it's not deleted...

Yes. Lucene (which Solr uses under the hood) never touches segment files once they have been written. It only deletes segment files in two circumstances:

1) Every document in that segment has been deleted from the index.
2) The data in that segment has been written to a new segment.

The combination of Lucene's update method and hardlink functionality will ensure that the hardlink copy is always good.

Thanks, Shawn
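For reference, the explicit hard commit Shawn mentions can be forced over HTTP; a minimal sketch against the default example URL (softCommit=false simply makes it explicit that this is a hard commit that writes segments to disk, not an NRT soft commit):

    curl 'http://localhost:8983/solr/update?commit=true&softCommit=false'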
Re: How to change Solr UI
That's an interesting take. I agree that Solr needs *something* for folks to use. It is unfortunate that Solr actually has a functioning HTTP infrastructure, because it then makes less sense to build an alternative one up. E.g., how about:

http://localhost:8983/solr - admin UI
http://localhost:8983/browse - separate browse webapp

It would be a separate app that runs as another webapp, accessing Solr via HTTP just as any other app would. It could still use Velocity, but would demonstrate that you shouldn't integrate your app with Solr. A minimal-dependency app for demonstration purposes only.

Upayavira

On Tue, Dec 4, 2012, at 02:37 PM, Jack Krupansky wrote:

[Jack's and Erik's messages, quoted in full in the original, appear earlier in this digest.]
Re: How to change Solr UI
And basically that's what I had in mind with Prism here: https://github.com/lucidimagination/Prism Prism's very lightweight, uses Velocity (or not, any Ruby templating technology available), and is entirely separate from Solr. Before that there was Flare: https://github.com/erikhatcher/solr-ruby-flare/tree/master/flare. Prism is the approach I'd (obviously) take these days, and it's getting some more attention, it looks like, soon. Blacklight and VuFind are much more richly capable. So there are options already out there, and surely many others that I don't even mention. A new top-level wiki page seems warranted from this discussion, off http://wiki.apache.org/solr/FrontPage, to list all the various front-ends available. Erik

On Dec 4, 2012, at 12:11, Upayavira wrote: That's an interesting take. I agree that Solr needs *something* for folks to use. [...]
Re: SolrCloud : impossible to create a new collection
On Dec 4, 2012, at 5:57 AM, LEFEBVRE Guillaume guillaume.lefeb...@cegedim.fr wrote: Hello, I have a SolrCloud environment with 2 collections running perfectly. I would like to create a new collection using: http://localhost:8080/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&numReplicas=2 But nothing happens! Could you help me please? Best regards, Guillaume

Check the logs to see the result - probably the logs of the first instance you started (the overseer). It should indicate why the collection was not created. Before long we will have the call itself return the results so you don't have to dig into the logs for this. - Mark
Re: Range Queries performing differently on SortableIntField vs TrieField of type integer
: q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}
...
: Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this
: query is returning docs that have neither an i_yearStopSort nor an
: i_yearStartSort value.

Hmmm... I can't seem to reproduce this. Here's what I tried...

1) start up the Solr 3.6.1 example
2) index the 3.6.1 example docs... java -jar post.jar *.xml
3) index a single doc using some *_ti dynamic fields (which use tint)... java -Ddata=args -jar post.jar '<add><doc><field name="id">HOSS</field><field name="start_ti">45</field><field name="end_ti">100</field></doc></add>'

If I do some open-ended range queries on the *_ti fields, I get the results I expect (either only my HOSS doc if it's in the ranges, or no docs if HOSS is out of range)...

Matches HOSS...
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%2050}&fl=start_ti,id,end_ti
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%2050}&fq=end_ti:{*%20TO%20*}&fl=start_ti,id,end_ti

Matches nothing...
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%205}&fl=start_ti,id,end_ti
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%205}&fq=end_ti:{*%20TO%20*}&fl=start_ti,id,end_ti

I repeated the test after deleting all data, and adding sortMissingLast="true" to the example tint fieldType, and got the same results.

: Solr 3.6.1 Relevant Schema Parts - Not working as expected:
: <fieldType name="tint" class="solr.TrieField" type="integer" precisionStep="4" sortMissingLast="true" positionIncrementGap="0" omitNorms="true"/>

FYI: you have some wackiness there: 'type="integer"' inside the '<fieldType name="tint" .../>'. That shouldn't have caused any problems though, but it doesn't make any sense.

: <field name="i_yearStartSort" type="tint" indexed="true" stored="false" required="false" multiValued="false"/>
: <field name="i_yearStopSort" type="tint" indexed="true" stored="false" required="false" multiValued="false"/>

Can you try changing those to stored="true" and re-indexing as a sanity check? Perhaps your indexing code is putting a default value in that you aren't realizing? W/o more specifics (ie: sample docs to index) on how to reproduce, I can't seem to find any problem. -Hoss
Re: Range Queries performing differently on SortableIntField vs TrieField of type integer
Could you show us some input data, both WITH an i_yearStopSort value and WITHOUT the value? I tried a quick test using the stock Solr 3.6.1 example schema and a dynamic integer field, and the filter query did in fact filter out all documents that did not have a value in that field: http://localhost:8983/solr/select?q=*:*&fq=%2bx_i:{*+TO+*} Maybe you could come up with a simple sample Solr XML document that can be added to the stock 3.6.1 example schema that shows the problem. -- Jack Krupansky

-Original Message- From: Aaron Daubman Sent: Tuesday, December 04, 2012 9:30 AM To: solr-user@lucene.apache.org Subject: Range Queries performing differently on SortableIntField vs TrieField of type integer

Greetings, I'm finally updating an old instance and in testing, discovered that using the recommended TrieField instead of SortableIntField for range queries returns unexpected and seemingly incorrect results. A query with:

q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}

should, and does under 1.4.1 with SortableIntField, only return docs that have some i_yearStopSort value and have an i_yearStartSort value less than 1995. Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this query is returning docs that have neither an i_yearStopSort nor an i_yearStartSort value. Here are the two schemas:

Solr 1.4.1 Relevant Schema Parts - Working as desired:
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
...
<field name="i_yearStartSort" type="sint" indexed="true" stored="false" required="false" multiValued="true"/>
<field name="i_yearStopSort" type="sint" indexed="true" stored="false" required="false" multiValued="true"/>

Solr 3.6.1 Relevant Schema Parts - Not working as expected:
<fieldType name="tint" class="solr.TrieField" type="integer" precisionStep="4" sortMissingLast="true" positionIncrementGap="0" omitNorms="true"/>
...
<field name="i_yearStartSort" type="tint" indexed="true" stored="false" required="false" multiValued="false"/>
<field name="i_yearStopSort" type="tint" indexed="true" stored="false" required="false" multiValued="false"/>

1) What is the best way to return to the desired/expected behavior?
2) Can you explain to me why this happens?
3) I have a sneaking suspicion (but could be totally wrong) that this relates to sortMissingLast="true" - if it does, can you explain the seeming discrepancies in SOLR-2881 and SOLR-2134? If I am reading these correctly, SOLR-2134 says this was fixed for Trie in 4.0, but not in 3.x... SOLR-2881 has a fix version of 3.5 listed, but some of the comments also seem to indicate this was not actually fixed in 3.5+.

Thanks, Aaron
SolrCell takes InputStream
Hi, While using ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract"); the two ways of adding a file are up.addFile(File) and up.addContentStream(ContentStream). However, my raw files are stored on some remote storage devices. I am able to get an InputStream object for the file to be indexed, and it seems awkward to have the file temporarily stored locally. Is there a way of directly passing the InputStream in (e.g. constructing a ContentStream using the InputStream)? Thanks.
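One way to avoid the temporary file is to subclass SolrJ's ContentStreamBase and hand it your InputStream. A minimal, untested sketch (the name, content type, and source-info values are hypothetical; supply whatever your storage layer knows):

import java.io.IOException;
import java.io.InputStream;
import org.apache.solr.common.util.ContentStreamBase;

// A ContentStream that wraps an already-open InputStream instead of a local file.
public class InputStreamContentStream extends ContentStreamBase {
    private final InputStream in;

    public InputStreamContentStream(InputStream in, String name, String contentType) {
        this.in = in;
        setName(name);
        setContentType(contentType);
        setSourceInfo("remote storage"); // free-form description of where the bytes came from
    }

    @Override
    public InputStream getStream() throws IOException {
        return in;
    }
}

Then up.addContentStream(new InputStreamContentStream(myStream, "doc1.pdf", "application/pdf")) should send the bytes straight to /update/extract without staging them on disk. Note the stream can only be consumed once.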
Creating a collection without bootstrap
I seem to be missing a step or some kind of ordering in creating a new collection without using bootstrap upload. I have these steps:

* zookeeper upconfig (pretty sure this is first)
* Collection API create collection
* zookeeper linkconfig

I'm working from this page: http://wiki.apache.org/solr/SolrCloud A step-by-step recipe would be really nice. wunder -- Walter Underwood wun...@wunderwood.org Search Guy, Chegg.com
Re: Loading DictionaryCompoundWordTokenFilterFactory as shared object across all cores
: Do we have any ways where we can load
: DictionaryCompoundWordTokenFilterFactory only once and shared across all
: the cores?

I don't think so, but there are tricks you can use in a custom plugin variant depending on your use cases, as well as a really easy solution if the schemas for all of your collections are identical... http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201210.mbox/%3Calpine.DEB.2.02.1210161830351.31983@frisbee%3E -Hoss
Re: Sorting by multi-valued field
: perfectly, but users expect the result set to be ordered by the next start
: time.
...
: Is there a more elegant way to do this in Solr? A function query or
: subquery maybe? I thought about it for quite a while and couldn't come up
: with a viable solution.

I think you could conceivably write a custom function that built an UnInvertedField over your multivalued field, and then returned the lowest value for each doc where the value is after 'NOW', but there is nothing out of the box that will do this for you (and I haven't really thought hard about how viable this approach is ... I can't think of any obvious problems off the top of my head). -Hoss
Re: Sorting by multi-valued field
But it would be a lot harder than either splitting them out into separate docs, or writing code to re-index docs when one of their 'next-event' dates passes, with a new single valued 'next-event' field. Less efficient, but easier to write/manage. Upayavira

On Tue, Dec 4, 2012, at 07:35 PM, Chris Hostetter wrote: I think you could conceivably write a custom function that built an UnInvertedField over your multivalued field [...]
Re: Sorting by multi-valued field
: But it would be a lot harder than either splitting them out into
: separate docs, or writing code to re-index docs when one of their
: 'next-event' dates passes, with a new single valued 'next-event' field.
: Less efficient, but easier to write/manage.

Don't get me wrong -- if you can determine at index time which single value you want to use to sort on, then by all means that is going to be the best approach -- it's precisely the reason why FirstFieldValueUpdateProcessorFactory, LastFieldValueUpdateProcessorFactory, MaxFieldValueUpdateProcessorFactory, and MinFieldValueUpdateProcessorFactory exist. But my interpretation of "the next start time" is that it was dependent on the value of NOW when the query was executed (ie: some of the indexed values may be in the past), in which case that approach wouldn't work. -Hoss
Re: How to change Solr UI
But there's value in having something packaged within Solr itself, for demo purposes. That would, I suspect, make it Java (like it or not!). And that would probably not make it very state-of-the-art, unless it used jQuery with a very lightweight Java portion, which would be possible. Upayavira

On Tue, Dec 4, 2012, at 05:42 PM, Erik Hatcher wrote: And basically that's what I had in mind with Prism here: https://github.com/lucidimagination/Prism [...]
Re: Creating a collection without bootstrap
Here is one problem. On the SolrCloud wiki page, it says to link "collection sets" to collections, but I'm pretty sure that should read "config set". Also, "config set" (or "conf set") is never defined. wunder

On Dec 4, 2012, at 11:07 AM, Walter Underwood wrote: I seem to be missing a step or some kind of ordering in creating a new collection without using bootstrap upload. [...]
Solr 4 : Optimize very slow
Hi All, I have recently migrated from Solr 1.4 to Solr 4 and have done the basic changes required for Solr 4 in solrconfig.xml and schema.xml. I have also rebuilt the index set for Solr 4. We run optimize every morning at 4 am and we keep the index updates off during this process. Previously, with 1.4, the optimization used to take around 20-30 mins per shard, but now with Solr 4 it's taking 6-8 hours or even more. I have also tested the optimize from the Solr UI and that takes 6-8 hours too. The hardware is the same, and we have deployed Solr under WAS. There are 4 shards, every shard contains around 8-9 gig of data, and we are using a master-slave configuration with rsync. I have not enabled soft commit. Also, the commit process is scheduled to run every minute. I am not sure which part I'm missing, do let me know your inputs please. Many thanks in advance, Sandeep
Re: Replication error and Shard Inconsistencies..
Hey Annette, Are you using Solr 4.0 final? A version of 4x or 5x? Do you have the logs for when the replica tried to catch up to the leader? Stopping and starting the node is actually a fine thing to do. Perhaps you can try it again and capture the logs. If a node is not listed as live but is in the clusterstate, that is fine. It shouldn't be consulted. To remove it, you either have to unload it with the core admin API or you could manually delete its registered state under the node states node that the Overseer looks at. Also, it would be useful to see the logs of the new node coming up… there should be info about what happens when it tries to replicate. It almost sounds like replication is just not working for your setup at all and that you have to tweak some configuration. You shouldn't see these nodes as active then, though - so we should get to the bottom of this. - Mark

On Dec 4, 2012, at 4:37 AM, Annette Newton annette.new...@servicetick.com wrote: Hi all, I have a quite weird issue with Solr cloud. I have a 4 shard, 2 replica setup. Yesterday one of the nodes lost communication with the cloud setup, which resulted in it trying to run replication. This failed, which has left me with a shard (Shard 4) that has 2,833,940 documents on the leader and 409,837 on the follower – obviously a big discrepancy, and this leads to queries returning differing results depending on which of these nodes serves the data. There is no indication of a problem on the admin site other than the big discrepancy in the number of documents. They are all marked as active etc… So I thought that I would force replication to happen again by stopping and starting Solr (probably the wrong thing to do), but this resulted in no change. So I turned off that node and replaced it with a new one. In zookeeper, live nodes doesn't list that machine but it is still being shown as active in the ClusterState.json; I have attached images showing this… This means the new node hasn't replaced the old node but is now a replica on Shard 1! Also that node doesn't appear to have replicated Shard 1's data anyway; it didn't get marked as replicating or anything… How do I clear the zookeeper state without taking down the entire solr cloud setup? How do I force a node to replicate from the others in the shard? Thanks in advance. Annette Newton LiveNodes.zip
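For the "unload it with the core admin API" option Mark mentions, a minimal SolrJ sketch (untested; the node URL and core name are hypothetical placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class UnloadStaleCore {
    public static void main(String[] args) throws Exception {
        // Point at the node that still hosts the stale core, not at the collection.
        HttpSolrServer server = new HttpSolrServer("http://oldnode:8983/solr");
        CoreAdminRequest.unloadCore("collection1_shard4_replica2", server);
        server.shutdown();
    }
}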
Re: SOLR4 cluster - strange CPU spike on slave
On Dec 4, 2012, at 2:25 AM, John Nielsen j...@mcb.dk wrote: The post about MMapDirectory is really interesting. We switched to using that from NRTCachingDirectory and am monitoring performance as well. Initially performance doesn't look stellar, but I suspect that we lack memory in the server to really make it shine.

NRTCachingDirectory delegates to another directory and simply caches small segments in RAM - usually it delegates to MMapDirectory by default. So likely you won't notice any changes, because you have not likely really changed anything. NRTCachingDirectory simply helps in the NRT case and doesn't really hurt, that I've seen, in the standard case. It's more like a helper dir than a replacement. - Mark
Re: Solr 4 : Optimize very slow
Hi, You should search the ML archives for: optimize wunder Erick Otis :) Is WAS really AWS? If so, and if these are new EC2 instances, you are unfortunately unable to do a fair apples-to-apples comparison. Have you tried a different set of instances? Otis -- Performance Monitoring - http://sematext.com/spm

On Dec 4, 2012 6:29 PM, Sandeep Mestry sanmes...@gmail.com wrote: Hi All, I have recently migrated from solr 1.4 to solr 4 and have done the basic changes required for solr 4 in solrconfig.xml and schema.xml. [...]
Getting deleted documents during DIH full-import
I am doing a DIH full import on a very recent checkout from branch_4x. Something I've recently done differently is enabling autocommit. I am seeing that there are deleted documents in some of the indexes. See "Development Build Indexes" at the bottom of the following screenshot. When the import is complete, the numbered shards will contain 13 million documents. http://dl.dropbox.com/u/97770508/statuspage-deletes-import.png The MySQL database that this imports from has a unique index on the field that Solr is using for its UniqueKey, so it's not possible to have duplicates. Each import uses one SELECT statement for the entire 13 million document import. What might be leading to these deleted docs? Thanks, Shawn
Re: Getting deleted documents during DIH full-import
On 12/4/2012 5:33 PM, Shawn Heisey wrote: I am doing a DIH full import on a very recent checkout from branch_4x. Something I've recently done differently is enabling autocommit. I am seeing that there are deleted documents in some of the indexes. [...] What might be leading to these deleted docs?

Interesting development: The imports are now up to over 11 million documents, but now the number of deleted documents on all shards is zero. I calculate deleted documents on my stats page by subtracting numDocs from maxDoc, information gathered from /admin/mbeans?stats=true. Thanks, Shawn
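For anyone who wants the same maxDoc - numDocs arithmetic programmatically rather than from a stats page, here is an untested SolrJ sketch against the stock /admin/luke handler (the core URL is a placeholder):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;
import org.apache.solr.common.util.NamedList;

public class DeletedDocCount {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        LukeRequest luke = new LukeRequest();
        luke.setShowSchema(false); // index-level stats are enough here
        LukeResponse rsp = luke.process(server);
        NamedList<Object> index = rsp.getIndexInfo();
        int numDocs = ((Number) index.get("numDocs")).intValue();
        int maxDoc = ((Number) index.get("maxDoc")).intValue();
        System.out.println("deleted docs: " + (maxDoc - numDocs));
        server.shutdown();
    }
}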
Re: How to SWAP cores (or collections) with SolrCloud (SOLR-3866)
On Dec 4, 2012, at 4:57 AM, Andre Bois-Crettez andre.b...@kelkoo.com wrote: * what can we do to help progress on SOLR-3866? Maybe use case scenarios, detailing desired behavior? Constraints on what cores or collections are allowed to SWAP, i.e. same config, same doc-shard assignments?

Yes please - if you could elaborate on that issue, I can help you try and get something in. - Mark
SolrCloud stops handling collection CREATE/DELETE (but responds HTTP 200)
Hi, I have a Cloud setup of 4 machines. I bootstrapped them with 1 collection, which I called "default" and haven't used since. I'm using an external ZK ensemble that was completely empty before I started this cloud. Once I had all 4 nodes in the cloud I used the collection API to create the real collections I wanted. I also tested that deleting works. For example,

# this worked
curl "http://localhost:8984/solr/admin/collections?action=CREATE&name=15678&numShards=4"
# this worked
curl "http://localhost:8984/solr/admin/collections?action=DELETE&name=15678"

Next, I started my indexer service which happily sent many, many updates to the cloud. Queries against the collections also work just fine. Finally, a few hours later, I tried doing a create and a delete. Both operations did nothing, although Solr replied with a 200 OK.

$ curl -i "http://localhost:8984/solr/admin/collections?action=CREATE&name=15679&numShards=4"
HTTP/1.1 200 OK
Content-Type: application/xml; charset=UTF-8
Transfer-Encoding: chunked

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
</response>

There is nothing in the stdout/stderr logs, nor the Java logs (I have it set to WARN). I have tried bouncing the nodes and it doesn't change anything. Any ideas? How can I further debug this or what else can I provide?
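When the Collections API accepts requests but nothing happens, one thing worth checking is whether requests are piling up in the Overseer's work queue in ZooKeeper, which would point at a dead or stuck Overseer. A diagnostic sketch (untested; the /overseer/collection-queue-work path is an assumption for this version, so verify it against your actual ZK tree first):

import java.util.List;
import org.apache.solr.common.cloud.SolrZkClient;

public class OverseerQueuePeek {
    public static void main(String[] args) throws Exception {
        // Placeholder ensemble address; use your own ZK hosts.
        SolrZkClient zk = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 10000);
        List<String> pending = zk.getChildren("/overseer/collection-queue-work", null, true);
        System.out.println("pending collection API work items: " + pending);
        zk.close();
    }
}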
RE: Solr 4 : Optimize very slow
When I upgraded from 3.2 to 3.6, I found that an optimize - all other variables being the same - took about twice as long. Eventually I was able to track this down to the new default of MMapDirectory. By changing back to NIOFSDirectory, I was able to get the optimize time back down to what it formerly was. I did this by adding this to solrconfig.xml:

<directoryFactory name="DirectoryFactory" class="solr.NIOFSDirectoryFactory"/>

I'd suggest trying that to see what effect it has (for us, NIOFSDirectory generally performs better across the board, but I've heard just the opposite from other people on this mailing list). If that doesn't improve it, try looking at these things:

1) Is the size of the index files the same as in 1.4? Perhaps something has changed to cause a significant size increase.
2) Is Solr 4 spending more time garbage collecting? Enable gc logging with -verbose:gc (or whatever the flag is), or use the jstat utility.
3) Watch the files in the index directory during the optimize and see if they are being written more slowly, or if the segment files are being copied around more often than before.

-Michael

-Original Message- From: Sandeep Mestry [mailto:sanmes...@gmail.com] Sent: Tuesday, December 04, 2012 6:29 PM To: solr-user@lucene.apache.org Subject: Solr 4 : Optimize very slow

Hi All, I have recently migrated from solr 1.4 to solr 4 and have done the basic changes required for solr 4 in solrconfig.xml and schema.xml. [...]
Re: Loading DictionaryCompoundWordTokenFilterFactory as shared object across all cores
We are using the same schema, and we did try using shareSchema=true in solr.xml. During indexing it works fine: the schema loads a single time. But at query time, it loads multiple times, at the core level.

On Wed, Dec 5, 2012 at 1:00 AM, Chris Hostetter hossman_luc...@fucit.org wrote: I don't think so, but there are tricks you can use in a custom plugin variant depending on your use cases, as well as a really easy solution if the schemas for all of your collections are identical... [...] -Hoss
Re: Solr 4 : Optimize very slow
I tried that search, without success :-( I suspect what Otis was trying to say was to question why you are optimising. Optimise was necessary under 1.4, but with newer Solr, the new TieredMergePolicy does a much better job of handling background merging, reducing the need for optimize. Try just not doing it at all and see if your index actually reaches a point where it is needed. Upayavira

On Wed, Dec 5, 2012, at 12:31 AM, Otis Gospodnetic wrote: Hi, You should search the ML archives for: optimize wunder Erick Otis :) [...]
Maximum number of cores
Hi, I am using Solr 4.0 and I have created 10 cores in Solr. I want to know the maximum number of cores that can be created in Solr.
Re: Solr 4 : Optimize very slow
It was not necessary under 1.4. It has never been necessary. It was not necessary in Ultraseek Server in 1996, using the same merging model. In some cases, it can be a good idea. Since you are continuously updating, this is not one of those cases. wunder

On Dec 4, 2012, at 9:29 PM, Upayavira wrote: I tried that search, without success :-( I suspect what Otis was trying to say was to question why you are optimising. [...]

-- Walter Underwood wun...@wunderwood.org
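For the cases where a forced merge really is warranted, note that SolrJ exposes a partial optimize that merges down to a target segment count instead of all the way to one segment, which is usually far cheaper; a sketch (core URL and segment count are placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PartialOptimize {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core0");
        // waitFlush=true, waitSearcher=true, merge down to at most 10 segments
        server.optimize(true, true, 10);
        server.shutdown();
    }
}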
how to assign dedicated server for indexing and add more shard in SolrCloud
I'm using master and slave servers for scaling. The master is dedicated to indexing and the slave is for searching. Now I'm planning to move to SolrCloud. It has a leader and replicas; the leader acts like a master and the replicas act like slaves. Is that right? So, I'm wondering two things. First, how can I assign a dedicated server for indexing in SolrCloud? Second, consider that I'm using a two shard cluster with shard replicas http://wiki.apache.org/solr/SolrCloud#Example_B:_Simple_two_shard_cluster_with_shard_replicas and I need to add one more shard with replicas. In this case, the existing two shards and their replicas will already have many docs, so I want to index new docs into the new shard only. How can I do this? Actually, I don't understand SolrCloud perfectly, so my questions may be ridiculous. Any inputs are welcome. Thanks,
Adding filter in solr suggester component.
Hi, We are using the Solr (version 3.6) suggester component for autocomplete. We indexed a field in the Solr core (the one we want as the autocomplete result) and it's giving me correct autocomplete results. Now I want to add a filter on the suggester's indexed data. Say we have a core with userId and notes fields. We want to add a userId filter to the autocomplete, so that it suggests only that user's notes. I have gone through the following links:

First link: http://stackoverflow.com/questions/9004266/solr-spell-check-result-based-filter-query
Second link: http://lucene.472066.n3.nabble.com/Issue-using-filter-query-with-spellCheck-component-td2166322.html

Please help me...
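The spellcheck/suggest machinery those links discuss builds its own side structure from the field, which is why a filter query never reaches it. One common workaround is to back the autocomplete with plain faceting plus facet.prefix, which does honor fq; an untested SolrJ sketch using the userId/notes fields from the question (the URL, user id, and typed prefix are placeholders, and the notes field must be tokenized the way you want terms suggested):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilteredAutocomplete {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core0");
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);                  // only the facet counts are wanted
        q.addFilterQuery("userId:42"); // restrict suggestions to one user's documents
        q.setFacet(true);
        q.addFacetField("notes");      // field whose terms become suggestions
        q.setFacetPrefix("mee");       // whatever the user has typed so far
        q.setFacetMinCount(1);
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getFacetField("notes").getValues());
        server.shutdown();
    }
}

The tradeoff is that faceting over a large term space can be slow (as the first message in this digest shows), so the filter query's ability to shrink the candidate document set is what makes this viable.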