Re: Fix sort order within an index ?
On Mon, Oct 7, 2013, at 11:09 PM, user 01 wrote: Is there any way to store documents in a fixed sort order within the indexes of certain fields (either the arrival order, or sorted by int ids that also serve as my unique key), so that I could store them optimized for browsing lists of items? The order for browsing is always fixed and there are no further filter queries. I just need to fetch the top 20 (most recently added) documents with field value topic=x1. I came across this article and a JIRA issue which encouraged me that something like this may be possible: http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html https://issues.apache.org/jira/browse/LUCENE-4752

That ticket is an optimisation. If your IDs are sequential, you can sort on them. Or you can add a timestamp field with a default of NOW, and sort on that. q=topic:x1&rows=20&sort=id desc or q=topic:x1&rows=20&sort=timestamp desc will get you what you ask for. The above ticket might just make it a little faster. Upayavira
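A minimal sketch of that query as a full request, assuming a core named collection1 and a timestamp field populated with a default of NOW (both names are illustrative):

  # 20 most recently added documents for topic x1, newest first
  curl "http://localhost:8983/solr/collection1/select?q=topic:x1&rows=20&sort=timestamp+desc"
  # sorting by a sequential unique id works the same way
  curl "http://localhost:8983/solr/collection1/select?q=topic:x1&rows=20&sort=id+desc"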
Re: How to round solr score ?
Thanks for your replies. I am actually doing the frange approach for now. The only downside I see there is that it makes the function call twice, calling createWeight() twice, and so my social connections are evaluated twice, which is quite a heavy operation. So I was wondering whether I could get away without the additional call.
SolrCloud shard splitting keeps failing
I have a test system with an index of 15M documents in one shard that I would like to split in two. I've tried it four times now. I have a stand-alone ZooKeeper running on the same machine. The end result is that I have two new shards in state "construction", and each has one replica which is down. Two of the attempts failed because of heap space. Now the heap size is 24GB. I can't figure out from the logs what is going on. I've attached a log of the latest attempt. Any help would be much appreciated. - Kalle Aaltonen

Attachment: splitfail3.txt.gz (GNU Zip compressed data)
DIH with SolrCloud
Hi, I have set up SolrCloud with Solr 4.4. The cloud has 2 Tomcat instances with a separate ZooKeeper. I execute the below command in the URL: http://localhost:8180/solr/colindexer/dataimportmssql?command=full-import&commit=true&clean=false

The response is:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
    </lst>
    <lst name="initArgs">
      <lst name="defaults">
        <str name="config">data-config-mssql.xml</str>
      </lst>
    </lst>
    <str name="command">status</str>
    <str name="status">idle</str>
    <str name="importResponse"/>
    <lst name="statusMessages">
      <str name="Total Requests made to DataSource">1</str>
      <str name="Total Rows Fetched">0</str>
      <str name="Total Documents Skipped">0</str>
      <str name="Full Dump Started">2013-10-08 10:55:27</str>
      <str name="Total Documents Processed">0</str>
      <str name="Time taken">0:0:1.585</str>
    </lst>
    <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
  </response>

I don't get the "Indexing completed. Added documents ..." status message at all. Also, when I check the dataimport page in the Solr admin UI, I get the same status (screenshot omitted) and no documents are indexed. Not sure of the problem.
Re: SolrCloud shard splitting keeps failing
Hello Kalle, we noticed the same problem some weeks ago: http://lucene.472066.n3.nabble.com/Share-splitting-at-23-million-documents-gt-OOM-td4085064.html Would be interesting to hear if there is more positive feedback this time. We finally concluded that it may be worth to start with many shards right away. And as they grow, they can be distributed to other machines. This works, as we have tested (yet not in production). Regards, Harald. On 08.10.2013 08:43, Kalle Aaltonen wrote: I have a test system where I have a index of 15M documents in one shard that I would like to split in two. I've tried it four times now. I have a stand-alone zookeeper running on the same machine. The end result is that I have two new shards with state construction, and each has one replica which is down. Two of the attempts failed because of heapspace. Now the heap size is 24GB. I can't figure out from the logs what is going on. I've attached a log of the latest attempt. Any help would be much appreciated. - Kalle Aaltonen
Regex to match one of two words
I have an input that can have only 2 values: Published or Deprecated. What regular expression can I use to ensure that either of the two words was submitted? I tried different regular expressions (as in [1], [2]) that contain the most generic syntax, but Solr throws a parse exception when validating these expressions. Could someone help me write a regular expression that will be evaluated by the Solr parser? [1] /^(PUBLISHED)?(DEPRECATED)?$/ [2] /(PUBLISHED)?(DEPRECATED)?/ SolrCore org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'overview_status_s:/(PUBLISHED)?(DEPRECATED)?/': '*' or '?' not allowed as first character in WildcardQuery at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108) Regards, Dinusha.
Re: SolrCloud shard splitting keeps failing
Hi Kalle, The problem here is that certain actions are taking too long causing the split process to terminate in between. For example, a commit on the parent shard leader took 83 seconds in your case but the read timeout value is set to 60 seconds only. We actually do not need to open a searcher during this commit. I'll open an issue and attach a fix. Longer term we need to introduce asynchronous commands so that status can be reported in a better way. On Tue, Oct 8, 2013 at 12:13 PM, Kalle Aaltonen kalle.aalto...@zemanta.comwrote: I have a test system where I have a index of 15M documents in one shard that I would like to split in two. I've tried it four times now. I have a stand-alone zookeeper running on the same machine. The end result is that I have two new shards with state construction, and each has one replica which is down. Two of the attempts failed because of heapspace. Now the heap size is 24GB. I can't figure out from the logs what is going on. I've attached a log of the latest attempt. Any help would be much appreciated. - Kalle Aaltonen -- Regards, Shalin Shekhar Mangar.
Re: DIH with SolrCloud
It looks like your select statement does not return any rows... have you verified it with some sort of SQL client? On Tue, Oct 8, 2013 at 8:57 AM, Prasi S prasi1...@gmail.com wrote: Hi, I have set up SolrCloud with Solr 4.4. The cloud has 2 Tomcat instances with a separate ZooKeeper. I execute the below command in the URL: http://localhost:8180/solr/colindexer/dataimportmssql?command=full-import&commit=true&clean=false [...] I don't get the "Indexing completed. Added documents ..." status message at all. Also, when I check the dataimport page in the Solr admin UI, no documents are indexed. Not sure of the problem.
SolrCloud+Tomcat 3 win VMs, 3 shards * 2 replica
Hello, I'm trying to deploy, using SolrCloud, a cluster of 3 VMs with Windows, each with an instance of Solr running in a Tomcat container AND with an external ZooKeeper (3.4.5) (so 3 ZK + 3 Solr). I'm using Solr 4.2; the original conf is multi-core (6 different cores). I tried to set up a configuration of 3 shards, each with 2 replicas (1 original + 1), so that: * VM1 -- shards 1,2 * VM2 -- shards 2,3 * VM3 -- shards 1,3. After days of googling, reading documentation (in particular here https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble , here http://wiki.apache.org/solr/SolrCloud#Example_C:_Two_shard_cluster_with_shard_replicas_and_zookeeper_ensemble and here http://wiki.apache.org/solr/SolrCloudTomcat ) and browsing forums, I still can't find the solution. Apparently the only way to force 2 shards onto the same machine is to use the Collections API (otherwise I could only deploy 3 shards * 1 replica using numShards, or 1 shard * 3 replicas). After several attempts (almost all combinations of adding/removing bootstrap_conf=true, solr.xml persistent true/false, removing/leaving 'core' tags in solr.xml, using DELETE/RELOAD/CREATE on collections) I managed to deploy this configuration using bootstrap_conf=true, DELETEing and CREATEing each collection, but when I stop the Solr service and then start it again, it does not work (adding/removing bootstrap_conf etc.). I think this is quite a standard use case; is there a simple solution avoiding very ugly workarounds like deploying 2 Tomcats or more than 1 Solr per Tomcat? Thank you very much
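A minimal sketch of the Collections API call that yields 3 shards x 2 replicas across 3 nodes (the collection and config names are illustrative, and this assumes a Solr version whose Collections API supports these parameters):

  curl "http://localhost:8080/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconf"

With maxShardsPerNode=2, the overseer is allowed to place two of the six shard replicas on each of the three nodes, which is what the layout above requires.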
Re: DIH with SolrCloud
My select statement returns documents; I have checked the query in SQL Server. The problem is that the same configuration works when given with the default /dataimport handler. If I give it with the /dataimportmssql handler, I get this behaviour. On Tue, Oct 8, 2013 at 1:28 PM, Raymond Wiker rwi...@gmail.com wrote: It looks like your select statement does not return any rows... have you verified it with some sort of SQL client? On Tue, Oct 8, 2013 at 8:57 AM, Prasi S prasi1...@gmail.com wrote: Hi, I have set up SolrCloud with Solr 4.4. The cloud has 2 Tomcat instances with a separate ZooKeeper. I execute the below command in the URL: http://localhost:8180/solr/colindexer/dataimportmssql?command=full-import&commit=true&clean=false [...]
What is the full list of Solr Special Characters?
I found that: + - || ! ( ) { } [ ] ^ ~ * ? : \ at that URL: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping+Special+Characters I'm using Solr 4.5 Is there any full list of special characters to escape inside my custom search API before making a request to SolrCloud?
Re: What is the full list of Solr Special Characters?
Actually I want to remove special characters and not send them to my Solr indexes. I mean a user could send a special query, much like a SQL injection, and I want to protect my system from such scenarios. 2013/10/8 Furkan KAMACI furkankam...@gmail.com I found that: + - || ! ( ) { } [ ] ^ ~ * ? : \ at that URL: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping+Special+Characters I'm using Solr 4.5 Is there any full list of special characters to escape inside my custom search API before making a request to SolrCloud?
Re: documents are not committed distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json
I've solved this problem myself. If you use core discovery, you must specify the numShards parameter in core.properties, or else Solr won't allocate a range for each shard and documents won't be distributed properly (see the example core.properties sketched after this message). Using core discovery to set up SolrCloud in Tomcat is much easier and cleaner than the coreAdmin approach described in the wiki: http://wiki.apache.org/solr/SolrCloudTomcat. It cost me some time to move from Jetty to Tomcat, but I think our IT team will like this way. :)

On 6 October 2013 23:53, Liu Bo diabl...@gmail.com wrote: Hi all, I've sent out this mail before, but I only subscribed to lucene-user and not solr-user at that time. Sorry for repeating, and your help will be much appreciated. I'm trying out the tutorial about SolrCloud, and I managed to write my own plugin to import data from our set of databases. I use SolrWriter from the DataImporter package and the docs can be distributed and committed to shards. Everything works fine using Jetty from the Solr example, but when I move to Tomcat, SolrCloud seems not to be configured right, as the documents are just committed to the shard where the update request goes to. The cause probably is that the range is null for shards in clusterstate.json. The router is implicit instead of compositeId as well. Is there anything missed or configured wrong in the following steps? How can I fix it? Your help will be much appreciated. PS, the SolrCloud Tomcat wiki page isn't up to 4.4 with core discovery; I'm trying this out after reading the SolrCloud, SolrCloudJboss, and CoreAdmin wiki pages. Here's what I've done and some useful logs:

1. Start three ZooKeeper servers.
2. Upload configuration files to ZooKeeper; the collection name is content_collection.
3. Start three Tomcat instances on three servers with core discovery.
   a) core file: name=content loadOnStartup=true transient=false shard=shard1 (different on each server) collection=content_collection
   b) solr.xml:

  <solr>
    <solrcloud>
      <str name="host">${host:}</str>
      <str name="hostContext">${hostContext:solr}</str>
      <int name="hostPort">8080</int>
      <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
      <str name="zkHost">10.199.46.176:2181,10.199.46.165:2181,10.199.46.158:2181</str>
      <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
    </solrcloud>
    <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
      <int name="socketTimeout">${socketTimeout:0}</int>
      <int name="connTimeout">${connTimeout:0}</int>
    </shardHandlerFactory>
  </solr>

4. In solr.log, I see the three shards are recognized, and SolrCloud can see that content_collection has three shards as well.
5. Write documents to content_collection using my update request. The documents only commit to the shard the request goes to. In the log I can see that DistributedUpdateProcessorFactory is in the processor chain and a distributed commit is triggered:

INFO - 2013-09-30 16:31:43.205; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; updata request processor factories:
INFO - 2013-09-30 16:31:43.206; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77
INFO - 2013-09-30 16:31:43.207; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.DistributedUpdateProcessorFactory@5b2bc407
INFO - 2013-09-30 16:31:43.207; com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler; org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654
INFO - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onInit: commits: num=1 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}
INFO - 2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 1
INFO - 2013-09-30 16:31:43.440; org.apache.solr.update.SolrCmdDistributor; Distrib commit to: [StdNode: http://10.199.46.176:8080/solr/content/, StdNode: http://10.199.46.165:8080/solr/content/] params: commit_end_point=true&commit=true&softCommit=false&waitSearcher=true&expungeDeletes=false

But the documents won't go to the other shards; the other shards only get a request with no documents:

INFO - 2013-09-30 16:31:43.841; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onInit: commits: num=1 commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}
INFO - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy; newest commit
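A minimal sketch of a core.properties that includes numShards, based on the core file described above (values are illustrative; numShards must match the intended shard count for the collection):

  name=content
  loadOnStartup=true
  transient=false
  shard=shard1
  collection=content_collection
  numShards=3

Without the numShards line, core discovery brings the core up but the collection never gets hash ranges assigned in clusterstate.json, which is the symptom described above.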
Re: Improving indexing performance
Thanks Erick, I think I have been able to exhaust a resource: if I split the data in 2 and upload it with 2 clients like benchmark 1.1, it takes 120s, and here the bottleneck is my LAN; if I use a setting like benchmark 1, the bottleneck is probably the ramBuffer. I'm going to buy a Gigabit Ethernet cable so I can make a better test. OutOfMemory error: it's the SolrJ client that crashes. I'm using Solr 4.2.1 and the corresponding SolrJ client; HttpSolrServer works fine, ConcurrentUpdateSolrServer gives me problems, and I didn't understand how to size the queueSize parameter optimally.

Il giorno 07/ott/2013, alle ore 14:03, Erick Erickson ha scritto: Just skimmed, but the usual reason you can't max out the server is that the client can't go fast enough. Very quick experiment: comment out the server.add line in your client and run it again, does that speed up the client substantially? If not, then the time is being spent on the client. Or split your csv file into, say, 5 parts and run it from 5 different PCs in parallel. bq: I can't rely on auto commit, otherwise I get an OutOfMemory error This shouldn't be happening, I'd get to the bottom of this. Perhaps simply allocating more memory to the JVM running Solr. bq: committing every 100k docs gives worse performance It'll be best to specify openSearcher=false for max indexing throughput BTW. You should be able to do this quite frequently, 15 seconds seems quite reasonable. Best, Erick

On Sun, Oct 6, 2013 at 12:19 PM, Matteo Grolla matteo.gro...@gmail.com wrote: I'd like to have some suggestions on how to improve the indexing performance in the following scenario. I'm uploading 1M docs to Solr; every doc has id: sequential number, title: small string, date: date, body: 1kb of text. Here are my benchmarks (they are all single executions, not averages from multiple executions):

1) Using the updaterequesthandler and streaming docs from a csv file on the same disk as Solr; auto commit every 15s with openSearcher=false and commit after the last document. Total time: 143035ms

1.1) Same as 1), plus <ramBufferSizeMB>500</ramBufferSizeMB> and <maxBufferedDocs>10</maxBufferedDocs>. Total time: 134493ms

1.2) Same as 1), plus <mergeFactor>30</mergeFactor>. Total time: 143134ms

2) Using a SolrJ client from another PC on the LAN (100Mbps) with HttpSolrServer and javabin format, adding documents to the server in batches of 1k docs ( server.add( collection ) ); auto commit every 15s with openSearcher=false and commit after the last document. Total time: 139022ms

3) Using a SolrJ client from another PC on the LAN (100Mbps) with ConcurrentUpdateSolrServer and javabin format, adding documents to the server in batches of 1k docs ( server.add( collection ) ); server queue size=20k, server threads=4, no auto-commit and commit every 100k docs. Total time: 167301ms

--On the Solr server-- CPU averages 25%, at best 100% for 1 core. IO is still far from being saturated; iostat gives a pattern like this (every 5 s):

time(s)  %util
100      45,20
105      1,68
110      17,44
115      76,32
120      2,64
125      68
130      1,28

I thought that by using ConcurrentUpdateSolrServer I would be able to max out CPU or IO, but I wasn't.
With ConcurrentUpdateSolrServer I can't rely on auto commit, otherwise I get an OutOfMemory error, and I found that committing every 100k docs gives worse performance than auto commit every 15s (benchmark 3 with HttpSolrServer took 193515ms). I'd really like to understand why I can't max out the resources on the server hosting Solr (disk above all), and I'd really like to understand what I'm doing wrong with ConcurrentUpdateSolrServer. Thanks
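A minimal SolrJ 4.x sketch of the ConcurrentUpdateSolrServer setup being discussed; the URL, queueSize and thread count are illustrative assumptions, not tuned recommendations:

  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkIndexer {
      public static void main(String[] args) throws Exception {
          // queueSize = number of buffered requests; 4 background threads drain the queue
          ConcurrentUpdateSolrServer server =
                  new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 4);
          for (int i = 0; i < 1000000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", i);
              doc.addField("title", "doc " + i);
              server.add(doc);          // returns quickly; documents are sent asynchronously
          }
          server.blockUntilFinished();  // drain the queue before the final commit
          server.commit();
          server.shutdown();
      }
  }

A larger queueSize mainly trades client-side memory for fewer stalls; if the client itself runs out of heap, a smaller queue (or a larger client JVM heap) is the usual first adjustment.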
Re: Regex to match one of two words
Why use regular expressions at all? Try: published OR deprecated -- Jack Krupansky -Original Message- From: Dinusha Dilrukshi Sent: Tuesday, October 08, 2013 3:32 AM To: solr-user@lucene.apache.org Subject: Regex to match one of two words I have an input that can have only 2 values: Published or Deprecated. What regular expression can I use to ensure that either of the two words was submitted? I tried different regular expressions (as in [1], [2]) that contain the most generic syntax, but Solr throws a parse exception when validating these expressions. Could someone help me write a regular expression that will be evaluated by the Solr parser? [1] /^(PUBLISHED)?(DEPRECATED)?$/ [2] /(PUBLISHED)?(DEPRECATED)?/ SolrCore org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'overview_status_s:/(PUBLISHED)?(DEPRECATED)?/': '*' or '?' not allowed as first character in WildcardQuery at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108) Regards, Dinusha.
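A minimal sketch of that suggestion as an actual query on the field from the original post (overview_status_s), with no regex involved:

  q=overview_status_s:(PUBLISHED OR DEPRECATED)

If the field is a plain string type, the values must match case exactly, so index them consistently (or lowercase both sides) before relying on this.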
Re: {soft}Commit and cache flusing
Tim, I suggest you open a new thread and not reply to this one to get noticed. Dmitry On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.comwrote: Is there a way to make autoCommit only commit if there are pending changes, ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: right. We've got the autoHard commit configured only atm. The soft-commits are controlled on the client. It was just easier to implement the first version of our internal commit policy that will commit to all solr instances at once. This is where we have noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit issued, then solr flushes the cache. Indeed. The easiest way to work around this is by disabling auto commits and only commit when you have to.
Re: Fix sort order within an index ?
@Upayavira: q=topic:x1&rows=20&sort=id desc or q=topic:x1&rows=20&sort=timestamp desc will get you what you ask for. Yeah, I know that I could use sort and that will work, but I was asking specifically about an optimized way. Also, that ticket has been fixed, so shouldn't I now be able to make use of the fixed sort order? On Tue, Oct 8, 2013 at 11:59 AM, Upayavira u...@odoko.co.uk wrote: On Mon, Oct 7, 2013, at 11:09 PM, user 01 wrote: Any way to store documents in a fixed sort order within the indexes of certain fields (either the arrival order or sorted by int ids, that also serve as my unique key), so that I could store them optimized for browsing lists of items? The order for browsing is always fixed; there are no further filter queries. I just need to fetch the top 20 (most recently added) documents with field value topic=x1. I came across this article and a JIRA issue which encouraged me that something like this may be possible: http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html https://issues.apache.org/jira/browse/LUCENE-4752 That ticket is an optimisation. If your IDs are sequential, you can sort on them. Or you can add a timestamp field with a default of NOW, and sort on that. q=topic:x1&rows=20&sort=id desc or q=topic:x1&rows=20&sort=timestamp desc will get you what you ask for. The above ticket might just make it a little faster. Upayavira
Applying an AND search considering several document snippets as a single document
Hi there, this is my first message to this list :) In our application we have documents split into several pages. When the user searches for words in a document we want to return all documents containing all the words, but we'd like to add a link to the specific page for each highlight. Currently, I can think of a solution like indexing both the full documents and the pages and doing this in two steps (conceptually, as I haven't actually implemented this): - perform an AND search across the full documents only and retrieve the document ids - perform an OR search across the pages index only, for those pages belonging to the previously returned document ids, so that I can build the links to the specific returned pages. But the AND search is already a bit slow here, and I'd like to avoid two Solr queries if possible, as I already need another RDBMS query as well and all of that adds up. Is there any way I could tell Solr to consider all indexed documents with a specified attribute as a single document for AND-matching purposes? Thanks in advance, Rodrigo.
Hardware dimension for new SolrCloud cluster
We're in the process of moving onto SolrCloud, and have gotten to the point where we are considering how to do our hardware setup. We're limited to VMs running on our server cluster and storage system, so buying new physical servers is out of the question - the question is how we should dimension the new VMs. Our document area is somewhat small, with about 1.2 million orders (rising of course), 75k products (divided into 5 countries - each of which will be its own collection/core) and some million customers. In our current master/slave setup, we only index the products, with each country taking up about 35 MB of disk space. The index frequency is more or less 8 updates per hour (mostly this is not all the data though, but atomic updates with new stock data, new prices etc.). Our upcoming order and customer indexes, however, will more or less receive updates on the fly as they happen (soft commit), and we expect the same to be the case for products in the near future. - For hardware, it's down to 1 or 2 cores - the current master runs with 2 cores - RAM - currently our master runs with 6 GB only - How much heap space should we allocate for max heap? We currently plan on this setup: - 1 machine for a simple load balancer - 4 VMs total for the Solr machines themselves (for both leaders and replicas; just one replica per shard is enough for our use case) - A quorum of 3 ZKs Question is - is this machine setup enough? And how exactly do we dimension the Solr machines? Any help, pointers or resources will be much appreciated :) Thank you!
Re: Hardware dimension for new SolrCloud cluster
I think Mr. Erickson summarized the issue of hardware sizing quite well in the following article: http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best regards, Primož From: Henrik Ossipoff Hansen h...@entertainment-trading.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Date: 08.10.2013 14:59 Subject:Hardware dimension for new SolrCloud cluster We're in the process of moving onto SolrCloud, and have gotten to the point where we are considering how to do our hardware setup. We're limited to VMs running on our server cluster and storage system, so buying new physical servers is out of the question - the question is how we should dimension the new VMs. Our document area is somewhat small, with about 1.2 million orders (rising of course), 75k products (divided into 5 countries - each which will be their own collection/core) and some million customers. In our current master/slave setup, we only index the products, with each country taking up about 35 MB of disk space. The index frequency i more or less updating the indexes 8 times per hour (mostly this is not all data thought, but atomic updates with new stock data, new prices etc.). Our upcoming order and customer indexes however will more or less receive updates on the fly as it happens (softcommit) and we expect the same to be the case for products in the near future. - For hardware, it's down to 1 or 2 cores - current master runs with 2 cores - RAM - currently our master runs with 6 GB only - How much heap space should we allocate for max heap? We currently plan on this setup: - 1 machine for a simple loadbalancer - 4 VMs totally for the Solr machines themselves (for both leaders and replicas, just one replica per shard is enough for our use case) - A qorum of 3 ZKs Question is - is this machine setup enough? And how exactly do we dimension the Solr machines? Any help, pointers or resources will be much appreciated :) Thank you!
Re: SolrCloud shard splitting keeps failing
I was wrong in saying that we don't need to open a searcher, we do. I committed a fix in SOLR-5314 to use soft commits instead of hard commits. I also increased the read timeout value. Both of these together will reduce the likelihood of such a thing happening. https://issues.apache.org/jira/browse/SOLR-5314 On Tue, Oct 8, 2013 at 1:24 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Hi Kalle, The problem here is that certain actions are taking too long causing the split process to terminate in between. For example, a commit on the parent shard leader took 83 seconds in your case but the read timeout value is set to 60 seconds only. We actually do not need to open a searcher during this commit. I'll open an issue and attach a fix. Longer term we need to introduce asynchronous commands so that status can be reported in a better way. On Tue, Oct 8, 2013 at 12:13 PM, Kalle Aaltonen kalle.aalto...@zemanta.com wrote: I have a test system where I have a index of 15M documents in one shard that I would like to split in two. I've tried it four times now. I have a stand-alone zookeeper running on the same machine. The end result is that I have two new shards with state construction, and each has one replica which is down. Two of the attempts failed because of heapspace. Now the heap size is 24GB. I can't figure out from the logs what is going on. I've attached a log of the latest attempt. Any help would be much appreciated. - Kalle Aaltonen -- Regards, Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar.
Re: problem with data import handler delta import due to use of multiple datasource
I am using 4.3. It is not related to bugs related to last_index_time. The problem is caused by the fact that the parent entity and the child entity use different data sources (different databases on different hosts). From the log output, I do see the delta query of the child entity being executed correctly and finding all the rows that have been modified for the child entity. But it fails when it executes the parentDeltaQuery, because it is still using the database connection from the child entity (i.e. datasource ds2 in my example above). Is there a way to tell DIH to use a different datasource in the parentDeltaQuery? Bill

On Sat, Oct 5, 2013 at 10:28 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Which version of Solr and what kind of SQL errors? There were some bugs in 4.x related to last_index_time, but it does not sound related. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Oct 6, 2013 at 8:51 AM, Bill Au bill.w...@gmail.com wrote: Here is my DIH config:

  <dataConfig>
    <dataSource name="ds1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost1/dbname1" user="db_username1" password="db_password1"/>
    <dataSource name="ds2" type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost2/dbname2" user="db_username2" password="db_password2"/>
    <document name="products">
      <entity name="item" dataSource="ds1" query="select * from item">
        <field column="ID" name="id" />
        <field column="NAME" name="name" />
        <entity name="feature" dataSource="ds2" query="select description from feature where item_id='${item.ID}'">
          <field name="features" column="description" />
        </entity>
      </entity>
    </document>
  </dataConfig>

I am having trouble with delta import. I think it is because the main entity and the sub-entity use different data sources. I have tried using both a delta query:

  deltaQuery="select id from item where id in (select item_id as id from feature where last_modified > '${dih.last_index_time}') or last_modified > '${dih.last_index_time}'"

and a parentDeltaQuery:

  <entity name="feature" pk="ITEM_ID" query="select DESCRIPTION as features from FEATURE where ITEM_ID='${item.ID}'"
          deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dih.last_index_time}'"
          parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}"/>

I ended up with an SQL error for both. Is there any way to make delta import work in my case? Bill
RE: problem with data import handler delta import due to use of multiple datasource
Bill, I do not believe there is any way to tell it to use a different datasource for the parent delta query. If you used this approach, would it solve your problem: http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ? James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Bill Au [mailto:bill.w...@gmail.com] Sent: Tuesday, October 08, 2013 8:50 AM To: solr-user@lucene.apache.org Subject: Re: problem with data import handler delta import due to use of multiple datasource I am using 4.3. It is not related to bugs related to last_index_time. The problem is caused by the fact that the parent entity and child entity use different data source (different databases on different hosts). From the log output, I do see the the delta query of the child entity being executed correctly and found all the rows that have been modified for the child entity. But it fails when it executed the parentDeltaQuery because it is still using the database connection from the child entity (ie datasource ds2 in my example above). Is there a way to tell DIH to use a different datasource in the parentDeltaQuery? Bill On Sat, Oct 5, 2013 at 10:28 PM, Alexandre Rafalovitch arafa...@gmail.comwrote: Which version of Solr and what kind of SQL errors? There were some bugs in 4.x related to last_index_time, but it does not sound related. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Sun, Oct 6, 2013 at 8:51 AM, Bill Au bill.w...@gmail.com wrote: Here is my DIH config: dataConfig dataSource name=ds1 type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost1/dbname1 user=db_username1 password=db_password1/ dataSource name=ds2 type=JdbcDataSource driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost2/dbname2 user=db_username2 password=db_password2/ document name=products entity name=item dataSource=ds1 query=select * from item field column=ID name=id / field column=NAME name=name / entity name=feature dataSource=ds2 query=select description from feature where item_id='${item.ID}' field name=features column=description / /entity /entity /document /dataConfig I am having trouble with delta import. I think it is because the main entity and the sub-entity use different data source. I have tried using both a delta query: deltaQuery=select id from item where id in (select item_id as id from feature where last_modified '${dih.last_index_time}') or last_modified gt; '${dih.last_index_time}' and a parentDeltaQuery: entity name=feature pk=ITEM_ID query=select DESCRIPTION as features from FEATURE where ITEM_ID='${item.ID}' deltaQuery=select ITEM_ID from FEATURE where last_modified '${dih.last_index_time}' parentDeltaQuery=select ID from item where ID=${feature.ITEM_ID}/ I ended up with an SQL error for both. Is there any way to make delta import work in my case? Bill
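For reference, a minimal sketch of the delta-via-full-import pattern from that wiki page, applied to the parent entity above (the WHERE clause is an assumption based on the last_modified columns Bill describes, not his actual schema):

  <!-- run with command=full-import&clean=false; the query itself selects only changed rows -->
  <entity name="item" dataSource="ds1"
          query="select * from item
                 where '${dataimporter.request.clean}' != 'false'
                    or last_modified &gt; '${dataimporter.last_index_time}'">
    ...
  </entity>

Because each entity keeps its own dataSource attribute with this approach, the cross-datasource parentDeltaQuery problem never comes up.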
Effect of multiple white space at WhiteSpaceTokenizer
I use Solr 4.5 and I have a WhitespaceTokenizer in my schema. What is the difference (index size and performance) between these two sentences: First one: "This is a sentence." Second one: "This  is   a   sentence." (the second contains multiple consecutive white spaces)
RE: How to achieve distributed spelling check in SolrCloud ?
Shamik, Are you using a request handler other than /select, and if so, did you set shards.qt in your request? It should be set to the name of the request handler you are using. See http://wiki.apache.org/solr/SpellCheckComponent?#Distributed_Search_Support James Dyer Ingram Content Group (615) 213-4311

-Original Message- From: Shamik Bandopadhyay [mailto:sham...@gmail.com] Sent: Monday, October 07, 2013 4:47 PM To: solr-user@lucene.apache.org Subject: How to achieve distributed spelling check in SolrCloud ? Hi, We are in the process of transitioning to SolrCloud (4.4) from a Master-Slave architecture (4.2). One of the issues I'm facing now is with making spell check work. It only seems to work if I explicitly set distrib=false. I'm using a custom request handler and included the spell check option:

  <str name="spellcheck">on</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.onlyMorePopular">false</str>
  <str name="spellcheck.extendedResults">false</str>
  <str name="spellcheck.count">1</str>
  <str name="spellcheck.dictionary">default</str>
  </lst>
  <!-- append spellchecking to our list of components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>

The spellcheck component has the usual configuration. The spell check is part of the request handler which is being used to execute a distributed query. I can't possibly add distrib=false. Just wondering if there's a way to address this. Any pointers will be appreciated. -Thanks, Shamik
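A minimal sketch of what that looks like on the query side, assuming the custom handler is registered at /browse (the handler name and the misspelled query term are illustrative):

  http://localhost:8983/solr/collection1/browse?q=memmory&spellcheck=true&shards.qt=/browse

Without shards.qt, the sub-requests fanned out to the other shards typically go to the default /select handler, which has no spellcheck component attached - hence spellcheck appearing to work only with distrib=false.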
RE: Effect of multiple white space at WhiteSpaceTokenizer
Result is the same and performance difference should be negligible, unless you're uploading megabytes of white space. Consecutive white space should be collapsed outside of Solr/Lucene anyway because it'll end up in your stored field. Index size will be slightly bigger but not much due to compression. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Tuesday 8th October 2013 16:21 To: solr-user@lucene.apache.org Subject: Effect of multiple white space at WhiteSpaceTokenizer I use Solr 4.5 and I have a WhiteSpaceTokenizer at my schema. What is the difference (index size and performance) for that two sentences: First one: This is a sentence. Second one: This is a sentence.
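For context, a minimal sketch of a whitespace-tokenized field type (an assumption about the schema in question); both inputs produce exactly the same four tokens, because the tokenizer splits on any run of whitespace:

  <fieldType name="text_ws" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>
  <!-- "This is a sentence."      ->  [This] [is] [a] [sentence.]
       "This  is   a   sentence." ->  [This] [is] [a] [sentence.]  (identical) -->

So the indexed terms are identical; only the stored (verbatim) copy of the field keeps the extra spaces.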
Adding Functionalities in SOLR
*1. Span NOT Operator* We have a business use case for SPAN NOT queries in SOLR. The Lucene query parser currently doesn't support/parse SPAN NOT queries. *2. Adding Recursive and Range Proximity* *Recursive Proximity* is a proximity query within a proximity query, e.g. " "income tax"~5 statement"~4. The recursion can be up to any level. *Range Proximity*: currently we can only define a single number as the slop; we want an interval as a range, e.g. "profit income"~3,5, "United America"~-5,4. *3. Complex Queries* A complex query is a query formed with a combination of Boolean operators, proximity queries, range queries, or any possible combination of these. Ex: "(income AND tax) statement"~4, " "income tax"~4 (statement OR period) "~3, ("income" SPAN NOT "income tax") source ~3,5. Can anyone suggest a way of achieving these 3 functionalities in SOLR?
Re: Regex to match one of two words
Or a boolean field for published, with false meaning deprecated. wunder On Oct 8, 2013, at 3:42 AM, Jack Krupansky wrote: Why use regular expressions at all? Try: published OR deprecated -- Jack Krupansky -Original Message- From: Dinusha Dilrukshi Sent: Tuesday, October 08, 2013 3:32 AM To: solr-user@lucene.apache.org Subject: Regex to match one of two words I have an input that can have only 2 values Published or Deprecated. What regular expression can I use to ensure that either of the two words was submitted? I tried with different regular expressions (as in the [1], [2]) that contains most generic syntax.. But Solar throws parser exception when validating these expressions.. Could someone help me on writing this regular expression that will evaluate by the Solar parser. [1] /^(PUBLISHED)?(DEPRECATED)?$/ [2] /(PUBLISHED)?(DEPRECATED)?/ SolrCore org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'overview_status_s:/(PUBLISHED)?(DEPRECATED)?/': '*' or '?' not allowed as first character in WildcardQuery at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108) Regards, Dinusha. -- Walter Underwood wun...@wunderwood.org
Re: ALIAS feature, can be used for what?
CREATEALIAS is also used to move an alias. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Fri, Oct 4, 2013 at 5:41 AM, Jan Høydahl jan@cominvent.com wrote: Hi, I have been asked the same question. There are only DELETEALIAS and CREATEALIAS actions available, so is there a way to achieve uninterrupted switch of an alias from one index to another? Are we lacking a MOVEALIAS command? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 27. sep. 2013 kl. 10:46 skrev Yago Riveiro yago.rive...@gmail.com: I need delete the alias for the old collection before point it to the new, right? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, September 27, 2013 at 2:25 AM, Otis Gospodnetic wrote: Hi, Imagine you have an index and you need to reindex your data into a new index, but don't want to have to reconfigure or restart client apps when you want to point them to the new index. This is where aliases come in handy. If you created an alias for the first index and made your apps hit that alias, then you can just repoint the same alias to your new index and avoid having to touch client apps. No, I don't think you can write to multiple collections through a single alias. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Thu, Sep 26, 2013 at 6:34 AM, yriveiro yago.rive...@gmail.com(mailto: yago.rive...@gmail.com) wrote: Today I was thinking about the ALIAS feature and the utility on Solr. Can anyone explain me with an example where this feature may be useful? It's possible have an ALIAS of multiples collections, if I do a write to the alias, Is this write replied to all collections? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/ALIAS-feature-can-be-used-for-what-tp4092095.html Sent from the Solr - User mailing list archive at Nabble.com ( http://Nabble.com).
Re: ALIAS feature, can be used for what?
You can index to an alias that points at only one collection. Works fine! Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Fri, Oct 4, 2013 at 7:59 AM, Upayavira u...@odoko.co.uk wrote: I've used this feature to great effect. I have logs coming in, and I create a core for each day. At the end of each day, I create a new core for tomorrow, unload any cores over 2 months old, then create a set of aliases (all, month, week, today) pointing to just the cores that are needed for that range. Thus, my app can efficiently query the bit of the index it is really interested in. You cannot, as far as I am aware, index directly to an alias. It wouldn't know what to do with the content. However, you can create an alias over the top of an existing one, and it will replace it. Works nicely. Upayavira On Fri, Oct 4, 2013, at 10:41 AM, Jan Høydahl wrote: Hi, I have been asked the same question. There are only DELETEALIAS and CREATEALIAS actions available, so is there a way to achieve uninterrupted switch of an alias from one index to another? Are we lacking a MOVEALIAS command? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 27. sep. 2013 kl. 10:46 skrev Yago Riveiro yago.rive...@gmail.com: I need delete the alias for the old collection before point it to the new, right? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, September 27, 2013 at 2:25 AM, Otis Gospodnetic wrote: Hi, Imagine you have an index and you need to reindex your data into a new index, but don't want to have to reconfigure or restart client apps when you want to point them to the new index. This is where aliases come in handy. If you created an alias for the first index and made your apps hit that alias, then you can just repoint the same alias to your new index and avoid having to touch client apps. No, I don't think you can write to multiple collections through a single alias. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Thu, Sep 26, 2013 at 6:34 AM, yriveiro yago.rive...@gmail.com(mailto: yago.rive...@gmail.com) wrote: Today I was thinking about the ALIAS feature and the utility on Solr. Can anyone explain me with an example where this feature may be useful? It's possible have an ALIAS of multiples collections, if I do a write to the alias, Is this write replied to all collections? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/ALIAS-feature-can-be-used-for-what-tp4092095.html Sent from the Solr - User mailing list archive at Nabble.com ( http://Nabble.com).
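A minimal sketch of repointing an alias with the Collections API (the alias and collection names are illustrative); issuing CREATEALIAS again with the same alias name simply replaces the old mapping, which is why a separate MOVEALIAS is not needed:

  # point the alias at the current collection
  curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v1"
  # after reindexing, repoint the same alias at the new collection
  curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v2"
  # remove it when no longer needed
  curl "http://localhost:8983/solr/admin/collections?action=DELETEALIAS&name=products"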
Bootstrapping / Full Importing using Solr Cloud
We are in the process of upgrading our Solr cluster to the latest and greatest SolrCloud. I have some questions regarding full indexing though. We're currently running a long job (~30 hours) using DIH to do a full index of over 10M products. This process consumes a lot of memory, and while updating it cannot handle any user requests. What would be the best way to go about this when using SolrCloud? First off, does DIH work with SolrCloud? Would I need to separate my DIH indexing machine from the machines serving user requests? If not going down the DIH route, what are my best options (SolrJ?) Thanks for the input
Case insensitive suggestion - Suggester with external dictionary
I am using a suggester that uses an external dictionary file for suggestions (as below).

  # This is a sample dictionary file.
  iPhone3g
  iPhone4	295
  iPhone5c	620
  iPhone4g	710

Everything works fine except for the fact that the suggester seems to be case sensitive. /suggest?q=ip is not matching any of the entries in the dictionary (listed above). Is there a way to make the suggester case insensitive when using an external dictionary file? Thanks for your help!!
EdgeNGramFilterFactory and Faceting
Hey Everyone, When faceting on a field using the EdgeNGramFilterFactory the returned facets values include all of the n-gram values. Is there a way to limit this list to the stored values without creating a new field? Thanks in advance! Tyler
RE: EdgeNGramFilterFactory and Faceting
Facets do not return the stored values; it's usually a bad idea to tokenize or do heavy analysis on facet fields. You need to copy your field instead. -Original message- From:Tyler Foster tfos...@cloudera.com Sent: Tuesday 8th October 2013 19:28 To: solr-user@lucene.apache.org Subject: EdgeNGramFilterFactory and Faceting Hey Everyone, When faceting on a field using the EdgeNGramFilterFactory the returned facets values include all of the n-gram values. Is there a way to limit this list to the stored values without creating a new field? Thanks in advance! Tyler
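A minimal sketch of the copyField approach (field and type names are illustrative assumptions): keep the edge-ngrammed field for searching and facet on an untokenized string copy.

  <!-- searchable field with EdgeNGram analysis -->
  <field name="title" type="text_edgengram" indexed="true" stored="true"/>
  <!-- untokenized copy used only for faceting -->
  <field name="title_facet" type="string" indexed="true" stored="false"/>
  <copyField source="title" dest="title_facet"/>

Faceting on title_facet then returns the original whole values rather than every n-gram.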
Re: EdgeNGramFilterFactory and Faceting
Tyler, faceting works on indexed content and not stored content. On Tue, Oct 8, 2013 at 10:45 PM, Tyler Foster tfos...@cloudera.com wrote: Hey Everyone, When faceting on a field using the EdgeNGramFilterFactory the returned facets values include all of the n-gram values. Is there a way to limit this list to the stored values without creating a new field? Thanks in advance! Tyler -- Regards, Shalin Shekhar Mangar.
Re: EdgeNGramFilterFactory and Faceting
Thanks, that was the way it was looking. I just wanted to make sure I wasn't missing something. On Tue, Oct 8, 2013 at 10:32 AM, Markus Jelsma markus.jel...@openindex.iowrote: Facets do not return the stored constraints, it's usually bad idea to tokenize or do some have analysis on facet fields. You need to copy your field instead. -Original message- From:Tyler Foster tfos...@cloudera.com Sent: Tuesday 8th October 2013 19:28 To: solr-user@lucene.apache.org Subject: EdgeNGramFilterFactory and Faceting Hey Everyone, When faceting on a field using the EdgeNGramFilterFactory the returned facets values include all of the n-gram values. Is there a way to limit this list to the stored values without creating a new field? Thanks in advance! Tyler
RE: How to achieve distributed spelling check in SolrCloud ?
James, Thanks for your reply. The shards.qt did the trick. I read the documentation earlier but was not clear on the implementation, now it totally makes sense. Appreciate your help. Regards, Shamik
Solr 4.4.0 Shard Update Errors (503) but cloud graph shows all green?
Hi! We are running Solr 4.4.0 on a 3-node Linux cluster and have about 2 collections storing product data with no problems. Yesterday, I attempted to create another one of these collections using the Collections API, but I had forgotten to upload the config to ZooKeeper prior to making the call and it failed spectacularly, as expected :). The API command I ran was to create a 3-shard collection with a replicationFactor of 2 and maxShardsPerNode set to 2, since the default understandably causes issues on 3-node clusters.

Since I ran that command, however, I see the following message in the red 'SolrCore Initialization Failures' box when I load up the admin for 2 out of 3 of the nodes (the following is from one of the boxes):

MyNewCollection_shard1_replica2: org.apache.solr.common.cloud.ZooKeeperException: org.apache.solr.common.cloud.ZooKeeperException: Could not find configName for collection MyNewCollection found:[MyFirstCollection, MySecondCollection]
MyNewCollection_shard3_replica1: org.apache.solr.common.cloud.ZooKeeperException: org.apache.solr.common.cloud.ZooKeeperException: Could not find configName for collection MyNewCollection found:[MyFirstCollection, MySecondCollection]

My first question is, how do I get this to go away, since the cores never actually got created? I looked in the solr directory and I do not see folders with the core names (which I'm under the impression the implicit core walking uses to determine what cores to attempt to load).

Second, and a bit stranger, is that since I messed up that command, I now appear to be seeing errors in the admin log (every 2 seconds) when attempting to update documents in the other 2 collections that were working fine prior to the command being run. Specifically, I'm seeing these messages repeating over and over, near constantly:

14:07:11 ERROR SolrCmdDistributor shard update error StdNode: http://10.0.1.29:8983/solr/MyFirstCollection_shard1_replica2/: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://10.0.1.29:8983/solr/MyFirstCollection_shard1_replica2 returned non ok status:503, message:Service Unavailable
14:07:11 ERROR SolrCore Request says it is coming from leader, but we are the leader: distrib.from=http://10.0.1.30:8983/solr/MyFirstCollection_shard1_replica1/&update.distrib=FROMLEADER&wt=javabin&version=2
14:07:11 ERROR SolrCore org.apache.solr.common.SolrException: Request says it is coming from leader, but we are the leader
14:07:11 WARN RecoveryStrategy Stopping recovery for zkNodeName=core_node1 core=MyFirstCollection_shard1_replica2
14:07:11 WARN RecoveryStrategy We have not yet recovered - but we are now the leader! core=MyFirstCollection_shard1_replica2

The first error worries me much, as I think I'm losing data, but I can directly query that shard from that machine with no issues and the cloud view from ALL of the machines shows totally green. I'm not sure how the failed command got the system into this state, and I'm kicking myself for making that mistake to begin with, but I'm completely at a loss for how to attempt to recover since these are live collections that I can't take down without incurring significant downtime. Any ideas? Will reloading the cores that are throwing these messages help? Can ZooKeeper and Solr not have the same idea as to who the leader is for that shard?
And if so, how do I re-introduce consistency there? Appreciate any help that can be offered. Thanks, --Dave
Re: How to achieve distributed spelling check in SolrCloud ?
The shards.qt parameter is the easiest one to forget, with the most dramatic of consequences! On Oct 8, 2013, at 11:10 AM, shamik sham...@gmail.com wrote: James, Thanks for your reply. The shards.qt did the trick. I read the documentation earlier but was not clear on the implementation, now it totally makes sense. Appreciate your help. Regards, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/RE-How-to-achieve-distributed-spelling-check-in-SolrCloud-tp4094113p4094137.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID
Hi, We have recently migrated from Solr 3.6 to Solr 4.4. We are using the Master/Slave configuration in Solr 4.4 (not SolrCloud). We have noticed the following behavior/defect.

Configuration:
===
1. Hard commit and soft commit are disabled in the configuration (we control the commits from the application).
2. We have 1 master and 2 slaves configured, and the pollInterval is configured to 10 minutes.
3. The master is configured with replicateAfter set to "commit" and "startup".

Steps to reproduce the problem:
==
1. Delete a document in Solr (using delete by id). URL - http://localhost:8983/solr/annotation/update with body <delete><id>change.me</id></delete>
2. Issue a commit on the master (http://localhost:8983/solr/annotation/update?commit=true).
3. The replication of the DELETE WILL NOT happen. The master and slave have the same index version.
4. If we then issue another commit on the master, we see that it replicates fine.

Request you to please confirm if this is a known issue. Thank you. Regards, Bharat Akkinepalli
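For reference, a minimal sketch of the master-side replication handler configuration being described (paths and confFiles omitted; only the replicateAfter triggers mentioned above are shown):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
    </lst>
  </requestHandler>

The slaves then poll this handler at the configured pollInterval and fetch the index only when the master's generation has advanced.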
Re: problem with data import handler delta import due to use of multiple datasource
Thanks for the suggestion, but that won't work as I have a last_modified field in both the parent entity and the child entity, and I want delta import to kick in when either changes. That other approach has the same problem since the parent and child entities use different datasources. Bill On Tue, Oct 8, 2013 at 10:18 AM, Dyer, James james.d...@ingramcontent.com wrote: Bill, I do not believe there is any way to tell it to use a different datasource for the parent delta query. If you used this approach, would it solve your problem: http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport ? James Dyer Ingram Content Group (615) 213-4311 [...]
Re: What is the full list of Solr Special Characters?
On 10/8/2013 3:01 AM, Furkan KAMACI wrote: Actually I want to remove special characters and not send them into my Solr indexes. I mean a user can send a special query, like a SQL injection, and I want to protect my system from such scenarios. There is a newer javadoc than the *very* old one you are looking at: http://lucene.apache.org/core/4_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html?is-external=true#Escaping_Special_Characters When I compare that list to what's actually in the SolrJ escapeQueryChars method, it looks like that method escapes one additional character - the semicolon. http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_5_0/solr/solrj/src/java/org/apache/solr/client/solrj/util/ClientUtils.java Just search the page for escapeQueryChars to see the Java code. Thanks, Shawn
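If the goal is just to neutralize user input rather than strip it, the SolrJ helper discussed above can be called directly; a small sketch (the sample input is made up, the class and method are the ClientUtils API linked above):

import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeExample {
    public static void main(String[] args) {
        String userInput = "title:(foo AND bar) OR 1=1; --";
        // Backslash-escapes the Lucene/Solr query special characters,
        // including the semicolon noted above
        String safe = ClientUtils.escapeQueryChars(userInput);
        System.out.println(safe);
    }
}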
run filter queries after post filter
Hey, I am using solr 4.0 with my own PostFilter implementation which is executed after the normal solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the below line to the url but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
Re: no such field error:smaller big block size details while indexing doc files
This my new schema.xml: schema name=documents fields field name=id type=string indexed=true stored=true required=true multiValued=false/ field name=author type=string indexed=true stored=true multiValued=true/ field name=comments type=text indexed=true stored=true multiValued=false/ field name=keywords type=text indexed=true stored=true multiValued=false/ field name=contents type=text indexed=true stored=true multiValued=false/ field name=title type=text indexed=true stored=true multiValued=false/ field name=revision_number type=string indexed=true stored=true multiValued=false/ field name=_version_ type=long indexed=true stored=true multiValued=false/ dynamicField name=ignored_* type=string indexed=false stored=true multiValued=true/ dynamicField name=* type=ignored multiValued=true / copyfield source=id dest=text / copyfield source=author dest=text / /fields types fieldtype name=ignored stored=false indexed=false class=solr.StrField / fieldType name=integer class=solr.IntField / fieldType name=long class=solr.LongField / fieldType name=string class=solr.StrField / fieldType name=text class=solr.TextField / /types uniqueKeyid/uniqueKey /schema I still get the same error. From: Erick Erickson [via Lucene] ml-node+s472066n4094013...@n3.nabble.com To: sweety sweetyshind...@yahoo.com Sent: Tuesday, October 8, 2013 7:16 AM Subject: Re: no such field error:smaller big block size details while indexing doc files Well, one of the attributes parsed out of, probably the meta-information associated with one of your structured docs is SMALLER_BIG_BLOCK_SIZE_DETAILS and Solr Cel is faithfully sending that to your index. If you want to throw all these in the bit bucket, try defining a true catch-all field that ignores things, like this. dynamicField name=* type=ignored multiValued=true / Best, Erick On Mon, Oct 7, 2013 at 8:03 AM, sweety [hidden email] wrote: Im trying to index .doc,.docx,pdf files, im using this url: curl http://localhost:8080/solr/document/update/extract?literal.id=12commit=true; -Fmyfile=@complex.doc This is the error I get: Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.init(NPOIFSFileSystem.java:93) at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.init(NPOIFSFileSystem.java:190) at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.init(NPOIFSFileSystem.java:184) at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376) at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165) at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61) at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) at
Re: solr cpu usage
Yes, you've saved us all lots of time with this article. I'm about to do the same for the old Jetty or Tomcat? container question ;). Tim On 7 October 2013 18:55, Erick Erickson erickerick...@gmail.com wrote: Tim: Thanks! Mostly I wrote it to have something official looking to hide behind when I didn't have a good answer to the hardware sizing question :). On Mon, Oct 7, 2013 at 2:48 PM, Tim Vaillancourt t...@elementspace.com wrote: Fantastic article! Tim On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote: From my perspective, your question is almost impossible to answer, there are too many variables. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, More CPU cores means more concurrency. This is good if you need to handle high query rates. Faster cores mean lower query latency, assuming you are not bottlenecked by memory or disk IO or network IO. So what is ideal for you depends on your concurrency and latency needs. Otis Solr ElasticSearch Support http://sematext.com/ On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote: hi We're building a spec for a machine to purchase. We're going to buy 10 machines. we aren't sure yet how many proccesses we will run per machine. the question is -should we buy faster cpu with less cores or slower cpu with more cores? in any case we will have 2 cpus in each machine. should we buy 2.6Ghz cpu with 8 cores or 3.5Ghz cpu with 4 cores? what will we gain by having many cores? what kinds of usages would make cpu be the bottleneck? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html Sent from the Solr - User mailing list archive at Nabble.com.
dynamically adding core with auto-discovery in Solr 4.5
Hi, We are using auto discovery and have a use case where we want to be able to add cores dynamically, without restarting solr. In 4.4 we were able to - add a directory (e.g. core1) with an empty core.properties - call http://localhost:8983/solr/admin/cores?action=CREATEcore=core1name=core1instanceDir=%2Fsomewhere%2Fcore1 In 4.5 however this (the second step) fails, saying it cannot create a new core in that directory because another core is already defined there. From the documentation (http://wiki.apache.org/solr/CoreAdmin), I understand that since 4.3 we should actually do RELOAD. However, RELOAD results in this stacktrace: org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:673) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:172) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:322) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: org.apache.solr.common.SolrException: Unable to reload core: core1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:936) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:691) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:671) ... 20 more Caused by: org.apache.solr.common.SolrException: No such core: core1 at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:642) ... 21 more Note that before I RELOAD, the core1 directory was created. Also note that next to the core1 directory, there is a core0 directory which has exactly the same content and is auto-discovered perfectly fine at startup. So... what should it be? Or am I missing something here? thanks in advance, Jan
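As an aside, the same CREATE call can be issued from SolrJ via CoreAdminRequest if scripting it is easier; a rough sketch using the names from the example above (this does not change the 4.5 behaviour being reported, it is just another way to send the same request):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateCoreExample {
    public static void main(String[] args) throws Exception {
        // Talks to the container-level admin handler, not a specific core
        HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
        // Equivalent of admin/cores?action=CREATE&core=core1&instanceDir=/somewhere/core1
        CoreAdminRequest.createCore("core1", "/somewhere/core1", admin);
        admin.shutdown();
    }
}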
Re: Accent insensitive multi-words suggester
Thank you Erick. I will try this. Regards Dominique Le 06/10/13 03:03, Erick Erickson a écrit : Consider implementing a special field that of the form accentfolded|original For instance, you'd index something like ecole|école ecole|école privée as _terms_, not broken up at all. Now, when you send something to the suggester you send just eco or éco you fold them to eco too and get back these tokens. Then the app layer breaks them up and displays them pleasingly. Best Erick On Tue, Oct 1, 2013 at 5:45 PM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, Up to now, the best solution I found in order to implement a multi-words suggester was to use ShingleFilterFactory filter at index time and the termsComponent. At index time the analyzer was : analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.ASCIIFoldingFilterFactory/ filter class=solr.ElisionFilterFactory ignoreCase=true articles=lang/contractions_fr.txt/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt / filter class=solr.LowerCaseFilterFactory / filter class=solr.ShingleFilterFactory maxShingleSize=4 outputUnigrams=true/ /analyzer With ASCIIFoldingFilter filter, it works find if the user do not use accent in query terms and all suggestions are without accents. Without ASCIIFoldingFilter filter, it works find if the user do not forget accent in query terms and all suggestions are with accents. Note : I use the StopFilter to avoid suggestions including stop words and particularly starting or ending with stop words. What I need is a suggester where the user can use or not use the accent in query terms and the suggestions are returned with accent. For example, if the user type éco or eco, the suggester should return : école école primaire école publique école privée école primaire privée I think it is impossible to achieve this with the termComponents and I should use the SpellCheckComponent instead. However, I don't see how to make the suggester accent insensitive and return the suggestions with accents. Did somebody already achieved that ? Thank you. Dominique -- Dominique Béjean +33 6 08 46 12 43 skype: dbejean www.eolya.fr www.crawl-anywhere.com
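If the accentfolded|original scheme is adopted, the query side has to fold the user's prefix the same way before hitting the terms/suggest handler. A minimal client-side sketch, assuming plain Java is acceptable (java.text.Normalizer only approximates Solr's ASCIIFoldingFilter, but it covers the accented-Latin examples here):

import java.text.Normalizer;

public class AccentFold {
    // Fold "éco" -> "eco" the same way the indexed left-hand side was folded
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("école privée"));  // prints "ecole privee"
    }
}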
Re: {soft}Commit and cache flusing
I have a genuine question with substance here. If anything, this nonconstructive, rude response was just to get noticed. Thanks for contributing to the discussion. Tim On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote: Tim, I suggest you open a new thread and not reply to this one to get noticed. Dmitry On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com wrote: Is there a way to make autoCommit only commit if there are pending changes, i.e. if there are 0 adds pending commit, don't autoCommit (open-a-searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: right. We've got the auto hard commit configured only atm. The soft-commits are controlled on the client. It was just easier to implement the first version of our internal commit policy that will commit to all solr instances at once. This is where we have noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit is issued, then solr flushes the cache. Indeed. The easiest way to work around this is by disabling auto commits and only committing when you have to.
What's the purpose of the bits option in compositeId (Solr 4.5)?
I'm curious what the later shard-local bits do, if anything? I have a very large cluster (256 shards) and I'm sending most of my data with a single composite, e.g. 1234!unique_id, but I'm noticing the data is being split among many of the shards. My guess right now is that since I'm only using the default 16 bits my data is being split across multiple shards (because of my high # of shards). Thanks, Brett
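For readers who have not met it, the bits in question are the optional /N suffix on the compositeId route key; a tiny illustration (the 8-bit value is arbitrary, and whether this is the cause of the uneven spread is exactly what the rest of the thread investigates):

public class CompositeIdExample {
    public static void main(String[] args) {
        String routeKey = "1234";      // shared prefix, as in 1234!unique_id
        String uniqueId = "doc-42";

        // Default: 16 bits of the hash come from the route key
        String id16 = routeKey + "!" + uniqueId;

        // Optional bits suffix: take only 8 bits from the route key,
        // spreading that key's documents over a wider slice of the hash ring
        String id8 = routeKey + "/8!" + uniqueId;

        System.out.println(id16 + "  " + id8);
    }
}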
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
On Tue, Oct 8, 2013 at 6:29 PM, Brett Hoerner br...@bretthoerner.com wrote: I'm curious what the later shard-local bits do, if anything? I have a very large cluster (256 shards) and I'm sending most of my data with a single composite, e.g. 1234!unique_id, but I'm noticing the data is being split among many of the shards. That shouldn't be the case. All of your shards should have a lower hash value with all 0 bits and an upper hash value of all 1s (i.e. 0x to 0x) So you see any shards where that's not true? Also, is the router set to compositeId? -Yonik My guess right now is that since I'm only using the default 16 bits my data is being split across multiple shards (because of my high # of shards). Thanks, Brett
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
Router is definitely compositeId. To be clear, data isn't being spread evenly... it's like it's *almost* working. It's just odd to me that I'm slamming in data that's 99% of one _route_ key yet after a few minutes (from a fresh empty index) I have 2 shards with a sizeable amount of data (68M and 128M) and the rest are very small as expected. The fact that two are receiving so much makes me think my data is being split into two shards. I'm trying to debug more now. On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Oct 8, 2013 at 6:29 PM, Brett Hoerner br...@bretthoerner.com wrote: I'm curious what the later shard-local bits do, if anything? I have a very large cluster (256 shards) and I'm sending most of my data with a single composite, e.g. 1234!unique_id, but I'm noticing the data is being split among many of the shards. That shouldn't be the case. All of your shards should have a lower hash value with all 0 bits and an upper hash value of all 1s (i.e. 0x to 0x) So you see any shards where that's not true? Also, is the router set to compositeId? -Yonik My guess right now is that since I'm only using the default 16 bits my data is being split across multiple shards (because of my high # of shards). Thanks, Brett
limiting deep pagination
Is there a way to configure Solr 'defaults/appends/invariants' such that the product of the 'start' and 'rows' parameters doesn't exceed a given value? This would be to prevent deep pagination. Or would this require a custom requestHandler? Peter
dynamic field question
I am having trouble trying to return one particular dynamic field instead of all dynamic fields. Imagine I have a document with an unknown number of sections. Each section can have a 'title' and a 'body'. I have each section title and body as dynamic fields such as section_title_* and section_body_*. Imagine that some documents contain a section that has title=Appendix. I want a query that will find all docs with that section and return just the Appendix section, but I don't know how to return just that one section. I can copyField my dynamic field section_title_* into a static field called section_titles and query that for docs that contain the Appendix, but I don't know how to return only that one dynamic field: ?q=section_titles:Appendix&fl=section_body_* Any ideas? I can't seem to put a conditional in the fl parameter.
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
This is my clusterstate.json: https://gist.github.com/bretthoerner/0098f741f48f9bb51433 And these are my core sizes (note large ones are sorted to the end): https://gist.github.com/bretthoerner/f5b5e099212194b5dff6 I've only heavily sent 2 shards by now (I'm sharding by hour and it's been running for 2). There *is* a little old data in my stream, but not that much (like 5%). What's confusing to me is that 5 of them are rather large, when I'd expect 2 of them to be. On Tue, Oct 8, 2013 at 5:45 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Oct 8, 2013 at 6:29 PM, Brett Hoerner br...@bretthoerner.com wrote: I'm curious what the later shard-local bits do, if anything? I have a very large cluster (256 shards) and I'm sending most of my data with a single composite, e.g. 1234!unique_id, but I'm noticing the data is being split among many of the shards. That shouldn't be the case. All of your shards should have a lower hash value with all 0 bits and an upper hash value of all 1s (i.e. 0x to 0x) So you see any shards where that's not true? Also, is the router set to compositeId? -Yonik My guess right now is that since I'm only using the default 16 bits my data is being split across multiple shards (because of my high # of shards). Thanks, Brett
Re: limiting deep pagination
I don't know of any OOTB way to do that, I'd write a custom request handler as you suggested. Tomás On Tue, Oct 8, 2013 at 3:51 PM, Peter Keegan peterlkee...@gmail.com wrote: Is there a way to configure Solr 'defaults/appends/invariants' such that the product of the 'start' and 'rows' parameters doesn't exceed a given value? This would be to prevent deep pagination. Or would this require a custom requestHandler? Peter
Re: limiting deep pagination
I'd recommend a custom first-components SearchComponent. Then it could simply validate (or adjust) the parameters or throw an exception. Knowing Tomás - that's probably what he'd really do :) Erik On Oct 8, 2013, at 19:34, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: I don't know of any OOTB way to do that, I'd write a custom request handler as you suggested. Tomás On Tue, Oct 8, 2013 at 3:51 PM, Peter Keegan peterlkee...@gmail.com wrote: Is there a way to configure Solr 'defaults/appends/invariants' such that the product of the 'start' and 'rows' parameters doesn't exceed a given value? This would be to prevent deep pagination. Or would this require a custom requestHandler? Peter
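A rough sketch of such a first-components guard is below (the 10,000 cap and class name are made up, and depending on the exact 4.x version you may need to stub one or two more SolrInfoMBean methods; treat it as a starting point, not a tested implementation):

import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class PaginationLimitComponent extends SearchComponent {
    private static final int MAX_OFFSET = 10000;  // arbitrary cap

    @Override
    public void prepare(ResponseBuilder rb) {
        SolrParams params = rb.req.getParams();
        int start = params.getInt(CommonParams.START, 0);
        int rows = params.getInt(CommonParams.ROWS, 10);
        if (start + rows > MAX_OFFSET) {
            throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
                "start + rows may not exceed " + MAX_OFFSET);
        }
    }

    @Override
    public void process(ResponseBuilder rb) {
        // nothing to do; validation happens in prepare()
    }

    @Override
    public String getDescription() {
        return "Rejects deep pagination requests";
    }

    @Override
    public String getSource() {
        return null;
    }
}

The component would then be registered in solrconfig.xml and listed in first-components for the relevant request handler, as Erik describes.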
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
On Tue, Oct 8, 2013 at 7:31 PM, Brett Hoerner br...@bretthoerner.com wrote: This is my clusterstate.json: https://gist.github.com/bretthoerner/0098f741f48f9bb51433 And these are my core sizes (note large ones are sorted to the end): https://gist.github.com/bretthoerner/f5b5e099212194b5dff6 I've only heavily sent 2 shards by now (I'm sharding by hour and it's been running for 2). There *is* a little old data in my stream, but not that much (like 5%). What's confusing to me is that 5 of them are rather large, when I'd expect 2 of them to be. The cluster state looks fine at first glance... and each route key should map to a single shard. You could try a query to each of the big shards and see what IDs are in them. -Yonik
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
I have a silly question, how do I query a single shard in SolrCloud? When I hit solr/foo_shard1_replica1/select it always seems to do a full cluster query. I can't (easily) do a _route_ query before I know what each have. On Tue, Oct 8, 2013 at 7:06 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Oct 8, 2013 at 7:31 PM, Brett Hoerner br...@bretthoerner.com wrote: This is my clusterstate.json: https://gist.github.com/bretthoerner/0098f741f48f9bb51433 And these are my core sizes (note large ones are sorted to the end): https://gist.github.com/bretthoerner/f5b5e099212194b5dff6 I've only heavily sent 2 shards by now (I'm sharding by hour and it's been running for 2). There *is* a little old data in my stream, but not that much (like 5%). What's confusing to me is that 5 of them are rather large, when I'd expect 2 of them to be. The cluster state looks fine at first glance... and each route key should map to a single shard. You could try a query to each of the big shards and see what IDs are in them. -Yonik
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
Ignore me I forgot about shards= from the wiki. On Tue, Oct 8, 2013 at 7:11 PM, Brett Hoerner br...@bretthoerner.comwrote: I have a silly question, how do I query a single shard in SolrCloud? When I hit solr/foo_shard1_replica1/select it always seems to do a full cluster query. I can't (easily) do a _route_ query before I know what each have. On Tue, Oct 8, 2013 at 7:06 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Oct 8, 2013 at 7:31 PM, Brett Hoerner br...@bretthoerner.com wrote: This is my clusterstate.json: https://gist.github.com/bretthoerner/0098f741f48f9bb51433 And these are my core sizes (note large ones are sorted to the end): https://gist.github.com/bretthoerner/f5b5e099212194b5dff6 I've only heavily sent 2 shards by now (I'm sharding by hour and it's been running for 2). There *is* a little old data in my stream, but not that much (like 5%). What's confusing to me is that 5 of them are rather large, when I'd expect 2 of them to be. The cluster state looks fine at first glance... and each route key should map to a single shard. You could try a query to each of the big shards and see what IDs are in them. -Yonik
Re: Improving indexing performance
queue size shouldn't really be too large, the whole point of the concurrency is to keep from waiting around for the communication with the server in a single thread. So having a bunch of stuff backed up in the queue isn't buying you anything. And you can always increase the memory allocated to the JVM running SolrJ... Erick On Tue, Oct 8, 2013 at 5:29 AM, Matteo Grolla matteo.gro...@gmail.com wrote: Thanks Erik, I think I have been able to exhaust a resource: if I split the data in 2 and upload it with 2 clients like benchmark 1.1, it takes 120s; here the bottleneck is my LAN. If I use a setting like benchmark 1, probably the bottleneck is the ramBuffer. I'm going to buy a Gigabit ethernet cable so I can make a better test. OutOfMemory error: it's the SolrJ client that crashes. I'm using Solr 4.2.1 and the corresponding SolrJ client. HttpSolrServer works fine; ConcurrentUpdateSolrServer gives me problems, and I didn't understand how to size the queueSize parameter optimally. On 07/Oct/2013, at 14:03, Erick Erickson wrote: Just skimmed, but the usual reason you can't max out the server is that the client can't go fast enough. Very quick experiment: comment out the server.add line in your client and run it again, does that speed up the client substantially? If not, then the time is being spent on the client. Or split your csv file into, say, 5 parts and run it from 5 different PCs in parallel. bq: I can't rely on auto commit, otherwise I get an OutOfMemory error This shouldn't be happening, I'd get to the bottom of this. Perhaps simply allocating more memory to the JVM running Solr. bq: committing every 100k docs gives worse performance It'll be best to specify openSearcher=false for max indexing throughput BTW. You should be able to do this quite frequently, 15 seconds seems quite reasonable.
Best, Erick On Sun, Oct 6, 2013 at 12:19 PM, Matteo Grolla matteo.gro...@gmail.com wrote: I'd like to have some suggestions on how to improve the indexing performance in the following scenario. I'm uploading 1M docs to Solr; every doc has id: sequential number, title: small string, date: date, body: 1kb of text. Here are my benchmarks (they are all single executions, not averages of multiple executions):
1) using the UpdateRequestHandler and streaming docs from a csv file on the same disk as Solr, auto commit every 15s with openSearcher=false and a commit after the last document - total time: 143035ms
1.1) same as 1), plus <ramBufferSizeMB>500</ramBufferSizeMB> <maxBufferedDocs>10</maxBufferedDocs> - total time: 134493ms
1.2) same as 1), plus <mergeFactor>30</mergeFactor> - total time: 143134ms
2) using a SolrJ client from another PC on the LAN (100Mbps) with HttpSolrServer and the javabin format, adding documents to the server in batches of 1k docs ( server.add( collection ) ), auto commit every 15s with openSearcher=false and a commit after the last document - total time: 139022ms
3) using a SolrJ client from another PC on the LAN (100Mbps) with ConcurrentUpdateSolrServer and the javabin format, adding documents in batches of 1k docs ( server.add( collection ) ), server queue size=20k, server threads=4, no auto-commit and a commit every 100k docs - total time: 167301ms
--On the Solr server-- CPU averages 25%, at best 100% for 1 core. IO is still far from being saturated; iostat gives a pattern like this (every 5 s):
time(s) %util
100 45,20
105 1,68
110 17,44
115 76,32
120 2,64
125 68
130 1,28
I thought that using ConcurrentUpdateSolrServer I would be able to max out CPU or IO, but I wasn't. With ConcurrentUpdateSolrServer I can't rely on auto commit, otherwise I get an OutOfMemory error, and I found that committing every 100k docs gives worse performance than auto commit every 15s (benchmark 3 with HttpSolrServer took 193515). I'd really like to understand why I can't max out the resources on the server hosting Solr (disk above all), and I'd really like to understand what I'm doing wrong with ConcurrentUpdateSolrServer. thanks
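For reference, the ConcurrentUpdateSolrServer knobs being discussed look roughly like this in SolrJ (the URL, the queue size of 10000 and the 4 threads are only examples; as Erick notes, the queue just needs to be big enough to keep the sender threads busy):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexSketch {
    public static void main(String[] args) throws Exception {
        // queueSize = 10000 buffered docs, 4 background sender threads
        ConcurrentUpdateSolrServer server =
            new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 4);

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", i);
            doc.addField("title", "doc " + i);
            batch.add(doc);
            if (batch.size() == 1000) {      // add in batches of 1k, as in benchmark 3
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.blockUntilFinished();         // drain the queue
        server.commit();                     // single hard commit at the end
        server.shutdown();
    }
}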
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
On 10/8/2013 6:12 PM, Brett Hoerner wrote: Ignore me I forgot about shards= from the wiki. On Tue, Oct 8, 2013 at 7:11 PM, Brett Hoerner br...@bretthoerner.comwrote: I have a silly question, how do I query a single shard in SolrCloud? When I hit solr/foo_shard1_replica1/select it always seems to do a full cluster query. I can't (easily) do a _route_ query before I know what each have. There is also the distrib=false parameter that will cause the request to be handled directly by the core it is sent to rather than being distributed/balanced by SolrCloud. Thanks, Shawn
Re: Bootstrapping / Full Importing using Solr Cloud
DIH works with SolrCloud as far as I understand. But moving to SolrJ has several advantages: 1) you have more control over your process, better ability to debug, etc. 2) If you can partition your data up amongst several clients, you can probably get through your jobs much faster. 3) You're not overloading one machine with both the DIH bits and the indexing bits. There are some other options; I generally prefer SolrJ, though. Others have different opinions of course. Best, Erick On Tue, Oct 8, 2013 at 12:57 PM, Mark static.void@gmail.com wrote: We are in the process of upgrading our Solr cluster to the latest and greatest SolrCloud. I have some questions regarding full indexing though. We're currently running a long job (~30 hours) using DIH to do a full index on over 10M products. This process consumes a lot of memory, and while updating it cannot handle any user requests. What would be the best way of going about this when using SolrCloud? First off, does DIH work with cloud? Would I need to separate out my DIH indexing machine from the machines serving up user requests? If not going down the DIH route, what are my best options (SolrJ?) Thanks for the input
Re: run filter queries after post filter
Hmmm, seems like it should. What's our evidence that it isn't working? Best, Erick On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com wrote: Hey, I am using solr 4.0 with my own PostFilter implementation which is executed after the normal solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the below line to the url but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
Re: no such field error:smaller big block size details while indexing doc files
Hmmm, that is odd, the glob dynamicField should pick this up. Not quite sure what's going on. You an parse the file via Tika yourself and look at what's in there, it's a relatively simple SolrJ program, here's a sample: http://searchhub.org/2012/02/14/indexing-with-solrj/ Best, Erick On Tue, Oct 8, 2013 at 4:15 PM, sweety sweetyshind...@yahoo.com wrote: This my new schema.xml: schema name=documents fields field name=id type=string indexed=true stored=true required=true multiValued=false/ field name=author type=string indexed=true stored=true multiValued=true/ field name=comments type=text indexed=true stored=true multiValued=false/ field name=keywords type=text indexed=true stored=true multiValued=false/ field name=contents type=text indexed=true stored=true multiValued=false/ field name=title type=text indexed=true stored=true multiValued=false/ field name=revision_number type=string indexed=true stored=true multiValued=false/ field name=_version_ type=long indexed=true stored=true multiValued=false/ dynamicField name=ignored_* type=string indexed=false stored=true multiValued=true/ dynamicField name=* type=ignored multiValued=true / copyfield source=id dest=text / copyfield source=author dest=text / /fields types fieldtype name=ignored stored=false indexed=false class=solr.StrField / fieldType name=integer class=solr.IntField / fieldType name=long class=solr.LongField / fieldType name=string class=solr.StrField / fieldType name=text class=solr.TextField / /types uniqueKeyid/uniqueKey /schema I still get the same error. From: Erick Erickson [via Lucene] ml-node+s472066n4094013...@n3.nabble.com To: sweety sweetyshind...@yahoo.com Sent: Tuesday, October 8, 2013 7:16 AM Subject: Re: no such field error:smaller big block size details while indexing doc files Well, one of the attributes parsed out of, probably the meta-information associated with one of your structured docs is SMALLER_BIG_BLOCK_SIZE_DETAILS and Solr Cel is faithfully sending that to your index. If you want to throw all these in the bit bucket, try defining a true catch-all field that ignores things, like this. 
dynamicField name=* type=ignored multiValued=true / Best, Erick On Mon, Oct 7, 2013 at 8:03 AM, sweety [hidden email] wrote: Im trying to index .doc,.docx,pdf files, im using this url: curl http://localhost:8080/solr/document/update/extract?literal.id=12commit=true; -Fmyfile=@complex.doc This is the error I get: Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.init(NPOIFSFileSystem.java:93) at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.init(NPOIFSFileSystem.java:190) at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.init(NPOIFSFileSystem.java:184) at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376) at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165) at
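The SolrJ/Tika approach Erick points to boils down to something like the following sketch (field names such as contents and title are taken from the schema quoted above, the file name and id come from the original curl command, and the usual Tika parser jars are assumed to be on the classpath):

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class TikaIndexSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/document");

        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1);   // no size limit
        Metadata metadata = new Metadata();
        InputStream in = new FileInputStream("complex.doc");
        try {
            // Inspecting 'metadata' here shows exactly which attributes Tika extracts
            parser.parse(in, handler, metadata, new ParseContext());
        } finally {
            in.close();
        }

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "12");
        doc.addField("title", metadata.get("title"));
        doc.addField("contents", handler.toString());
        server.add(doc);
        server.commit();
        server.shutdown();
    }
}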
Re: How to share Schema between multicore on Solr 4.4
On 10/7/2013 6:02 AM, Dharmendra Jaiswal wrote: I am using Solr 4.4 version with SolrCloud on Windows machine. Somehow i am not able to share schema between multiple core. If you're in SolrCloud mode, then you already *are* sharing your schema. You are also sharing your configuration. Both of them are in zookeeper. All collections (and all shards within a collection) which use a given config name are using the same copy. Any copies of your config/schema that might be on your disk are *NOT* being used. If you are starting Solr with any bootstrap options, then the config set that is in zookeeper might be getting overwritten by whats on your disk when Solr restarts, but otherwise SolrCloud *only* uses zookeeper for config/schema. The bootstrap options are meant to be used once, and I actually prefer to get SolrCloud operational without using bootstrap options at all. Thanks, Shawn
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey s...@elyograg.org wrote: There is also the distrib=false parameter that will cause the request to be handled directly by the core it is sent to rather than being distributed/balanced by SolrCloud. Right - this is probably the best option for diagnosing what is in what index. -Yonik
Re: How to warm up filter queries for a category field with 1000 possible values ?
On 10/7/2013 12:36 AM, user 01 wrote: what's the way to warm up filter queries for a category field with 1000 possible values? Would I need to write 1000 lines manually in the solrconfig.xml, or what is the format? Erick has given you awesome advice. Here's something a little bit different that doesn't invalidate his advice: If you have enough free RAM (not used by programs) for good OS disk caching, then as soon as you do one query that checks this field, all 1000 values for that field are likely to be in RAM, and the next query against that field is going to be lightning fast, because the operating system will not have to read the disk to get the information. Although it is slightly faster to get information out of Solr's caches than the OS disk cache, the operating system is far better at managing huge caches than Solr and Java are. http://wiki.apache.org/solr/SolrPerformanceProblems#General_information Thanks, Shawn
Re: SolrJ best pratices
On 10/7/2013 3:08 PM, Mark wrote: Some specific questions: - When working with HttpSolrServer should we keep around instances for ever or should we create a singleton that can/should be used over and over? - Is there a way to change the collection after creating the server or do we need to create a new server for each collection? If at all possible, you should create your server object and use it for the life of your application. SolrJ is threadsafe. If there is any part of it that's not, the javadocs should say so - the SolrServer implementations definitely are. By using the word collection you are implying that you are using SolrCloud ... but earlier you said HttpSolrServer, which implies that you are NOT using SolrCloud. With HttpSolrServer, your base URL includes the core or collection name - http://server:port/solr/corename; for example. Generally you will need one object for each core/collection, and another object for server-level things like CoreAdmin. With SolrCloud, you should be using CloudSolrServer instead, another implementation of SolrServer that is constantly aware of the SolrCloud clusterstate. With that object, you can use setDefaultCollection, and you can also add a collection parameter to each SolrQuery or other request object. Thanks, Shawn
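To make the distinction concrete, a small sketch of the SolrCloud-aware client described above (the zkHost string and collection names are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudClientSketch {
    public static void main(String[] args) throws Exception {
        // One long-lived, thread-safe instance for the life of the application
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        server.connect();

        SolrQuery query = new SolrQuery("*:*");
        // Per-request override of the default collection, if needed
        query.set("collection", "othercollection");
        QueryResponse rsp = server.query(query);
        System.out.println(rsp.getResults().getNumFound());

        server.shutdown();
    }
}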
Re: SolrCloud High Availability during indexing operation
Repeated the experiments on local system. Single shard Solrcloud with a replica. Tried to index 10K docs. All the indexing operation were redirected to replica Solr node. While the document while getting indexed on replica, I shutdown the leader Solr node. Out of 10K docs, only 9900 docs got indexed. If I repeat the experiment without shutting down the leader instance, all 10K docs get indexed. I am using curl to upload the docs, there was no curl error while uploading documents. Following error was there in replica log file. ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:test_collection slice:shard1 Attached replica log file. On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena ssax...@gopivotal.comwrote: Sorry for the late reply. All the documents have unique id. If I repeat the experiment, the num of docs indexed changes (I guess it depends when I shutdown a particular shard). When I do the experiment without shutting down leader Shards, all 80k docs get indexed (which I think proves that all documents are valid). I need to dig the logs to find error message. Also, I am not tracking of curl return code, will run again and reply. Regards, Saurabh On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson erickerick...@gmail.comwrote: And do any of the documents have the same uniqueKey, which is usually called id? Subsequent adds of docs with the same uniqueKey replace the earlier one. It's not definitive because it changes as merges happen, old copies of docs that have been deleted or updated will be purged, but what does your admin page show for maxDoc? If it's more than numDocs then you have duplicate uniqueKeys. NOTE: if you optimize (which you usually shouldn't) then maxDoc and numDocs will be the same so if you test this don't optimize. Best, Erick On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood wun...@wunderwood.org wrote: Did all of the curl update commands return success? Ane errors in the logs? wunder On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote: Is it possible that some of those 80K docs were simply not valid? e.g. had a wrong field, had a missing required field, anything like that? What happens if you clear this collection and just re-run the same indexing process and do everything else the same? Still some docs missing? Same number? And what if you take 1 document that you know is valid and index it 80K times, with a different ID, of course? Do you see 80K docs in the end? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena ssax...@gopivotal.com wrote: Doc count did not change after I restarted the nodes. I am doing a single commit after all 80k docs. Using Solr 4.4. Regards, Saurabh On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Interesting. Did the doc count change after you started the nodes again? Can you tell us about commits? Which version? 4.5 will be out soon. Otis Solr ElasticSearch Support http://sematext.com/ On Sep 23, 2013 8:37 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Hello, I am testing High Availability feature of SolrCloud. I am using the following setup - 8 linux hosts - 8 Shards - 1 leader, 1 replica / host - Using Curl for update operation I tried to index 80K documents on replicas (10K/replica in parallel). During indexing process, I stopped 4 Leader nodes. 
Once indexing is done, out of 80K docs only 79808 docs are indexed. Is this an expected behaviour ? In my opinion replica should take care of indexing if leader is down. If this is an expected behaviour, any steps that can be taken from the client side to avoid such a situation. Regards, Saurabh Saxena -- Walter Underwood wun...@wunderwood.org
stats on dynamic fields?
Hi, I don't seem to be able to find any info on the possibility of getting stats on dynamic fields. stats=true&stats.field=xyz_* appears to literally treat xyz_* as the field name, star included. Is there a way to get stats on dynamic fields without explicitly listing them in the query? Thanks! Li
Re: SolrCloud High Availability during indexing operation
The attachment did not go through - try using pastebin.com or something. Are you adding docs with curl one at a time or in bulk per request. - Mark On Oct 8, 2013, at 9:58 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Repeated the experiments on local system. Single shard Solrcloud with a replica. Tried to index 10K docs. All the indexing operation were redirected to replica Solr node. While the document while getting indexed on replica, I shutdown the leader Solr node. Out of 10K docs, only 9900 docs got indexed. If I repeat the experiment without shutting down the leader instance, all 10K docs get indexed. I am using curl to upload the docs, there was no curl error while uploading documents. Following error was there in replica log file. ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:test_collection slice:shard1 Attached replica log file. On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Sorry for the late reply. All the documents have unique id. If I repeat the experiment, the num of docs indexed changes (I guess it depends when I shutdown a particular shard). When I do the experiment without shutting down leader Shards, all 80k docs get indexed (which I think proves that all documents are valid). I need to dig the logs to find error message. Also, I am not tracking of curl return code, will run again and reply. Regards, Saurabh On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson erickerick...@gmail.com wrote: And do any of the documents have the same uniqueKey, which is usually called id? Subsequent adds of docs with the same uniqueKey replace the earlier one. It's not definitive because it changes as merges happen, old copies of docs that have been deleted or updated will be purged, but what does your admin page show for maxDoc? If it's more than numDocs then you have duplicate uniqueKeys. NOTE: if you optimize (which you usually shouldn't) then maxDoc and numDocs will be the same so if you test this don't optimize. Best, Erick On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood wun...@wunderwood.org wrote: Did all of the curl update commands return success? Ane errors in the logs? wunder On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote: Is it possible that some of those 80K docs were simply not valid? e.g. had a wrong field, had a missing required field, anything like that? What happens if you clear this collection and just re-run the same indexing process and do everything else the same? Still some docs missing? Same number? And what if you take 1 document that you know is valid and index it 80K times, with a different ID, of course? Do you see 80K docs in the end? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena ssax...@gopivotal.com wrote: Doc count did not change after I restarted the nodes. I am doing a single commit after all 80k docs. Using Solr 4.4. Regards, Saurabh On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Interesting. Did the doc count change after you started the nodes again? Can you tell us about commits? Which version? 4.5 will be out soon. Otis Solr ElasticSearch Support http://sematext.com/ On Sep 23, 2013 8:37 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Hello, I am testing High Availability feature of SolrCloud. 
I am using the following setup - 8 linux hosts - 8 Shards - 1 leader, 1 replica / host - Using Curl for update operation I tried to index 80K documents on replicas (10K/replica in parallel). During indexing process, I stopped 4 Leader nodes. Once indexing is done, out of 80K docs only 79808 docs are indexed. Is this an expected behaviour ? In my opinion replica should take care of indexing if leader is down. If this is an expected behaviour, any steps that can be taken from the client side to avoid such a situation. Regards, Saurabh Saxena -- Walter Underwood wun...@wunderwood.org
Re: ALIAS feature, can be used for what?
Right - update aliases should only map an alias to one collection, but are perfectly valid. Read aliases can map to multiple collections or just one. There is currently only a create alias command and not an update alias command. I suppose because the impl for create just happened to work for update as well, so I guess I figured why add it explicitly. I figured we could still do it later - and I suppose we probably should. I also intend to add a list alias command: https://issues.apache.org/jira/browse/SOLR-4968 - Mark On Oct 8, 2013, at 11:31 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: You can index to an alias that points at only one collection. Works fine! Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Fri, Oct 4, 2013 at 7:59 AM, Upayavira u...@odoko.co.uk wrote: I've used this feature to great effect. I have logs coming in, and I create a core for each day. At the end of each day, I create a new core for tomorrow, unload any cores over 2 months old, then create a set of aliases (all, month, week, today) pointing to just the cores that are needed for that range. Thus, my app can efficiently query the bit of the index it is really interested in. You cannot, as far as I am aware, index directly to an alias. It wouldn't know what to do with the content. However, you can create an alias over the top of an existing one, and it will replace it. Works nicely. Upayavira On Fri, Oct 4, 2013, at 10:41 AM, Jan Høydahl wrote: Hi, I have been asked the same question. There are only DELETEALIAS and CREATEALIAS actions available, so is there a way to achieve uninterrupted switch of an alias from one index to another? Are we lacking a MOVEALIAS command? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 27. sep. 2013 kl. 10:46 skrev Yago Riveiro yago.rive...@gmail.com: I need delete the alias for the old collection before point it to the new, right? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, September 27, 2013 at 2:25 AM, Otis Gospodnetic wrote: Hi, Imagine you have an index and you need to reindex your data into a new index, but don't want to have to reconfigure or restart client apps when you want to point them to the new index. This is where aliases come in handy. If you created an alias for the first index and made your apps hit that alias, then you can just repoint the same alias to your new index and avoid having to touch client apps. No, I don't think you can write to multiple collections through a single alias. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Thu, Sep 26, 2013 at 6:34 AM, yriveiro yago.rive...@gmail.com(mailto: yago.rive...@gmail.com) wrote: Today I was thinking about the ALIAS feature and the utility on Solr. Can anyone explain me with an example where this feature may be useful? It's possible have an ALIAS of multiples collections, if I do a write to the alias, Is this write replied to all collections? 
/Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/ALIAS-feature-can-be-used-for-what-tp4092095.html Sent from the Solr - User mailing list archive at Nabble.com ( http://Nabble.com).
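Since SolrJ 4.x does not wrap every Collections API action, a generic request is one way to create (or re-point) an alias from code; a hedged sketch with made-up alias and collection names:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateAliasSketch {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181");

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "CREATEALIAS");
        params.set("name", "logs_current");          // the alias
        params.set("collections", "logs_2013_10");   // run again to re-point the alias

        QueryRequest request = new QueryRequest(params);
        request.setPath("/admin/collections");
        server.request(request);

        server.shutdown();
    }
}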
Re: dynamic field question
I'd suggest that each of your source document sections would be a distinct solr document. All of the sections could have a source document ID field to tie them together. Dynamic fields work best when used in moderation. Your use case seems like an excessive use of dynamic fields. -- Jack Krupansky -Original Message- From: Twomey, David Sent: Tuesday, October 08, 2013 6:59 PM To: solr-user@lucene.apache.org Subject: dynamic field question I am having trouble trying to return a particular dynamic field only instead of all dynamic fields. Imagine I have a document with an unknown number of sections. Each section can have a 'title' and a 'body' I have each section title and body as dynamic fields such as section_title_* and section_body_* Imagine that some documents contain a section that has a title=Appendix I want a query that will find all docs with that section and return just the Appendix section. I don't know how to return just that one section though I can copyField my dynamic field section_title_* into a static field called section_titles and query that for docs that contain the Appendix But I don't know how to only return that one dynamic field ?q=section_titles:Appendixfl=section_body_* Any ideas? I can't seem to put a conditional in the fl parameter
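In code, that suggestion amounts to indexing each section as its own document; a sketch with made-up field names (source_doc_id, section_title, section_body) in place of the dynamic fields:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SectionsAsDocs {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        String sourceDocId = "report-17";
        String[][] sections = {
            {"Introduction", "intro text"},
            {"Appendix", "appendix text"}
        };

        for (int i = 0; i < sections.length; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", sourceDocId + "-" + i);
            doc.addField("source_doc_id", sourceDocId); // ties sections back together
            doc.addField("section_title", sections[i][0]);
            doc.addField("section_body", sections[i][1]);
            server.add(doc);
        }
        server.commit();
        server.shutdown();
        // Then q=section_title:Appendix returns only the Appendix sections,
        // and fl=section_body gives just that section's text.
    }
}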
Re: SolrCloud High Availability during indexing operation
Pastbin link http://pastebin.com/cnkXhz7A I am doing a bulk request. I am uploading 100 files, each file having 100 docs. -Saurabh On Tue, Oct 8, 2013 at 7:39 PM, Mark Miller markrmil...@gmail.com wrote: The attachment did not go through - try using pastebin.com or something. Are you adding docs with curl one at a time or in bulk per request. - Mark On Oct 8, 2013, at 9:58 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Repeated the experiments on local system. Single shard Solrcloud with a replica. Tried to index 10K docs. All the indexing operation were redirected to replica Solr node. While the document while getting indexed on replica, I shutdown the leader Solr node. Out of 10K docs, only 9900 docs got indexed. If I repeat the experiment without shutting down the leader instance, all 10K docs get indexed. I am using curl to upload the docs, there was no curl error while uploading documents. Following error was there in replica log file. ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:test_collection slice:shard1 Attached replica log file. On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Sorry for the late reply. All the documents have unique id. If I repeat the experiment, the num of docs indexed changes (I guess it depends when I shutdown a particular shard). When I do the experiment without shutting down leader Shards, all 80k docs get indexed (which I think proves that all documents are valid). I need to dig the logs to find error message. Also, I am not tracking of curl return code, will run again and reply. Regards, Saurabh On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson erickerick...@gmail.com wrote: And do any of the documents have the same uniqueKey, which is usually called id? Subsequent adds of docs with the same uniqueKey replace the earlier one. It's not definitive because it changes as merges happen, old copies of docs that have been deleted or updated will be purged, but what does your admin page show for maxDoc? If it's more than numDocs then you have duplicate uniqueKeys. NOTE: if you optimize (which you usually shouldn't) then maxDoc and numDocs will be the same so if you test this don't optimize. Best, Erick On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood wun...@wunderwood.org wrote: Did all of the curl update commands return success? Ane errors in the logs? wunder On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote: Is it possible that some of those 80K docs were simply not valid? e.g. had a wrong field, had a missing required field, anything like that? What happens if you clear this collection and just re-run the same indexing process and do everything else the same? Still some docs missing? Same number? And what if you take 1 document that you know is valid and index it 80K times, with a different ID, of course? Do you see 80K docs in the end? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena ssax...@gopivotal.com wrote: Doc count did not change after I restarted the nodes. I am doing a single commit after all 80k docs. Using Solr 4.4. Regards, Saurabh On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Interesting. Did the doc count change after you started the nodes again? Can you tell us about commits? Which version? 4.5 will be out soon. 
Otis Solr ElasticSearch Support http://sematext.com/ On Sep 23, 2013 8:37 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Hello, I am testing High Availability feature of SolrCloud. I am using the following setup - 8 linux hosts - 8 Shards - 1 leader, 1 replica / host - Using Curl for update operation I tried to index 80K documents on replicas (10K/replica in parallel). During indexing process, I stopped 4 Leader nodes. Once indexing is done, out of 80K docs only 79808 docs are indexed. Is this an expected behaviour ? In my opinion replica should take care of indexing if leader is down. If this is an expected behaviour, any steps that can be taken from the client side to avoid such a situation. Regards, Saurabh Saxena -- Walter Underwood wun...@wunderwood.org