Re: Need Help in migrating Solr version 1.4 to 4.3
On 6/26/2013 11:25 PM, Sandeep Gupta wrote:

To have a singleton design pattern for SolrServer object creation, I found that there are so many ways described in http://en.wikipedia.org/wiki/Singleton_pattern. So which of the five examples mentioned in the above URL is the best one for a web application in general practice? I am sure lots of people (on this mailing list) will have practical experience with which type of singleton pattern needs to be implemented for creation of the SolrServer object.

I will admit that when I used the word singleton, I honestly hadn't looked it up to see what it really meant. If you do use the full meaning of singleton, you can do this in any way you want. Perhaps a better thing to say is that you only need one SolrServer object for each base URL (host/port/core combination). Things are a little bit different when it comes to SolrCloud - you can use one CloudSolrServer object for the entire cloud, even if there are many collections and many servers.

In my own SolrJ code, I create two HttpSolrServer objects within each of my homegrown Core objects. One of them is for operations against that specific Solr core, the other is for CoreAdmin operations. Because the URL for CoreAdmin operations is common to multiple cores, I create a static Map with those server objects so that my Core objects can share the SolrServer object used for CoreAdmin when they are on the same server machine.

For the query side, if you're in a situation where you have one access point to your Solr installation (a load balancer in front of replicating Solr servers) and you only have one index, then you could create a single static SolrServer object for your entire application.

Thanks,
Shawn
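A minimal sketch of the one-SolrServer-per-base-URL approach Shawn describes, assuming SolrJ 4.x; the SolrServerFactory class and its map are made up for illustration:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    // Hypothetical helper: caches one thread-safe SolrServer per base URL so
    // the whole web application shares server objects instead of creating a
    // new one per request.
    public final class SolrServerFactory {
        private static final ConcurrentMap<String, SolrServer> SERVERS =
                new ConcurrentHashMap<String, SolrServer>();

        private SolrServerFactory() {}

        public static SolrServer get(String baseUrl) {
            SolrServer server = SERVERS.get(baseUrl);
            if (server == null) {
                server = new HttpSolrServer(baseUrl);
                SolrServer existing = SERVERS.putIfAbsent(baseUrl, server);
                if (existing != null) {
                    server = existing; // another thread created it first
                }
            }
            return server;
        }
    }

Usage would then be something like SolrServerFactory.get("http://localhost:8983/solr/collection1") wherever a server object is needed; the URL is a placeholder.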
Is there a way to speed up my import
I have a relational database model. This is the basics of my data-config.xml:

<entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA] inner join TableB on ...">
  <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2] where ResourceId = '${MyMainEntity.pId}'"/>
  <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"/>
  <entity name="LibraryItem" pk="ResourceId" query="select SKU FROM [TableB] INNER JOIN ... ON ... INNER JOIN ... ON ... WHERE ... AND ..."/>
</entity>

Now, this takes a lot of time: 1 rows in the first query, and then the inner entities are fetched for each row (around 10 rows each). If I use a db profiler I see the three inner entity queries running over and over (3 select statements, then again 3 select statements, over and over). This is really not efficient, and the import can run over 40 hrs. Now, what are my options to make it run faster?

1. Obviously there is an option to flatten the tables into one big table - but that will create a lot of other side effects. I would really like to avoid that extra effort and run Solr on my production relational tables. So far it works great out of the box, and I am asking here whether there is a configuration tweak.
2. If I do flatten the rows, does the schema.xml need to change too? Or will the fields that are multivalued keep being multivalued?

Thanks.
Re: Is there a way to speed up my import
On 27 June 2013 12:32, Mysurf Mail stammail...@gmail.com wrote:

I have a relational database model. This is the basics of my data-config.xml:

<entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA] inner join TableB on ...">
  <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2] where ResourceId = '${MyMainEntity.pId}'"/>
  <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"/>
  <entity name="LibraryItem" pk="ResourceId" query="select SKU FROM [TableB] INNER JOIN ... ON ... INNER JOIN ... ON ... WHERE ... AND ..."/>
</entity>

Now, this takes a lot of time: 1 rows in the first query, and then the inner entities are fetched for each row (around 10 rows each). If I use a db profiler I see the three inner entity queries running over and over (3 select statements, then again 3 select statements, over and over). This is really not efficient, and the import can run over 40 hrs. Now, what are my options to make it run faster?

1. Obviously there is an option to flatten the tables into one big table - but that will create a lot of other side effects. I would really like to avoid that extra effort and run Solr on my production relational tables. So far it works great out of the box, and I am asking here whether there is a configuration tweak.
2. If I do flatten the rows, does the schema.xml need to change too? Or will the fields that are multivalued keep being multivalued?

You have not shared your actual queries, so it is difficult to tell, but my guess would be that it is the JOINs that are the bottleneck rather than the SELECTs. You should start by:

1. Profile queries from the database back-end to see which are taking the most time, and try to simplify them.
2. Make sure that relevant database columns are indexed. This can make a huge difference, though going overboard and indexing all columns might be counter-productive.
3. Use Solr DIH's CachedSqlEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
4. Measure the time that Solr indexing takes: from your description, you seem to be guessing at it.

In general, you should not flatten the records in the database, as that is supposed to be relational data.

Regards,
Gora
Solr admin search with wildcard
I'm looking to search (in the Solr admin search screen) a certain field for: *youtube*. I know that leading wildcards take a lot of resources, but I'm not worried about that. My only question is about the syntax: would this work: field:"*youtube*" ?

Thanks. I'm using Solr 3.6.2.
Re: Need Help in migrating Solr version 1.4 to 4.3
I have done this - upgraded a 1.4 index to 3.x, then on to 4.x. It worked, but... New field types have been introduced over time that facilitate new functionality. To continue to use an upgraded index, you need to continue using the old field types, and thus lose some of the coolness of newer versions. So a re-index will set you in far better stead, if it is at all possible.

Upayavira

On Tue, Jun 25, 2013, at 06:37 PM, Erick Erickson wrote:

bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes

Solr/Lucene explicitly try to read _one_ major revision backwards. Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be able to read Solr 3.x. No attempt is made to allow Solr 4.x to read Solr 1.4 indexes, so I wouldn't even try. Shalin's comment is best: if at all possible, I'd just forget about reading the old index and re-index from scratch. But if you _do_ try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize at each step. That'll (I think) rewrite all the segments in the current format.

Good luck!
Erick

On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

You must carefully go through the upgrade instructions starting from 1.4 up to 4.3. In particular, the instructions for 1.4 to 3.1 and from 3.1 to 4.0 should be given special attention.

On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta gupta...@gmail.com wrote:

Hello All,

We are planning to migrate Solr 1.4 to Solr 4.3, and I am seeking some help on this.

Considering the schema file change: by default there are lots of changes if I compare the original Solr 1.4 schema file to the Solr 4.3 schema file, and that is the reason we are not copy-pasting the schema file. In our Solr 1.4 schema implementation we have some custom fields with types textgen and text. So in migrating these custom fields to Solr 4.3, should I use type text_general as a replacement for textgen, and text_en as a replacement for text? Please confirm the same.

Please check the text_general definition in 4.3 against the textgen fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en and text.

Considering the solrconfig change: we didn't have lots of changes in the 1.4 solrconfig file except the dataimport request handler, and therefore on the migration side we are simply modifying the Solr 4.3 solrconfig file with this request handler.

And you need to add the dataimporthandler jar into Solr's lib directory. DIH is not added automatically anymore.

Considering the application development: we used all the queries in a BOOLEAN type style (which was not good), I mean we put all the parameters in the query field, i.e. *:* AND EntityName:... AND fileName:fieldValue AND ... I think we should simplify our queries using other fields like df, qf.

Probably. AND queries are best done by filter queries (fq).

We also used to create the Solr server object via CommonsHttpSolrServer(), so I am planning to use the HttpSolrServer API now.

Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in the javabin format, so old clients using javabin won't be able to communicate with Solr until you upgrade both the Solr client and the Solr servers.

Please let me know suggestions for the above points, and also what other factors I need to take care of while considering the migration.

There is no substitute for reading the upgrade sections in the changes.txt. I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes. You will most likely need to re-index your documents. You should also think about switching to SolrCloud to take advantage of its features.
-- Regards, Shalin Shekhar Mangar.
Filter queries taking a long time, even with cache disabled
On a Solr 4.1 install I see that queries which use the fq parameter take a long time (upwards of 120 seconds), both on the standard Lucene query parser and also with edismax. I have added the {!cache=false} localparam to the filter query, but this does not speed up the query. Putting all the search terms in the main query returns results in milliseconds. Note that I am not using any wildcard queries; in each case I am specifying the field to search and the terms to search on. Where should I start to debug?

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Is there a way to build indexes using SOLRJ without SOLR instance?
I'm not a Hibernate fan either, to be honest, but in the Java world, if you have a good model-oriented design, I'm sure you prefer to map it to a DB using JPA2, for example. In our case we use EclipseLink, which for JPA2 I find simpler and faster than Hibernate. Now, I'm not sure how many JPA2 implementations can be integrated with Solr/Lucene; several years ago I developed a project nicely using Hibernate + Hibernate Search with just Lucene (no Solr server).

In fact I have to apologize for advising Hibernate, but for some people it might be a good start. Our company uses a polyglot design where I have Riak + EclipseLink (objects mapped to PostgreSQL + an interceptor to Riak), and for some objects Solr. I wish it was via annotations like in Hibernate Search, because it is pretty ugly to convert back and forth to JSON without any automation. All this said, I too care about performance, but sometimes we want less code, design patterns and things to happen automatically, so Hibernate + Hibernate Search (if that's the only capable implementation) might not be a bad idea at all.

Guido.

On 27/06/13 03:14, Otis Gospodnetic wrote:

If Hibernate Search is like regular Hibernate ORM, I'm not sure I'd trust it to pick the most optimal solutions...

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Jun 26, 2013 4:44 PM, Guido Medina guido.med...@temetra.com wrote:

Never heard of the embedded Solr server; isn't it better to just use Lucene alone for that purpose, using a helper like Hibernate? Since most applications that require indexes will have a relational DB behind the scenes, it would not be a bad idea to use an ORM combined with Lucene annotations (aka hibernate-search).

Guido.

On 26/06/13 20:30, Alexandre Rafalovitch wrote:

Yes, it is possible by running an embedded Solr inside the SolrJ process. The nice thing is that the index is portable, so you can then access it from the standalone Solr server later. I have an example here: https://github.com/arafalov/solr-indexing-book/tree/master/published/solrj , which shows SolrJ running both as a client and with an embedded container. Notice that you will probably need more jars than you expect for the standalone Solr to work, including a number of servlet jars.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Jun 26, 2013 at 2:59 PM, Learner bbar...@gmail.com wrote:

I currently have a SolrJ program which I am using for indexing data in Solr. I am trying to figure out a way to build an index without depending on a running instance of Solr. I should be able to supply the solrconfig and schema.xml to the indexing program, which in turn creates index files that I can use with any Solr instance. Is it possible to implement this?
Data Import Handler and Extract Handler
Hi all,

I am new to SOLR. I have been working through the SOLR 4 Cookbook and my experiences so far have been great. I have worked through the extraction of PDF data recipe and the data import recipe. I would now like to join these two things, i.e. I would like to do a data import from a database table of users, and then somehow associate indexed PDF data with the rows that were imported. I have a conceptual link between rows in the database and PDF documents, but I don't know how to make a physical link between the two in SOLR. For example, I know that user x has PDF documents a, b and c. If I have imported my users into SOLR using the Data Import Handler, how would I:

1) import and associate the PDF documents using the extract mechanism, in such a way that there is a link between user x and the 3 PDF documents as described above?
2) is there a better way to join a table of users to a set of PDF documents?

Thanks in advance,
Scott.
Re: Is there a way to speed up my import
I just configured the caching and it works mighty fast now. Instead of an unbelievable amount of queries it queries only 4 times. CPU usage has moved from the db to the Solr computer, but only for a very short time.

Problem: I don't see the multi-value fields (inner entities) anymore. This is my configuration:

<entity name="PackageVersion" pk="PackageVersionId" query="select PackageVersion.Id PackageVersionId, ... from ...">
  <entity name="PackageTag" pk="ResourceId" processor="CachedSqlEntityProcessor" where="ResourceId = '${PackageVersion.PackageId}'" query="SELECT [Text] PackageTag from [dbo].[Tag]"/>
  <entity name="PackageVersionTag" pk="ResourceId" processor="CachedSqlEntityProcessor" where="ResourceId = PackageVersion.PackageVersionId" query="SELECT [Text] PackageVersionTag from [dbo].[Tag]"/>
  <entity name="LibraryItem" pk="ResourceId" processor="CachedSqlEntityProcessor" where="Asset.[PackageVersionId] = PackageVersion.PackageVersionId" query="select CatalogVendorPartNum SKU, LibraryItems.[Description] SKUDescription FROM ... INNER JOIN ... ON Asset.Id = LibraryVendors.DesignProjectId INNER JOIN ... ON LibraryVendors.LibraryVendorId = LibraryItems.LibraryVendorId WHERE Asset.[AssetTypeId]=1"/>
</entity>

Now, when I query http://localhost:8983/solr/vaultCache/select?q=*&indent=true it returns only the main entity attributes. Where are my inner entity attributes now?

Thanks a lot.

On Thu, Jun 27, 2013 at 10:15 AM, Gora Mohanty g...@mimirtech.com wrote:

On 27 June 2013 12:32, Mysurf Mail stammail...@gmail.com wrote:

I have a relational database model. This is the basics of my data-config.xml:

<entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA] inner join TableB on ...">
  <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2] where ResourceId = '${MyMainEntity.pId}'"/>
  <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"/>
  <entity name="LibraryItem" pk="ResourceId" query="select SKU FROM [TableB] INNER JOIN ... ON ... INNER JOIN ... ON ... WHERE ... AND ..."/>
</entity>

Now, this takes a lot of time: 1 rows in the first query, and then the inner entities are fetched for each row (around 10 rows each). If I use a db profiler I see the three inner entity queries running over and over (3 select statements, then again 3 select statements, over and over). This is really not efficient, and the import can run over 40 hrs. Now, what are my options to make it run faster?

1. Obviously there is an option to flatten the tables into one big table - but that will create a lot of other side effects. I would really like to avoid that extra effort and run Solr on my production relational tables. So far it works great out of the box, and I am asking here whether there is a configuration tweak.
2. If I do flatten the rows, does the schema.xml need to change too? Or will the fields that are multivalued keep being multivalued?

You have not shared your actual queries, so it is difficult to tell, but my guess would be that it is the JOINs that are the bottleneck rather than the SELECTs. You should start by:

1. Profile queries from the database back-end to see which are taking the most time, and try to simplify them.
2. Make sure that relevant database columns are indexed. This can make a huge difference, though going overboard and indexing all columns might be counter-productive.
3. Use Solr DIH's CachedSqlEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
4. Measure the time that Solr indexing takes: from your description, you seem to be guessing at it.

In general, you should not flatten the records in the database, as that is supposed to be relational data.

Regards,
Gora
Re: Is there a way to build indexes using SOLRJ without SOLR instance?
If what you want to do is create an index that can later be used by Solr, then create the index with Solr. Solr has constraints on how a Lucene index is created that you would have to replicate, and that would create a huge amount of work.

SolrJ does have the 'embedded mode', in which Solr itself runs in the same JVM as the client - i.e. no HTTP transport. It could be a useful way to do off-line index creation. I've never used it, though, so can't vouch for it.

Upayavira

On Wed, Jun 26, 2013, at 09:43 PM, Guido Medina wrote:

Never heard of the embedded Solr server; isn't it better to just use Lucene alone for that purpose, using a helper like Hibernate? Since most applications that require indexes will have a relational DB behind the scenes, it would not be a bad idea to use an ORM combined with Lucene annotations (aka hibernate-search).

Guido.

On 26/06/13 20:30, Alexandre Rafalovitch wrote:

Yes, it is possible by running an embedded Solr inside the SolrJ process. The nice thing is that the index is portable, so you can then access it from the standalone Solr server later. I have an example here: https://github.com/arafalov/solr-indexing-book/tree/master/published/solrj , which shows SolrJ running both as a client and with an embedded container. Notice that you will probably need more jars than you expect for the standalone Solr to work, including a number of servlet jars.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Jun 26, 2013 at 2:59 PM, Learner bbar...@gmail.com wrote:

I currently have a SolrJ program which I am using for indexing data in Solr. I am trying to figure out a way to build an index without depending on a running instance of Solr. I should be able to supply the solrconfig and schema.xml to the indexing program, which in turn creates index files that I can use with any Solr instance. Is it possible to implement this?
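For reference, a minimal sketch of that embedded mode, assuming SolrJ/Solr 4.3-style APIs and a Solr home directory containing solr.xml plus a core named collection1 (the path and core name are placeholders):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.core.CoreContainer;

    public class OfflineIndexer {
        public static void main(String[] args) throws Exception {
            // Runs a full Solr core in-process - no HTTP - but the index it
            // writes is the same format a standalone Solr server can open later.
            CoreContainer container = new CoreContainer("/path/to/solrhome");
            container.load();
            SolrServer solr = new EmbeddedSolrServer(container, "collection1");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            solr.add(doc);
            solr.commit();
            container.shutdown();
        }
    }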
Re: shard failure, leader transition took 11s (seems high?)
On thinking about this, isn't it a potentially more serious problem, especially in view of the NRT support which Solr now offers? If a server crashes (hard), ZK detects this using the heartbeat and would remove the /live_node entry, which would trigger a leader election for this shard. But if we shut it down softly, it seems that again we have to wait for the instance to physically die (and the live_node to disappear) before we get a leadership election. For all the time between Jetty shutting down and that happening, we have no valid leader for that shard (but ZK and the rest of the cloud think we do).

Now, searches to that shard are distributed round-robin (using the standard Solr load balancer within SolrCloud), so they will see the failed node and immediately retry on another replica (and presumably work). However, updates keep going to the (now dead) leader; shouldn't that error in SolrCmdDistributor (forwarding update to http://xx4:10600/solr/collection1/ failed - retrying) trigger an election? Retrying against a node which isn't available works if it was a transient issue, but is that the more common case?

Maybe we have a more specialized case than most, but we have very frequent updates and want (near) real-time indexing; we are trying to minimize the latency between index and search. We currently soft-commit every 1s to do that, and we might get several hundred stories during that second, so failing all updates for 11s is a serious issue in our case. I know the cloud has returned an error code so we know the updates have failed, but at our application level there is nothing else we can do, surely? Solr has to send to the leader, but the leader isn't available, so shouldn't the cloud be handling that?

On 24 June 2013 14:58, Daniel Collins danwcoll...@gmail.com wrote:

Thanks Mark. Yes, I expected some finite time for the leader takeover; I just hadn't realized/comprehended that Jetty was already shut down by that point... Yes, I suppose the container has to stop sending requests to the context before it can shut the context down, so that's the window where the individual container knows it's going down but nothing else does (yet). I will have a think about that shutdown/stop API; I suspect we'll need it for production (yes, we can retry, but we are using soft-commit to be as NRT as we can, so a 10s pause isn't really acceptable in our case).

On 24 June 2013 14:46, Mark Miller markrmil...@gmail.com wrote:

It will take a short bit of time before a new leader takes over when a leader goes - that's expected - and how long it takes will vary. Some things will do short little retries to kind of deal with this, but you are alerted that those updates failed, so you have to deal with that as you would other update failures on the client side. SolrCloud favors consistency over write availability. That's the short window where you lose write availability.

To get a 'clean' shutdown - e.g. you want to bring the machine down, it didn't get hit by lightning - we have to add some specific clean-stop API you can call first; by the time Jetty (or whatever container) tells Solr it's shutting down, it's too late to pull the node out gracefully. I've danced around it in the past, but have never gotten around to making that clean shutdown/stop API.

- Mark

On Jun 24, 2013, at 8:25 AM, Daniel Collins danwcoll...@gmail.com wrote:

Just had an odd scenario in our current Solr system (4.3.0 + SOLR-4829 patch): 4 shards, 2 replicas (leader + 1 other) per shard, spread across 8 machines.
We sent all our updates into a single instance, and we shut down a leader for maintenance, expecting it to fail over to the other replica. What I saw was that when the leader shard went down, the instance taking updates started seeing rejections almost instantly, yet the cluster state changes didn't occur for several seconds. During that time, we had no valid leader for one of our shards, so we were losing updates and queries.

(shard4 leader)
07:10:33,124 - xx4 (shard 4 leader) starts coming down.
07:10:35,885 - cluster state change is detected
07:10:37,172 - nsrchnj4 publishes itself as down
07:10:37,869 - second cluster state change detected
07:10:40,202 - closing searcher
07:10:43,447 - cluster state change (live_nodes)

(instance taking updates)
07:10:33,443 - starts seeing rejections from xx4
07:10:35,937 - detects a cluster state change (red herring)
07:10:37,899 - detects another cluster state change
07:10:43,478 - detects a live_nodes change (as shard4 leader is really down now)
07:10:44,586 - detects that shard4 has no leader anymore (x8)

(new shard4 leader)
07:10:32,981 - last story FROMLEADER (xx4)
07:10:35,980 - cluster state change detected (red herring)
07:10:37,975 - another cluster state change detected
07:10:43,868 - running election process(!)
07:10:44,069 - nsrchnj8 becomes leader, tries to sync from nsrchnj4 (which is
Re: Filter queries taking a long time, even with cache disabled
Can you give an example?

On Thu, Jun 27, 2013, at 09:08 AM, Dotan Cohen wrote:

On a Solr 4.1 install I see that queries which use the fq parameter take a long time (upwards of 120 seconds), both on the standard Lucene query parser and also with edismax. I have added the {!cache=false} localparam to the filter query, but this does not speed up the query. Putting all the search terms in the main query returns results in milliseconds. Note that I am not using any wildcard queries; in each case I am specifying the field to search and the terms to search on. Where should I start to debug?

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Solr, Shards, multi cores and (reverse proxy)
* I have created a new RequestHandler and added the list of the shards:

...
<str name="shards">localhost:8780/apache-solr/leg0,localhost:8780/apache-solr/leg1,localhost:8780/apache-solr/leg2,localhost:8780/apache-solr/leg3,localhost:8780/apache-solr/leg4,localhost:8780/apache-solr/leg5</str>
...

* In the URL, I replaced shards=... by shards.qt.

It is working well. Thanks a lot for your help.
Is Overlapping onDeckSearchers=2 really a problem?
Hi,

I have a desktop application where I am abusing Solr as an embedded database, and I am quite happy with everything. Performance is more than good enough for my use case, and Solr's query capabilities match the requirements of my app quite well. However, I get the well-known performance warnings (see subject) in the log whenever I index a lot of documents, although I never experience any performance problems (they might be hidden, though). The properties of my app are:

- I (soft-)commit after every indexed item because I need the changes to be visible immediately
- The commits are serialized
- I do not have any warming queries configured

I have read the FAQ but don't see anything that helps in my case. As I said, I am happy with everything as it is, but the warning makes me a bit nervous (and maybe at some point my customers, when their logs are full of those warnings). What could I do to eliminate it? Can I configure only one searcher to be used, or anything like that?

Thanks for any hints,
Robert
Re: Need Help in migrating Solr version 1.4 to 4.3
As much as possible, use new configs. Take fieldType definitions from your 4.x example dir; don't use the old ones. E.g. if you use the old date field type, it won't be usable in various ways (e.g. in the ms() function).

Upayavira

On Thu, Jun 27, 2013, at 11:00 AM, Sandeep Gupta wrote:

Thanks again Shawn for your comments. I am a little worried about the multithreading of a web application which uses servlets. I also found one of your explanations (please confirm whether it is your comment) in http://lucene.472066.n3.nabble.com/Memory-problems-with-HttpSolrServer-td4060985.html for the question: http://stackoverflow.com/questions/11931179/httpsolrserver-instance-management

As you correctly said, creation of the SolrServer object depends on the number of shards/solrcores, and thereafter one needs to think about an implementation which may use the singleton pattern. On my web application side I have only one solrcore, the default collection1, so I will create one SolrServer object for my application. Sure, if we decide to go for SolrCloud, then I will also create one object.

Thanks Upayavira, yes I will do the re-index. Is there anything you want to suggest, since you did the same migration?

Thanks,
Sandeep

On Thu, Jun 27, 2013 at 1:33 PM, Upayavira u...@odoko.co.uk wrote:

I have done this - upgraded a 1.4 index to 3.x, then on to 4.x. It worked, but... New field types have been introduced over time that facilitate new functionality. To continue to use an upgraded index, you need to continue using the old field types, and thus lose some of the coolness of newer versions. So a re-index will set you in far better stead, if it is at all possible.

Upayavira

On Tue, Jun 25, 2013, at 06:37 PM, Erick Erickson wrote:

bq: I'm not sure if Solr 4.3 will be able to read Solr 1.4 indexes

Solr/Lucene explicitly try to read _one_ major revision backwards. Solr 3.x should be able to read 1.4 indexes. Solr 4.x should be able to read Solr 3.x. No attempt is made to allow Solr 4.x to read Solr 1.4 indexes, so I wouldn't even try. Shalin's comment is best: if at all possible, I'd just forget about reading the old index and re-index from scratch. But if you _do_ try upgrading 1.4 -> 3.x -> 4.x, you probably want to optimize at each step. That'll (I think) rewrite all the segments in the current format.

Good luck!
Erick

On Tue, Jun 25, 2013 at 12:59 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

You must carefully go through the upgrade instructions starting from 1.4 up to 4.3. In particular, the instructions for 1.4 to 3.1 and from 3.1 to 4.0 should be given special attention.

On Tue, Jun 25, 2013 at 11:43 AM, Sandeep Gupta gupta...@gmail.com wrote:

Hello All,

We are planning to migrate Solr 1.4 to Solr 4.3, and I am seeking some help on this.

Considering the schema file change: by default there are lots of changes if I compare the original Solr 1.4 schema file to the Solr 4.3 schema file, and that is the reason we are not copy-pasting the schema file. In our Solr 1.4 schema implementation we have some custom fields with types textgen and text. So in migrating these custom fields to Solr 4.3, should I use type text_general as a replacement for textgen, and text_en as a replacement for text? Please confirm the same.

Please check the text_general definition in 4.3 against the textgen fieldtype in Solr 1.4 to see if they're equivalent. Same for text_en and text.

Considering the solrconfig change: we didn't have lots of changes in the 1.4 solrconfig file except the dataimport request handler.
And therefore, on the migration side, we are simply modifying the Solr 4.3 solrconfig file with this request handler.

And you need to add the dataimporthandler jar into Solr's lib directory. DIH is not added automatically anymore.

Considering the application development: we used all the queries in a BOOLEAN type style (which was not good), I mean we put all the parameters in the query field, i.e. *:* AND EntityName:... AND fileName:fieldValue AND ... I think we should simplify our queries using other fields like df, qf.

Probably. AND queries are best done by filter queries (fq).

We also used to create the Solr server object via CommonsHttpSolrServer(), so I am planning to use the HttpSolrServer API now.

Yes. Also, there was a compatibility break between Solr 1.4 and 3.1 in the javabin format, so old clients using javabin won't be able to communicate with Solr until you upgrade both the Solr client and the Solr servers.

Please let me know suggestions for the above points, and also what other factors I need to take care of while considering the migration.

There is no substitute for reading the upgrade sections in the changes.txt.
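As a loose illustration of the filter-query advice above (not from the original thread), here is one way such a boolean-style query could be restructured in SolrJ; the field names and values are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;

    public class FilterQueryExample {
        public static void main(String[] args) {
            // Instead of q = "*:* AND EntityName:document AND fileName:report"
            SolrQuery query = new SolrQuery("*:*");
            // Each restrictive clause becomes a filter query; filter queries
            // do not influence scoring and are cached independently by Solr.
            query.addFilterQuery("EntityName:document");
            query.addFilterQuery("fileName:report");
            System.out.println(query); // URL-encoded q and fq parameters
        }
    }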
Searching and Retrieving Information Protocol For Solr
There is a low-level client-server protocol for searching and retrieving information from remote computer databases, called Z39.50. Since Solr is a commonly used search engine (besides being a NoSQL database), is there any protocol for Solr (I don't mean a low-level protocol; Z39.50 is just an example) through which it can integrate with other clients, or anything else?
Re: Is there a way to speed up my import
On 27 June 2013 14:12, Mysurf Mail stammail...@gmail.com wrote:

I just configured the caching and it works mighty fast now. Instead of an unbelievable amount of queries it queries only 4 times. CPU usage has moved from the db to the Solr computer, but only for a very short time. Problem: I don't see the multi-value fields (inner entities) anymore. This is my configuration [...]

Please check the syntax of your where clause against http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor

Your inner entities should have clauses like where="ResourceId=PackageVersion.PackageId". I am also not sure why you have the strange square brackets.

Regards,
Gora
Re: Is Overlapping onDeckSearchers=2 really a problem?
Hi,

On Thu, Jun 27, 2013 at 12:23 PM, Robert Krüger krue...@lesspain.de wrote:

Hi,

I have a desktop application where I am abusing Solr as an embedded database, and I am quite happy with everything. Performance is more than good enough for my use case, and Solr's query capabilities match the requirements of my app quite well. However, I get the well-known performance warnings (see subject) in the log whenever I index a lot of documents, although I never experience any performance problems (they might be hidden, though). The properties of my app are:

- I (soft-)commit after every indexed item because I need the changes to be visible immediately
- The commits are serialized
- I do not have any warming queries configured

I have read the FAQ but don't see anything that helps in my case. As I said, I am happy with everything as it is, but the warning makes me a bit nervous (and maybe at some point my customers, when their logs are full of those warnings). What could I do to eliminate it? Can I configure only one searcher to be used, or anything like that?

Thanks for any hints,
Robert

Sometimes forcing oneself to describe a problem is the first step to a solution. I just realized that I also had an autocommit statement in my config with the exact same amount of time that seemed to be between the warnings. I removed that, because I don't think I really need it, and now the warnings are gone. So it seems it happened whenever my manual commits overlapped with an autocommit, which, of course, was more likely when many commits were issued in sequence.
displaying one result per domain
I'm looking for a neat solution to replace the default of multiple results from a single domain in the SERP:

somepage.com/contact.html
somepage.com/aboutus.html
otherpage.net/info.html
somepage.com/directions.html
etc.

with only one result per domain [the main URL by default]:

somepage.com
otherpage.net
completelydifferentpage.org

I tried grouping by Carrot2 but it's not exactly what I'm looking for. Thanks in advance.
Re: Solr admin search with wildcard
No, you cannot use wildcards within a quoted term.

Tell us a little more about what your strings look like. You might want to consider tokenizing or using ngrams to avoid the need for wildcards.

-- Jack Krupansky

-----Original Message-----
From: Amit Sela
Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard

I'm looking to search (in the Solr admin search screen) a certain field for: *youtube*. I know that leading wildcards take a lot of resources, but I'm not worried about that. My only question is about the syntax: would this work: field:"*youtube*" ?

Thanks. I'm using Solr 3.6.2.
how to delete one column of a doc in solr
In my Solr schema there is one dynamic field:

<dynamicField name="jobs_*" type="float" indexed="true" stored="true"/>

So I have one doc value:

"docs": [ {
    "last_name": "Jain",
    "state_name": "rajasthan",
    "mobile_no": 234534564621,
    "id": 4,
    "jobs_6554": 6554
}, ...]

Now I just want to delete one column, meaning jobs_6554, not the complete doc. How is that possible in Solr? So after the delete, docs will be:

"docs": [ {
    "last_name": "Jain",
    "state_name": "rajasthan",
    "mobile_no": 234534564621,
    "id": 4
}, ...]
Re: displaying one result per domain
Extract the domain (the main URL you mention) into its own indexed field and use field collapsing/grouping: http://wiki.apache.org/solr/FieldCollapsing

Erik

On Jun 27, 2013, at 08:18, Wojciech Kapelinski wrote:

I'm looking for a neat solution to replace the default of multiple results from a single domain in the SERP:

somepage.com/contact.html
somepage.com/aboutus.html
otherpage.net/info.html
somepage.com/directions.html
etc.

with only one result per domain [the main URL by default]:

somepage.com
otherpage.net
completelydifferentpage.org

I tried grouping by Carrot2 but it's not exactly what I'm looking for. Thanks in advance.
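A minimal SolrJ sketch of the grouping approach Erik describes, assuming the host name has been extracted into an indexed field; the field name domain and the server URL are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class OneResultPerDomain {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery query = new SolrQuery("some search terms");
            // Collapse the result list to one document per domain value.
            query.set("group", true);
            query.set("group.field", "domain");
            query.set("group.limit", 1); // keep only the top hit per domain
            QueryResponse response = solr.query(query);
            System.out.println(response.getGroupResponse().getValues());
        }
    }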
Re: Solr admin search with wildcard
The stored and indexed string is actually a URL like "http://www.youtube.com/somethingsomething". It looks like removing the quotes does the job: iframe:*youtube* - or am I wrong?

For now, performance is not an issue, but accuracy is, and I would like to know, for example, how many URLs have an iframe source leading to YouTube. So a query like iframe:*youtube* with max rows 10 or something will return, in the response numFound field, the total number of pages that have an iframe tag with a source matching *youtube*, no?

On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky j...@basetechnology.com wrote:

No, you cannot use wildcards within a quoted term.

Tell us a little more about what your strings look like. You might want to consider tokenizing or using ngrams to avoid the need for wildcards.

-- Jack Krupansky

-----Original Message-----
From: Amit Sela
Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard

I'm looking to search (in the Solr admin search screen) a certain field for: *youtube*. I know that leading wildcards take a lot of resources, but I'm not worried about that. My only question is about the syntax: would this work: field:"*youtube*" ?

Thanks. I'm using Solr 3.6.2.
Re: Solr admin search with wildcard
Just copyField from the string field to a text field and use standard tokenization; then you can search the text field for youtube, or even something that is a component of the URL path. No wildcard required.

-- Jack Krupansky

-----Original Message-----
From: Amit Sela
Sent: Thursday, June 27, 2013 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr admin search with wildcard

The stored and indexed string is actually a URL like "http://www.youtube.com/somethingsomething". It looks like removing the quotes does the job: iframe:*youtube* - or am I wrong?

For now, performance is not an issue, but accuracy is, and I would like to know, for example, how many URLs have an iframe source leading to YouTube. So a query like iframe:*youtube* with max rows 10 or something will return, in the response numFound field, the total number of pages that have an iframe tag with a source matching *youtube*, no?

On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky j...@basetechnology.com wrote:

No, you cannot use wildcards within a quoted term.

Tell us a little more about what your strings look like. You might want to consider tokenizing or using ngrams to avoid the need for wildcards.

-- Jack Krupansky

-----Original Message-----
From: Amit Sela
Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard

I'm looking to search (in the Solr admin search screen) a certain field for: *youtube*. I know that leading wildcards take a lot of resources, but I'm not worried about that. My only question is about the syntax: would this work: field:"*youtube*" ?

Thanks. I'm using Solr 3.6.2.
Re: how to delete one column of a doc in solr
Atomic update. For example:

curl http://localhost:8983/solr/update?commit=true \
  -H "Content-Type: application/json" -d '
  [{"id": "text-1", "text_ss": {"set": null}}]'

(From the book!)

That's for one document. If you want to do that for all documents, you will have to iterate yourself. But... it sounds like you have arbitrary, unknown field names (dynamic). If you want to delete them, you will need to know the field name. You will have to write a loop that reads every document, figures out the dynamic field name, and then you can update with an atomic update.

You may want to rethink your data model.

-- Jack Krupansky

-----Original Message-----
From: anurag.jain
Sent: Thursday, June 27, 2013 8:28 AM
To: solr-user@lucene.apache.org
Subject: how to delete one column of a doc in solr

In my Solr schema there is one dynamic field:

<dynamicField name="jobs_*" type="float" indexed="true" stored="true"/>

So I have one doc value:

"docs": [ { "last_name": "Jain", "state_name": "rajasthan", "mobile_no": 234534564621, "id": 4, "jobs_6554": 6554 }, ...]

Now I just want to delete one column, meaning jobs_6554, not the complete doc. How is that possible in Solr? So after the delete, docs will be:

"docs": [ { "last_name": "Jain", "state_name": "rajasthan", "mobile_no": 234534564621, "id": 4 }, ...]
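A rough SolrJ equivalent of the curl command above, assuming SolrJ 4.x; the id value matches the document from the question, and setting a field to a map of "set" -> null is the atomic-update way to remove that field:

    import java.util.Collections;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class RemoveFieldExample {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "4");
            // Atomic update: "set" to null deletes the jobs_6554 field while
            // the other stored fields of the document are preserved.
            doc.addField("jobs_6554", Collections.singletonMap("set", null));
            solr.add(doc);
            solr.commit();
        }
    }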
Re: displaying one result per domain
The URL Classify update processor can take a URL and split it into pieces, including the host name: http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessorFactory.html

Unfortunately, the Javadoc is sparse - not even one example. I have some examples in the book. You can also use a regular expression token filter to extract the host name. And you can use standard Solr grouping to group by the field containing the host name.

-- Jack Krupansky

-----Original Message-----
From: Wojciech Kapelinski
Sent: Thursday, June 27, 2013 8:18 AM
To: solr-user@lucene.apache.org
Subject: displaying one result per domain

I'm looking for a neat solution to replace the default of multiple results from a single domain in the SERP:

somepage.com/contact.html
somepage.com/aboutus.html
otherpage.net/info.html
somepage.com/directions.html
etc.

with only one result per domain [the main URL by default]:

somepage.com
otherpage.net
completelydifferentpage.org

I tried grouping by Carrot2 but it's not exactly what I'm looking for. Thanks in advance.
Re: Classic 4.2 master-slave replication not completing
Okay, I have done this (updated to 4.3.1 across the master and four slaves; one of these is my own PC for experiments, it is not being accessed by clients). We just had a minor replication this morning, and all three slaves are stuck again. Replication supposedly started at 8:40 and ended 30 seconds later or so (on my local PC, set up identically to the other three slaves). The three slaves will NOT complete the roll-over to the new index. All three index folders have a write.lock, and the latest files are dated 8:40am (it is now 8:54am, with no further activity in the index folders). There exists an index.2013062708461 (or some variation thereof) in all three slaves' data folders. The seemingly relevant thread dump of a snappuller thread on each of these slaves:

- sun.misc.Unsafe.park(Native Method)
- java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
- java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
- java.util.concurrent.FutureTask.get(FutureTask.java:83)
- org.apache.solr.handler.SnapPuller.openNewWriterAndSearcher(SnapPuller.java:631)
- org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:446)
- org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
- org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
- java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
- java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
- java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
- java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
- java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
- java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
- java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
- java.lang.Thread.run(Thread.java:662)

Here they sit. My local PC slave replicated very quickly and switched over to the new generation (206) immediately. I am not sure why the three slaves are dragging on this. If there are any configuration elements or other details you need, please let me know. I can manually kick them by reloading the core from the admin pages, but obviously I would like this to be a hands-off process. Any help is greatly appreciated; this has been bugging me for some time now.

On Mon, Jun 24, 2013 at 9:34 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

A bunch of replication-related issues were fixed in 4.2.1, so you're better off upgrading to 4.2.1 or later (4.3.1 is the latest release).

On Mon, Jun 24, 2013 at 6:55 PM, Neal Ensor nen...@gmail.com wrote:

As a bit of background, we run a setup (coming from 3.6.1 to 4.2 relatively recently) with a single master receiving updates and three slaves pulling changes in. Our index is around 5 million documents, around 26GB in size total.
The situation I'm seeing is this: occasionally we update the master, and replication begins on the three slaves; it seems to proceed normally until it hits the end. At that point, it sticks; there are no messages going on in the logs, and nothing on the admin page seems to be happening. I sit there for sometimes upwards of 30 minutes, seeing no further activity in the index folder(s). After a while, I go to the core admin page and manually reload the core, which catches it up. It seems like the index readers/writers are not releasing the index otherwise? The configuration is set to reopen; very occasionally this situation actually fixes itself after a longish period of time, but it seems very annoying.

I had at first suspected this to be due to our underlying shared (SAN) storage, so we installed SSDs in all three slave machines and moved the entire indexes to those. It did not seem to affect this issue at all (additionally, I didn't really see the expected performance boost, but that's a separate issue entirely).

Any ideas? Any configuration details I might share/reconfigure? Any suggestions are appreciated. I could also upgrade to the later 4.3+ versions, if that might help.

Thanks!

Neal Ensor
nen...@gmail.com

--
Regards,
Shalin Shekhar Mangar.
Dot operator issue.
Hi team,

When the user enters the search term h.e.r.b.a.l in the search textbox and clicks on the search button, the SOLR search engine returns no results found. As I can see, SOLR is accepting the request parameter h.e.r.b.a.l. However, we have many records with the string h.e.r.b.a.l as part of the product name. It looks like there is an issue with the dot operator in the search term. If we enter the search term herbal, then it returns search results. Our requirement is that for the search term h.e.r.b.a.l, it should display results based on the dotted form. Please help us with this issue.

Regards,
Srinivas
Re: Dot operator issue.
Hi Sri,

This depends on how the fields (that hold the value) are defined and how the query is generated. Try running the query in the Solr console and use debug=true to see how the query string is getting parsed. If that doesn't help, could you answer the following 3 questions relating to your question:

1) the field definition in schema.xml
2) the Solr query URL
3) the parser config from solrconfig.xml

Thanks,
Sandeep

On 27 June 2013 10:41, Srinivasa Chegu cheg...@hcl.com wrote:

Hi team,

When the user enters the search term h.e.r.b.a.l in the search textbox and clicks on the search button, the SOLR search engine returns no results found. As I can see, SOLR is accepting the request parameter h.e.r.b.a.l. However, we have many records with the string h.e.r.b.a.l as part of the product name. It looks like there is an issue with the dot operator in the search term. If we enter the search term herbal, then it returns search results. Our requirement is that for the search term h.e.r.b.a.l, it should display results based on the dotted form. Please help us with this issue.

Regards,
Srinivas
TermVector and Sharding issue
Hello everyone,

I saw that the ticket regarding this issue is still open (https://issues.apache.org/jira/browse/SOLR-4479). The last comment suggests reindexing documents with Solr 4.2. I did reindex with the 4.3 version, but term vectors still don't work, producing a NullPointerException. So, has anyone had the same problem? Is there a workaround?
Re: Solr admin search with wildcard
Forgive my ignorance, but I want to be sure: do I add <copyField source="iframe" dest="text"/> to solrindex-mapping.xml, so that my solrindex-mapping.xml looks like this?

<fields>
  <field dest="content" source="content"/>
  <field dest="title" source="title"/>
  <field dest="iframe" source="iframe"/>
  <field dest="host" source="host"/>
  <field dest="segment" source="segment"/>
  <field dest="boost" source="boost"/>
  <field dest="digest" source="digest"/>
  <field dest="tstamp" source="tstamp"/>
  <field dest="id" source="url"/>
  <copyField source="url" dest="url"/>
  <copyField source="iframe" dest="text"/>
</fields>
<uniqueKey>url</uniqueKey>

And what do you mean by standard tokenization?

Thanks!

On Thu, Jun 27, 2013 at 3:43 PM, Jack Krupansky j...@basetechnology.com wrote:

Just copyField from the string field to a text field and use standard tokenization; then you can search the text field for youtube, or even something that is a component of the URL path. No wildcard required.

-- Jack Krupansky

-----Original Message-----
From: Amit Sela
Sent: Thursday, June 27, 2013 8:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr admin search with wildcard

The stored and indexed string is actually a URL like "http://www.youtube.com/somethingsomething". It looks like removing the quotes does the job: iframe:*youtube* - or am I wrong?

For now, performance is not an issue, but accuracy is, and I would like to know, for example, how many URLs have an iframe source leading to YouTube. So a query like iframe:*youtube* with max rows 10 or something will return, in the response numFound field, the total number of pages that have an iframe tag with a source matching *youtube*, no?

On Thu, Jun 27, 2013 at 3:24 PM, Jack Krupansky j...@basetechnology.com wrote:

No, you cannot use wildcards within a quoted term.

Tell us a little more about what your strings look like. You might want to consider tokenizing or using ngrams to avoid the need for wildcards.

-- Jack Krupansky

-----Original Message-----
From: Amit Sela
Sent: Thursday, June 27, 2013 3:33 AM
To: solr-user@lucene.apache.org
Subject: Solr admin search with wildcard

I'm looking to search (in the Solr admin search screen) a certain field for: *youtube*. I know that leading wildcards take a lot of resources, but I'm not worried about that. My only question is about the syntax: would this work: field:"*youtube*" ?

Thanks. I'm using Solr 3.6.2.
Re: Data Import Handler and Extract Handler
On 27 June 2013 13:42, Venter, Scott scott.ven...@rmb.co.za wrote:

Hi all,

I am new to SOLR. I have been working through the SOLR 4 Cookbook and my experiences so far have been great. I have worked through the extraction of PDF data recipe and the data import recipe. I would now like to join these two things, i.e. I would like to do a data import from a database table of users, and then somehow associate indexed PDF data with the rows that were imported. I have a conceptual link between rows in the database and PDF documents, but I don't know how to make a physical link between the two in SOLR. For example, I know that user x has PDF documents a, b and c. If I have imported my users into SOLR using the Data Import Handler, how would I:

1) import and associate the PDF documents using the extract mechanism, in such a way that there is a link between user x and the 3 PDF documents as described above?

[...]

Where are your PDF documents? Presumably on the filesystem, or available from a web service. What you can do is to have two datasources in your DIH configuration file:

* The first one is a JdbcDataSource that extracts data from a database. Presumably, you already have this working.
* The second is a BinFileDataSource, assuming that your PDF files are on the filesystem.
* In the top-level entity, select the user and the names of the associated PDF files.
* Use a nested inner entity with the dataSource attribute set to the BinFileDataSource, and use the TikaEntityProcessor to index the PDF files.

The documentation on this is a little scattered, but see:
http://wiki.apache.org/solr/TikaEntityProcessor
http://lucene.472066.n3.nabble.com/problem-to-indexing-pdf-directory-td3749554.html

Regards,
Gora
Re: Filter queries taking a long time, even with cache disabled
On Thu, Jun 27, 2013 at 12:14 PM, Upayavira u...@odoko.co.uk wrote:

Can you give an example?

Thank you. This is an example query:

select?q=search_field:iraq&fq={!cache=false}search_field:love%20obama&defType=edismax

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Classic 4.2 master-slave replication not completing
Odd - looks like it's stuck waiting to be notified that a new searcher is ready.

- Mark

On Jun 27, 2013, at 8:58 AM, Neal Ensor nen...@gmail.com wrote:

Okay, I have done this (updated to 4.3.1 across the master and four slaves; one of these is my own PC for experiments, it is not being accessed by clients). We just had a minor replication this morning, and all three slaves are stuck again. Replication supposedly started at 8:40 and ended 30 seconds later or so (on my local PC, set up identically to the other three slaves). The three slaves will NOT complete the roll-over to the new index. All three index folders have a write.lock, and the latest files are dated 8:40am (it is now 8:54am, with no further activity in the index folders). There exists an index.2013062708461 (or some variation thereof) in all three slaves' data folders. The seemingly relevant thread dump of a snappuller thread on each of these slaves:

- sun.misc.Unsafe.park(Native Method)
- java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
- java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
- java.util.concurrent.FutureTask.get(FutureTask.java:83)
- org.apache.solr.handler.SnapPuller.openNewWriterAndSearcher(SnapPuller.java:631)
- org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:446)
- org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
- org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:223)
- java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
- java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
- java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
- java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
- java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
- java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
- java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
- java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
- java.lang.Thread.run(Thread.java:662)

Here they sit. My local PC slave replicated very quickly and switched over to the new generation (206) immediately. I am not sure why the three slaves are dragging on this. If there are any configuration elements or other details you need, please let me know. I can manually kick them by reloading the core from the admin pages, but obviously I would like this to be a hands-off process. Any help is greatly appreciated; this has been bugging me for some time now.

On Mon, Jun 24, 2013 at 9:34 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

A bunch of replication-related issues were fixed in 4.2.1, so you're better off upgrading to 4.2.1 or later (4.3.1 is the latest release).

On Mon, Jun 24, 2013 at 6:55 PM, Neal Ensor nen...@gmail.com wrote:

As a bit of background, we run a setup (coming from 3.6.1 to 4.2 relatively recently) with a single master receiving updates and three slaves pulling changes in.
Our index is around 5 million documents, around 26GB in size total. The situation I'm seeing is this: occasionally we update the master, and replication begins on the three slaves, seems to proceed normally until it hits the end. At that point, it sticks; there's no messages going on in the logs, nothing on the admin page seems to be happening. I sit there for sometimes upwards of 30 minutes, seeing no further activity in the index folder(s). After a while, I go to the core admin page and manually reload the core, which catches it up. It seems like the index readers / writers are not releasing the index otherwise? The configuration is set to reopen; very occasionally this situation actually fixes itself after a longish period of time, but it seems very annoying. I had at first suspected this to be due to our underlying shared (SAN) storage, so we installed SSDs in all three slave machines, and moved the entire indexes to those. It did not seem to affect this issue at all (additionally, I didn't really see the expected performance boost, but that's a separate issue entirely). Any ideas? Any configuration details I might share/reconfigure? Any suggestions are appreciated. I could also upgrade to the later 4.3+ versions, if that might help. Thanks!
ConcurrentUpdateSolrServer hanging
Hi, I'm using ConcurrentUpdateSolrServer to do my incremental indexing nightly. I have 50 shards to index into, about 10,000 documents each night. I start one ConcurrentUpdateSolrServer for each shard and start to send documents. The queue size for each ConcurrentUpdateSolrServer is 100, with 4 threads. At the end of the import, I send a commit using the same ConcurrentUpdateSolrServer. The problem is that some of the ConcurrentUpdateSolrServers are not sending the commit to the shards, and the import task hangs for a couple of hours. When I looked at the log I found that the shards received about 1000 documents a couple of hours later, followed by a commit. Are there any methods I can call to flush out documents before I send the commit? Or is there any existing issue with ConcurrentUpdateSolrServer related to this? Thanks, Qun -- View this message in context: http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrServer-hanging-tp4073620.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is Overlapping onDeckSearchers=2 really a problem?
On 6/27/2013 5:59 AM, Robert Krüger wrote: sometimes forcing oneself to describe a problem is the first step to a solution. I just realized that I also had an autocommit statement in my config with the exact same amount of time that seemed to be between the warnings. I removed that, because I don't think I really need it, and now the warnings are gone. So it seems it happened whenever my manual commits overlapped with an autocommit, which, of course, was more likely when many commits were issued in sequence.

If all you are doing is soft commits, your transaction logs are going to grow out of control. http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup My recommendation: 1) Remove all commits from your indexing application. 2) Configure autoCommit with values similar to that wiki page. 3) Configure autoSoftCommit to happen often. The autoCommit must have openSearcher set to false. For autoSoftCommit, include a maxTime between 1000 and 5000 (milliseconds) and leave maxDocs out. Thanks, Shawn
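A minimal solrconfig.xml sketch of the commit setup Shawn describes; the maxTime values here are illustrative placeholders, not taken from the thread, so tune them for your own load:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Hard commit: flushes segments and truncates the transaction log,
           but does not open a new searcher, so it is cheap to run often. -->
      <autoCommit>
        <maxTime>300000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- Soft commit: makes newly indexed documents visible to searches. -->
      <autoSoftCommit>
        <maxTime>2000</maxTime>
      </autoSoftCommit>
    </updateHandler>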
solrj indexing using embedded solr is slow
I was using ConcurrentUpdateSolrServer for indexing documents to Solr. Later I had a need to do portable indexing, hence started using the embedded Solr server. I created a multithreaded program to create/submit the documents in batches of 100 to the embedded Solr server (running inside the SolrJ indexing process), but for some reason it takes more time to index the data when compared with ConcurrentUpdateSolrServer (CUSS). I was under the assumption that the embedded server would take less time compared to HTTP updates (made when using CUSS), but I'm not sure why it takes more time... Is there a way to speed up the indexing when using the embedded Solr server, etc. (something like specifying thread and queue size similar to CUSS)? -- View this message in context: http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR online reference document - WIKI
This page never came up on any of my Google searches, so thanks for the heads up! Looks good. -Luis On Tue, Jun 25, 2013 at 12:32 PM, Learner bbar...@gmail.com wrote: I just came across a wonderful online reference wiki for SOLR and thought of sharing it with the community.. https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-online-reference-document-WIKI-tp4073110.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR online reference document - WIKI
It is all new, and as yet unreleased. It still has more work needed on formatting, etc, so I guess you could say, make of it what you will, and don't yet assume it will always be up and available. Upayavira On Thu, Jun 27, 2013, at 04:25 PM, Luis Lebolo wrote: This page never came up on any of my Google searches, so thanks for the heads up! Looks good. -Luis On Tue, Jun 25, 2013 at 12:32 PM, Learner bbar...@gmail.com wrote: I just came across a wonderful online reference wiki for SOLR and thought of sharing it with the community.. https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-online-reference-document-WIKI-tp4073110.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ConcurrentUpdateSolrServer hanging
Qun, Are you using blockUntilFinished() and/or shutdown()? One of the things to note is that a commit is just another document, so writing a commit into the queue of the ConcurrentUpdateSolrServer isn't enough to get it flushed out. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/

On Thu, Jun 27, 2013 at 10:21 AM, qungg qzheng1...@gmail.com wrote:

Hi, I'm using ConcurrentUpdateSolrServer to do my incremental indexing nightly. ... Are there any methods I can call to flush out documents before I send the commit? ... Thanks, Qun
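A sketch of the pattern Michael is describing, under the assumption that one ConcurrentUpdateSolrServer is created per shard URL; the variable names are made up for illustration:

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // queue size 100, 4 threads, as in the original post
    ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer(shardUrl, 100, 4);
    for (SolrInputDocument doc : docs) { // docs stands in for your nightly batch
        server.add(doc);
    }
    server.blockUntilFinished(); // drain the internal queue first
    server.commit();             // then nothing is queued behind the commit
    server.shutdown();           // release the worker threads when fully done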
Re: Configuring Solr to retrieve documents?
Hi, I haven't used it yet, but I believe you can do this using the FileDataSource feature of DataImportHandler: http://wiki.apache.org/solr/DataImportHandler#FileDataSource HTH, Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions w: appinions.com http://www.appinions.com/ On Wed, Jun 26, 2013 at 2:12 PM, aspielman aspiel...@gmail.com wrote: Is it possible to to configure Solr to automatically grab documents in a specidfied directory, with having to use the post command? I've not found any way to do this, though admittedly, I'm not terribly experienced with config files of this type. Thanks! - | A.Spielman | In theory there is no difference between theory and practice. In practice there is. - Chuck Reid -- View this message in context: http://lucene.472066.n3.nabble.com/Configuring-Solr-to-retrieve-documents-tp4073372.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ConcurrentUpdateSolrServer hanging
Hi Michael, I realized that I might have to use blockUntilFinished before commit, but do I have to use shutdown as well?? Thanks, Qun -- View this message in context: http://lucene.472066.n3.nabble.com/ConcurrentUpdateSolrServer-hanging-tp4073620p4073651.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr.DirectUpdateHandler2 failed to instantiate
Jack, Did you ever find a fix for this? I'm having similar issues (different parts of solrconfig) and my guess is it's a config issue somewhere, vs. a proper casting problem, some nested init issue. Was curious what you found?

On Mar 13, 2013, at 11:52 AM, Jack Park jackp...@topicquests.org wrote:

I can safely say that it is not DirectUpdateHandler2 failing; by commenting out my own handlers, the system boots without error. This means that my handlers are problematic in some way. The moment I put back just one of my handlers:

<updateRequestProcessorChain name="harvest" default="true">
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
    <str name="inputField">hello</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update/harvest" class="solr.DirectUpdateHandler2">
  <lst name="defaults">
    <str name="update.chain">harvest</str>
  </lst>
</requestHandler>

the problem returns. It simply appears that I cannot declare a named requestHandler using that class. Jack

On Tue, Mar 12, 2013 at 12:22 PM, Jack Park jackp...@topicquests.org wrote:

Indeed! Perhaps the germane part is this, before the failure-to-instantiate notice:

Caused by: java.lang.ClassCastException: class org.apache.solr.update.DirectUpdateHandler2
  at java.lang.Class.asSubclass(Unknown Source)
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:432)
  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:507)

This suggests that I might be doing something wrong elsewhere in solrconfig.xml. The possibly relevant parts (my contributions) are these:

<updateRequestProcessorChain name="partial" default="true">
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<updateRequestProcessorChain name="harvest" default="true">
  <processor class="solr.RunUpdateProcessorFactory"/>
  <processor class="org.apache.solr.update.TopicQuestsDocumentProcessFactory">
    <str name="inputField">hello</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update/harvest" class="solr.DirectUpdateHandler2">
  <lst name="defaults">
    <str name="update.chain">harvest</str>
  </lst>
</requestHandler>

<requestHandler name="/update/partial" class="solr.DirectUpdateHandler2">
  <lst name="defaults">
    <str name="update.chain">partial</str>
  </lst>
</requestHandler>

Thanks Jack

On Tue, Mar 12, 2013 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:

There should be a stack trace - also, you shouldn't have to do anything special to use this class. It's the default and only truly supported implementation… - Mark

On Mar 12, 2013, at 2:53 PM, Jack Park jackp...@topicquests.org wrote:

That message gives great, but terrible, Google. Zillions of hits, mostly filled with very long log traces, and zero messages (that I could find) about what to do about it. I switched over to using that handler since it has an update log specified, and that's the only place I've found how to use the update log. But, can't boot now. All the jars are in place; I'm able to import that class in my code. Is there any news on that issue? Many thanks Jack
Re: solrj indexing using embedded solr is slow
On 6/27/2013 9:19 AM, Learner wrote: I was using ConcurrentUpdateSolrServer for indexing documents to Solr. Later I had a need to do portable indexing, hence started using the embedded Solr server. ... Is there a way to speed up the indexing when using the embedded Solr server (something like specifying thread and queue size similar to CUSS)?

A lot more time has been spent optimizing the traditional Solr server model than the embedded version. If you want the same performance from Embedded that you get from Concurrent, you'll need to use that object in multiple threads that you create yourself. The Concurrent object handles all that threading for you, but due to its nature, Embedded can't. You say that your program is multithreaded, so I really don't know what's going on here.

An FYI on something that might have escaped your awareness: CUSS swallows exceptions - it will never inform the calling application about errors that occur, unless you override its handleError method in some way, and I don't know what is required to make it do that. This is part of why CUSS is so fast - it returns to the calling application *immediately*, no matter what actually happens in the background while talking to the server. Thanks, Shawn
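For what it's worth, a rough sketch of driving EmbeddedSolrServer from multiple threads, along the lines Shawn suggests; the core container setup varies a little across 4.x releases, and the solr home path, core name, and buildBatches() helper are assumptions for illustration:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.core.CoreContainer;

    CoreContainer container = new CoreContainer("/path/to/solr/home");
    container.load();
    final EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");

    List<List<SolrInputDocument>> batches = buildBatches(); // hypothetical: batches of ~100 docs
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (final List<SolrInputDocument> batch : batches) {
        pool.submit(new Runnable() {
            public void run() {
                try {
                    server.add(batch); // concurrent adds from several threads
                } catch (Exception e) {
                    e.printStackTrace(); // unlike CUSS, errors actually surface here
                }
            }
        });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    server.commit();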
Re: ConcurrentUpdateSolrServer hanging
On 6/27/2013 9:32 AM, Michael Della Bitta wrote: Are you using blockUntilFinished() and/or shutdown()? One of the things to note is that a commit is just another document, so writing a commit into the queue of the ConcurrentUpdateSolrServer isn't enough to get it flushed out.

ConcurrentUpdateSolrServer contains this little bit of code:

    // this happens for commit...
    if (req.getDocuments() == null || req.getDocuments().isEmpty()) {
      blockUntilFinished();
      return server.request(request);
    }

Unless the comment is incorrect or there's a bug, sending a commit() will inherently do the blockUntilFinished(). Thanks, Shawn
Re: URL search and indexing
Right, string fields are a little tricky; they're easy to confuse with fields that actually _do_ something. By default, norms and term frequencies are turned off for types based on class="solr.StrField". So any field length normalization (i.e. terms that appear in shorter fields count more) and term frequency calculations are _not_ included in the score calculation. Try blowing your index away and adding this to your fields to see the difference: omitNorms="false" omitTermFreqAndPositions="false". You probably want to either turn these on explicitly for your string types or use a type based on class="solr.TextField", since these options default to false for text fields. If you use something like KeywordTokenizerFactory you also won't get your URL split up into pieces. And in that case you can also normalize the values with something like LowerCaseFilterFactory, which you can't do with string types since they're completely unanalyzed. Best Erick

On Wed, Jun 26, 2013 at 11:34 AM, Flavio Pompermaier pomperma...@okkam.it wrote:

Obviously I messed up with the email thread... however I found a problem indexing my document via post.sh. This is basically my schema.xml:

<schema name="dopa-schema" version="1.5">
  <fields>
    <field name="url" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="itemid" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>url</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  </types>
</schema>

and this is the document I tried to upload via post.sh:

<add>
  <doc>
    <field name="url">http://test.example.org/first.html</field>
    <field name="itemid">1000</field>
    <field name="itemid">1000</field>
    <field name="itemid">1000</field>
    <field name="itemid">5000</field>
  </doc>
  <doc>
    <field name="url">http://test.example.org/second.html</field>
    <field name="itemid">1000</field>
    <field name="itemid">5000</field>
  </doc>
</add>

When playing with the administration and debugging tools I discovered that searching for q=itemid:5000 gave me the same score for those docs, while I was expecting different term frequencies between the first and the second. In fact, using Java to upload the documents leads to correct results (3 occurrences of item 1000 in the first doc and 1 in the second), e.g.:

    document1.addField("itemid", "1000");
    document1.addField("itemid", "1000");
    document1.addField("itemid", "1000");

Am I right or am I missing something else?

On Wed, Jun 26, 2013 at 5:18 PM, Jack Krupansky j...@basetechnology.com wrote:

If there is a bug... we should identify it. What's a sample post command that you issued? -- Jack Krupansky

-Original Message- From: Flavio Pompermaier Sent: Wednesday, June 26, 2013 10:53 AM To: solr-user@lucene.apache.org Subject: Re: URL search and indexing

I was doing exactly that and, thanks to the administration page and explanation/debugging, I checked whether the results were those expected. Unfortunately, the results were not correct when submitting updates through the post.sh script (which uses curl in the end). Probably, if it finds the same tag (same value for the same field name), it will collapse them. Rewriting the same document in Java and submitting the updates made things work correctly. In my opinion this is a bug (of the entire process; I don't know if this is a problem of curl or of the script itself).
Best, Flavio

On Wed, Jun 26, 2013 at 4:18 PM, Erick Erickson erickerick...@gmail.com wrote:

Flavio: You mention that you're new to Solr, so I thought I'd make sure you know that the admin/analysis page is your friend! I flat guarantee that as you try to index/search following the suggestions you'll scratch your head at your results and you'll discover that the analysis process isn't doing quite what you expect. The admin/analysis page shows you the transformation of the input at each stage, i.e. how the input is tokenized, what transformations are applied to each token etc. It's invaluable! Best Erick P.S. Feel free to un-check the verbose box, it provides lots of information but can be overwhelming, especially at first!

On Wed, Jun 26, 2013 at 12:20 AM, Flavio Pompermaier pomperma...@okkam.it wrote:

Ok thank you all for the great help! Now I'm ready to start playing with my index! Best, Flavio

On Tue, Jun 25, 2013 at 11:40 PM, Jack Krupansky j...@basetechnology.com wrote:

Yeah, URL Classify does only do so much. That's why you need to combine multiple methods. As a fourth method, you could code up a short JavaScript StatelessScriptUpdateProcessor that did something like take a full domain name (such as output by URL Classify) and turn it into multiple values, each with more of the prefix removed, so that
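To make Erick's suggestion concrete, a schema.xml sketch of a URL field analyzed as a single token but still lowercased and scored with norms and term frequencies; the type name url_keyword is invented for this example:

    <fieldType name="url_keyword" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- keep the whole URL as one token instead of splitting it up -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- normalize case, which a plain string type cannot do -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="url" type="url_keyword" indexed="true" stored="true" required="true" multiValued="false"/>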
Field Query After Collapse.Field?
Hello, I've been struggling to find a way to query after collapse.field is performed, and I'm hoping someone can help. I'm doing a multiple-core (index) search which generates results that can have varying fields, e.g. entry_id, entry_starred and entry_id, entry_read. I perform a collapse.field on entry_id, which yields e.g. entry_id, entry_starred, entry_read. But if I try to do an fq on one of the fields, e.g. fq=!entry_read:1, the fq is performed before the collapse, leading to incorrect results. Is there any way to perform the field query after the results are collapsed? Thanks, slevytam -- View this message in context: http://lucene.472066.n3.nabble.com/Field-Query-After-Collapse-Field-tp4073691.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: StatsComponent doesn't work if field's type is TextField - can I change field's type to String
I stand corrected, you're absolutely right about string types. But I still don't think text types are supported; at least in my quick test of the stock Solr distro, trying to gather stats on the subject field produced the error below. Note that string is a completely unanalyzed type, no tokenization etc., so it's actually a different beast than text types.

Field type text_general{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={class=solr.TextField, positionIncrementGap=100}} is not currently supported

On Wed, Jun 26, 2013 at 11:37 AM, Elran Dvir elr...@checkpoint.com wrote:

Erick, thanks for the response. I think the stats component works with strings. In StatsValuesFactory, I see the following code:

    public static StatsValues createStatsValues(SchemaField sf) {
      ...
      else if (StrField.class.isInstance(fieldType)) {
        return new StringStatsValues(sf);
      }
    }

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, June 26, 2013 5:30 PM To: solr-user@lucene.apache.org Subject: Re: StatsComponent doesn't work if field's type is TextField - can I change field's type to String

From the stats component page: The stats component returns simple statistics for indexed numeric fields within the DocSet. So string, text, anything non-numeric won't work. You can declare it multiValued, but then you have to add multiple values for the field when you send the doc to Solr, or implement a custom update component to break them up. At least there's no filter that I know of that takes a delimited set of numbers and transforms them. FWIW, Erick

On Wed, Jun 26, 2013 at 4:14 AM, Elran Dvir elr...@checkpoint.com wrote:

Hi all, StatsComponent doesn't work if the field's type is TextField. I get the following message:

Field type textstring{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100, sortMissingLast=true}} is not currently supported.

My field configuration is:

<fieldType name="mvstring" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\n"/>
  </analyzer>
</fieldType>

<field name="myField" type="mvstring" indexed="true" stored="false" multiValued="true"/>

So, the reason my field is of type TextField is that in the indexed document there may be multiple values in the field, separated by new lines. The tokenizer splits it into multiple values and the field is indexed as a multi-valued field. Is there a way I can define the field as a regular String field? Or a way to make StatsComponent work with TextField? Thank you very much.

Email secured by Check Point
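As Erick notes in the quoted message, if the field is switched to a plain string type, the newline-splitting has to happen before the document reaches Solr. A client-side SolrJ sketch, where rawText stands in for whatever holds the newline-separated values:

    SolrInputDocument doc = new SolrInputDocument();
    for (String value : rawText.split("\n")) {
        doc.addField("myField", value); // one value per line into the multiValued string field
    }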
Re: Querying multiple collections in SolrCloud
I'd _guess_ that this is unsupported across collections if for no other reason than scores really aren't comparable across collections and the default ordering within groups is score. This is really a federated search type problem. But if it makes sense to use N collections for other reasons, it's functionally the same thing as grouping: you just send a separate request to each collection and combine the results of those N requests rather than from N groups in a single query. If the collections are hosted on different machines for instance, you might get quicker overall response by firing off parallel queries. It Depends (tm)... Best Erick

On Wed, Jun 26, 2013 at 1:46 PM, Chris Toomey ctoo...@gmail.com wrote:

Thanks Erick, that's a very helpful answer. Regarding the grouping option, does that require all the docs to be put into a single collection, or could it be done across N collections (assuming each collection had a common type field for grouping on)? Chris

On Wed, Jun 26, 2013 at 7:01 AM, Erick Erickson erickerick...@gmail.com wrote:

bq: Would the above setup qualify as multiple compatible collections

No. While there may be enough fields in common to form a single query, the TF/IDF calculations will not be compatible and the scores from the various collections will NOT be comparable. So simply getting the list of top N docs will probably be dominated by the docs from a single type.

bq: How does SolrCloud combine the query results from multiple collections?

It doesn't. SolrCloud sorts the results from multiple nodes in the _same_ collection according to whatever sort criteria are specified, defaulting to score. Say you ask for the top 20 docs. A node from each shard returns the top 20 docs for that shard. The node processing them just merges all the returned lists and only keeps the top 20. I don't think your last two questions are really relevant; SolrCloud isn't built to query multiple collections and return the results coherently. The root problem here is that you're trying to compare docs from different collections for goodness to return the top N. This isn't actually hard _except_ when goodness is the score, then it just doesn't work. You can't even compare scores from different queries on the _same_ collection, much less different ones. Consider two collections, books and songs. One consists of lots and lots of text, and the term frequency and inverse doc freq (TF/IDF) will be hugely different than songs. Not to mention field length normalization. Now, all that aside, there's an option. Index all the docs in a single collection and use grouping (aka field collapsing) to get a single response that has the top N docs from each type (they'll be in different sections of the original response) and present them to the user however makes sense. You'll get hands-on experience in why this isn't something that's easy to do automatically if you try to sort these into a single list by relevance G... Best Erick

On Tue, Jun 25, 2013 at 3:35 PM, Chris Toomey ctoo...@gmail.com wrote:

Thanks Jack for the alternatives. The first is interesting but has the downside of requiring multiple queries to get the full matching docs. The second is interesting and very simple, but has the downside of not being modular and being difficult to configure field boosting when the collections have overlapping field names with different boosts being needed for the same field in different document types. I'd still like to know about the viability of my original approach though too.
Chris

On Tue, Jun 25, 2013 at 3:19 PM, Jack Krupansky j...@basetechnology.com wrote:

One simple scenario to consider: N+1 collections - one collection per document type with detailed fields for that document type, and one common collection that indexes a subset of the fields. The main user query would be an edismax over the common fields in that main collection. You can then display summary results from the common collection. You can also then support drill down into the type-specific collection based on a type field for each document in the main collection. Or, sure, you actually CAN index multiple document types in the same collection - add all the fields to one schema - there is no time or space penalty if most of the fields are empty for most documents. -- Jack Krupansky

-Original Message- From: Chris Toomey Sent: Tuesday, June 25, 2013 6:08 PM To: solr-user@lucene.apache.org Subject: Querying multiple collections in SolrCloud

Hi, I'm investigating using SolrCloud for querying documents of different but similar/related types, and have read through docs on the wiki and done many searches in these archives, but still have some questions. Thanks in advance for your help.
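For reference, the grouping Erick mentions is driven by plain request parameters. A hedged example against a hypothetical single collection holding both document types, with a type field as in Jack's N+1 suggestion:

    http://localhost:8983/solr/collection1/select?q=some+query&group=true&group.field=type&group.limit=5

Here group.field picks the field to collapse on, and group.limit controls how many docs come back per group (i.e. per document type).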
Change of email
Dear List Managers I've changed the email that I'd like to use for the solr-user list, as it's filling up my work email to the point of insanity. Regardless of the change in the solr-user community, it still keeps sending the emails of all threads and replies to my work email. Would you please be so kind as to effect this change for me? The new email is a yahoo email, and is already showing in my preferences. Thank you kindly Anria
Re: Replicating files containing external file fields
Haven't tried this, but I _think_ you can use the confFiles trick with relative paths, see: http://wiki.apache.org/solr/SolrReplication Or just put your EFF files in the data dir? Best Erick On Wed, Jun 26, 2013 at 9:01 PM, Arun Rangarajan arunrangara...@gmail.comwrote: From https://wiki.apache.org/solr/SolrReplication I understand that index dir and any files under the conf dir can be replicated to slaves. I want to know if there is any way the files under the data dir containing external file fields can be replicated. These are not replicated by default. Currently we are running the ext file field reload script on both the master and the slave and then running reloadCache on each server once they are loaded.
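For what it's worth, the confFiles trick Erick refers to lives on the master's replication handler. Whether a relative path like the one below actually escapes the conf dir and picks up an external file field's data file is exactly the part he says he hasn't tried, so treat this as an untested sketch (external_myfield.txt is a made-up file name):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <!-- relative path intended to reach into the data dir; unverified -->
        <str name="confFiles">schema.xml,stopwords.txt,../data/external_myfield.txt</str>
      </lst>
    </requestHandler>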
Re: solrj indexing using embedded solr is slow
Shawn, Thanks a lot for your reply. I have pasted my entire code below, it would be great if you can let me know if I am doing anything wrong in terms of running the code in multithreaded environment. http://pastebin.com/WRLn3yWn -- View this message in context: http://lucene.472066.n3.nabble.com/solrj-indexing-using-embedded-solr-is-slow-tp4073636p4073711.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Change of email
On Thu, Jun 27, 2013, at 06:48 PM, abillav...@innoventsolutions.com wrote: Dear List Managers I've changed my email that I'd like to use for the solr-user list, as it's filling up my work email to the point of insanity. Regardless of the change in the solr-user community, it still keeps sending the emails of all threads and replies to my work email. Would you please be so kind to affect this change for me? The new email is a yahoo email, and is already showing in my preferences Simply unsubscribe yourself (mail solr-user-unsubscr...@lucene.apache.org) from your work address. Then subscribe from the new address. If you have difficulties with unsubscribing, then a mail administrator can help you sort it. Upayavira
Querying across multiple *identical* Collections
Hi, This search across multiple collections question has come up a few times recently: http://search-lucene.com/m/2Q1BE0IT4Y/subj=Search+across+multiple+collections http://search-lucene.com/m/5JQrXIyhQQ1/subj=Querying+multiple+collections+in+SolrCloud One important variation of this Q is - can one search across MULTIPLE IDENTICAL collections. The use case is that you need to index/archive a lot of data, but because your searches have a time range filter, instead of having 1 massive Collection you have to search, you really want to have N smaller Collections, say weekly, so you can search smaller Collection(s). For example: A query that limits matches to docs from only the last 48 hours can be routed only to the Collection for the latest/current week. If the time range filter needs data from multiple Collections (e.g. it's for the last 10 days and we have weekly collections), then IDEALLY, you want to be able to send ONE request to Solr, specify 2 Collections to search, and have Solr handle calling each Collection and merging. Yes, in the case of full-text search global IDF would ideally be used, but Solr is increasingly used for analytical queries and not just full-text queries, and one doesn't need global IDF for that. So: Can one query *multiple identical* Collections with one request from the client? If not: should I open a new JIRA issue? I see https://issues.apache.org/jira/browse/SOLR-4497 allows aliasing multiple Collections, which covers the use-case where you know which Collections might be queried. But in some cases you don't know that ahead of time, so you can't prepare all the aliases. In that case you would want to be able to list all Collections to search in the request, and that's it. Maybe this is already doable? Thanks, Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm
Re: solr.DirectUpdateHandler2 failed to instantiate
Wow! That's been a while back, and it appears that my journal didn't carry a good trace of what I did. Here's a reconstruction: from my earlier attempt, which is reflected in this solrconfig.xml entry

<requestHandler name="/update/harvest" class="solr.DirectUpdateHandler2">

notice that I am naming solr.DirectUpdateHandler2 directly in defining a requestHandler. I don't do that anymore. Now, it's this:

<updateRequestProcessorChain name="harvest" default="true">

which took a lot of fishing to sort out, because, being somewhat dyslexic, it took a long time to figure out that I can use harvest as a setting in SolrJ, thus:

    harvestServer = new HttpSolrServer(solrURL);
    harvestServer.getHttpClient().getParams().setParameter("update.chain", "harvest");

In short, the original exception was based on a gross misinterpretation of how one goes about equating solrconfig.xml with configurations of SolrJ. Hope that helps more than it confuses! Cheers Jack

On Thu, Jun 27, 2013 at 9:45 AM, Mark Bennett mark.benn...@lucidworks.com wrote:

Jack, Did you ever find a fix for this? I'm having similar issues (different parts of solrconfig) and my guess is it's a config issue somewhere, vs. a proper casting problem, some nested init issue. Was curious what you found? ...
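The underlying confusion in this thread is that DirectUpdateHandler2 is an <updateHandler> implementation, not a request handler, so it cannot serve as the class of a named requestHandler (hence the Class.asSubclass ClassCastException). A sketch of the usual 4.x way to get a dedicated update endpoint with a default chain, as an alternative to setting update.chain from SolrJ:

    <requestHandler name="/update/harvest" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">harvest</str>
      </lst>
    </requestHandler>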
state of new config format in 4.3.1
Can anyone (Eric?) outline what's changing between 4.3.1 and 4.4 wrt http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond, and what makes the new solr.xml format usable in 4.4 but not 4.3.1? If one didn't care about sharedLib or solr.xml persistence (the only solr.xml changes we care about are the addition of cores via the SolrCloud API, so if that happens with core discovery we're good) -- is there any reason not to use the new format?
Re: solr.DirectUpdateHandler2 failed to instantiate
For the record, in case anybody else hits this, I think the ClassCastException problem had to do with which class loader first loads the class, which is a side effect of which directory(ies!) you put the jar file in. I can't reproduce the problem any more, but I believe it went away when I removed copies of my jar from other lib directories which I had been experimenting with. -- Mark Bennett / LucidWorks: Search Big Data / mark.benn...@lucidworks.com Office: 408-898-4201 / Telecommute: 408-733-0387 / Cell: 408-829-6513

On Mar 13, 2013, at 11:52 AM, Jack Park jackp...@topicquests.org wrote:

I can safely say that it is not DirectUpdateHandler2 failing; by commenting out my own handlers, the system boots without error. ...
Re: Querying across multiple *identical* Collections
http://wiki.apache.org/solr/SolrCloud#Distributed_Requests - Mark

On Jun 27, 2013, at 2:34 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi, This search across multiple collections question has come up a few times recently. ... So: Can one query *multiple identical* Collections with one request from the client? ... Maybe this is already doable? Thanks, Otis
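Per the wiki page Mark links, SolrCloud already lets a single query fan out over several collections via the collection parameter. An illustrative request (the weekly collection names here are made up to match Otis's scenario):

    http://localhost:8983/solr/collection1/select?q=*:*&collection=week_2013_25,week_2013_26

The node that receives the request runs the distributed search over both collections and merges the responses, which fits the identical-schema, time-sliced use case described above.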
Re: Searching and Retrieving Information Protocol For Solr
HTTP? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm

On Thu, Jun 27, 2013 at 7:40 AM, Furkan KAMACI furkankam...@gmail.com wrote:

There is a low-level protocol, called Z39.50, that defines a client–server protocol for searching and retrieving information from remote computer databases. Since Solr is a commonly used search engine (besides being a NoSQL database), is there any protocol for Solr (I don't mean a low-level protocol; Z39.50 is just an example) so that it can integrate with other clients or anything else?
Re: state of new config format in 4.3.1
There were a variety of little bugs - it will just be a bit of a land mine situation if you try and do it with 4.3.1. If it ends up working for you, that's that. - Mark On Jun 27, 2013, at 3:22 PM, shikhar shik...@schmizz.net wrote: Can anyone (Eric?) outline what's changing between 4.3.1 and 4.4 wrt http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond, and what makes the new solr.xml format usable in 4.4 but not 4.3.1? If one didn't care about sharedLib or solr.xml persistence (the only solr.xml changes we care about are addition of core's via the SolrCloud API, so if that happens with core-discovery we're good) -- is there any reason to not use the new format?
RE: shardkey
Hi, We finally decided on using custom sharding (implicit document routing) for our project. We will have ~3 mil documents per shardkey. We're maintaining the shardkey - shardid mapping in a database table. While adding documents we always specify the _shard_ parameter in the update URL, but while querying, we don't specify the shards parameter. We want to search across shards. While experimenting we found that right after hard committing (commit=true in the update URL), at times the query didn't return documents across shards (40% of the time), but many times (60% of the time) it returned documents across shards. When queried after a few hours, the query always returned documents across shards. Is that expected behavior? Is there a parameter to enforce querying across all shards? This is a very important point for us to move further with SolrCloud. We're experimenting with adding a new shard and starting to direct all new documents to this new shard. Hopefully that should work. Many Thanks!

-Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Friday, June 21, 2013 8:50 PM To: solr-user@lucene.apache.org Subject: Re: shardkey

On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital shital.jo...@gs.com wrote: But now Solr stores composite id in the document id

Correct, it's the document id itself that contains everything needed for the compositeId router to determine the hash. It would only use it to calculate hash key but while storing compositeId routing is when it makes sense to make the routing part of the unique id so that an id is all the information needed to find the document in the cluster. For example customer_id!document_name. From your example of 20130611!test_14 it looks like you're doing time based sharding, and one would normally not use the compositeId router for that. -Yonik http://lucidworks.com
Re: Is Overlapping onDeckSearchers=2 really a problem?
Shawn,

On Thu, Jun 27, 2013 at 5:03 PM, Shawn Heisey s...@elyograg.org wrote: On 6/27/2013 5:59 AM, Robert Krüger wrote: sometimes forcing oneself to describe a problem is the first step to a solution. I just realized that I also had an autocommit statement in my config with the exact same amount of time that seemed to be between the warnings. I removed that, because I don't think I really need it, and now the warnings are gone. So it seems it happened whenever my manual commits overlapped with an autocommit, which, of course, was more likely when many commits were issued in sequence. If all you are doing is soft commits, your transaction logs are going to grow out of control.

you are absolutely right. I was shooting myself in the foot with that change.

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup My recommendation: 1) Remove all commits from your indexing application. 2) Configure autoCommit with values similar to that wiki page. 3) Configure autoSoftCommit to happen often. The autoCommit must have openSearcher set to false. For autoSoftCommit, include a maxTime between 1000 and 5000 (milliseconds) and leave maxDocs out.

I did that, but without autoSoftCommit, because I need control over when the commits happen, and I soft-commit from my application. Thank you so much, Robert
Normalizing/Returning solr scores between 0 to 1
Hi, We have a need where we would want normalized scores ranging between 0 and 1 rather than a free range. I read about it at http://wiki.apache.org/lucene-java/ScoresAsPercentages and it seems like that's not something that is recommended. However, is there still a way to set some config in solrconfig to make sure scores are always between 0 and 1? Or will I have to implement that logic in my code after I get the results from Solr? Any pointers will be much appreciated. Thanks, -M -- View this message in context: http://lucene.472066.n3.nabble.com/Normalizing-Returning-solr-scores-between-0-to-1-tp4073797.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: shardkey
You might be seeing https://issues.apache.org/jira/browse/SOLR-4923 ? Is the commit=true part of the request that adds the documents? If so, it might be SOLR-4923, and you should try the commit in a request after adding the docs. - Mark

On Jun 27, 2013, at 4:42 PM, Joshi, Shital shital.jo...@gs.com wrote:

Hi, We finally decided on using custom sharding (implicit document routing) for our project. ... Is that expected behavior? Is there a parameter to enforce querying across all shards? ... Many Thanks!
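A sketch of what Mark is suggesting - don't piggyback the commit on the add request, send it on its own afterwards. The URLs and the _shard_ parameter follow the convention from the original post; the host, collection name, and shard5 are placeholders:

    http://host:8983/solr/collection1/update?_shard_=shard5    (POST the documents here, without commit=true)
    http://host:8983/solr/collection1/update?commit=true       (then commit in a separate request)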
Re: Why there is no getter method for defaultCollection at CloudSolrServer?
I've created a JIRA and applied a patch for it: https://issues.apache.org/jira/browse/SOLR-4973 2013/6/12 Furkan KAMACI furkankam...@gmail.com Ok, I will create a JIRA for it. 2013/6/11 Mark Miller markrmil...@gmail.com On Jun 11, 2013, at 4:51 AM, Furkan KAMACI furkankam...@gmail.com wrote: Why there is no getter method for defaultCollection at CloudSolrServer? Want to create a JIRA issue to add it? - Mark
full-import failed after 5 hours with Exception: ORA-01555: snapshot too old: rollback segment number with name too small ORA-22924: snapshot too old
Hello, I am using Solr 4.3.2 and an Oracle DB. The sub-entity is using CachedSqlEntityProcessor. The dataSource has batchSize=500. The full-import failed with an 'ORA-01555: snapshot too old: rollback segment number with name too small ORA-22924: snapshot too old' exception after 5 hours. We already increased the undo space 4 times at the database end. The number of records in the jan_story table is only 800,000. Tomcat runs with 4GB of JVM memory. Following is the entity (there are other sub-entities; I didn't mention them here, as the import failed on the article_details entity, which is the first sub-entity):

<entity name="par8-article-testingprod" dataSource="par8_prod" pk="VCMID"
        preImportDeleteQuery="content_type:article AND repository:par8qatestingprod"
        query="select ID as VCMID from jan_story">
  <entity name="article_details" dataSource="par8_prod"
          transformer="TemplateTransformer,ClobTransformer,RegexTransformer"
          query="select bb.recordid, aa.ID as DID, aa.STORY_TITLE, aa.STORY_HEADLINE, aa.SOURCE, aa.DECK,
                 regexp_replace(aa.body, '&lt;p&gt;\[(pullquote|summary)\]&lt;/p&gt;|\[video [0-9]+?\]|\[youtube .+?\]', '') as BODY,
                 aa.PUBLISHED_DATE, aa.MODIFIED_DATE, aa.DATELINE, aa.REPORTER_NAME, aa.TICKER_CODES, aa.ADVERTORIAL_CONTENT
                 from jan_story aa, mapp bb where aa.id=bb.keystring1"
          cacheKey="DID" cacheLookup="par8-article-testingprod.VCMID"
          processor="CachedSqlEntityProcessor">
    <field column="content_type" template="article"/>
    <field column="RECORDID" name="native_id"/>
    <field column="repository" template="par8qatestingprod"/>
    <field column="STORY_TITLE" name="title"/>
    <field column="DECK" name="description" clob="true"/>
    <field column="PUBLISHED_DATE" name="date"/>
    <field column="MODIFIED_DATE" name="last_modified_date"/>
    <field column="BODY" name="body" clob="true"/>
    <field column="SOURCE" name="source"/>
    <field column="DATELINE" name="dateline"/>
    <field column="STORY_HEADLINE" name="export_headline"/>
  </entity>
</entity>

The full-import without CachedSqlEntityProcessor takes 7 days. That is why I am doing all this. -- View this message in context: http://lucene.472066.n3.nabble.com/full-import-failed-after-5-hours-with-Exception-ORA-01555-snapshot-too-old-rollback-segment-number-wd-tp4073822.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Normalizing/Returning solr scores between 0 to 1
There is no way that I am aware of to have Solr return scores between 0 and 1. Perhaps there is some way to implement a custom Scorer, but that is overkill and would probably have adverse effects. Instead, just normalize it in your results. Of course, since you read the link you included, you realize that it is no longer really a score, but basically just a feel-good measure. And that is what we do, along with some other logic. -Kevin

On Thu, Jun 27, 2013 at 2:25 PM, smanad sma...@gmail.com wrote:

Hi, We have a need where we would want normalized scores ranging between 0 and 1 rather than a free range. ... Any pointers will be much appreciated. Thanks, -M

-- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614
Re: Normalizing/Returning solr scores between 0 to 1
Might not be useful, but a workaround would be to divide all scores by the max score to get scores between 0 and 1. -- View this message in context: http://lucene.472066.n3.nabble.com/Normalizing-Returning-solr-scores-between-0-to-1-tp4073797p4073829.html Sent from the Solr - User mailing list archive at Nabble.com.
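A SolrJ sketch of that workaround; it assumes the request asks for the score pseudo-field (fl=*,score), and, as the earlier replies point out, the result is a feel-good measure valid only within one result set, not a true score. The URL and query string are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery query = new SolrQuery("your query");
    query.setFields("*", "score"); // score must be requested explicitly

    QueryResponse rsp = server.query(query);
    SolrDocumentList results = rsp.getResults();
    Float maxScore = results.getMaxScore(); // populated when score is requested
    for (SolrDocument doc : results) {
        float raw = (Float) doc.getFieldValue("score");
        float normalized = (maxScore != null && maxScore > 0f) ? raw / maxScore : 0f;
        // normalized now falls in [0, 1] for this result set
    }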
Re: Is there a way to build indexes using SOLRJ without SOLR instance?
Thanks a lot for your response. I created a multithreaded program to create/submit the documents in batches of 100 to the embedded Solr server, but for some reason it takes more time to index the data when compared with ConcurrentUpdateSolrServer. I was under the assumption that the embedded server would take less time compared to HTTP calls, but I'm not sure why it takes more time... Is there a way to speed up the indexing by increasing queue size, etc. (something similar to ConcurrentUpdateSolrServer)? -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-build-indexes-using-SOLRJ-without-SOLR-instance-tp4073383p4073509.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: state of new config format in 4.3.1
Thanks Mark, might give it a go, or probably just wait for 4.4 :)

On Thu, Jun 27, 2013 at 4:06 PM, Mark Miller markrmil...@gmail.com wrote:

There were a variety of little bugs - it will just be a bit of a land mine situation if you try and do it with 4.3.1. If it ends up working for you, that's that. - Mark
Question on forming query when using switch parser plugin?
Hi, I currently have a query as below. I use the fq only if the latlong value is non-empty (via the switch plugin); otherwise I don't use an fq at all. Whenever the latlong value is empty, I just use the value of the $where parameter (in q) to return results based on location. Now, whenever a latlong value is available, I need to use both $where and the values returned by $latlong (geospatial search). Currently the values first get filtered based on q and are then passed to fq, so the values returned are always a subset of the values returned by q. I need q to boost the score of the documents. Can someone let me know how to return the values corresponding to fq (without them getting filtered by q)?

Example: If I search for a place like Charlotte, NC (by passing the latitude and longitude with a distance of 20 miles), I get only the results belonging to Charlotte, NC when I use the query below. I need to return all the results based on distance. If I don't pass the latitude and longitude but rather just pass Charlotte, the geospatial function won't kick in, so the results will be based only on the $where value in q.

<lst name="defaults">
  <str name="q">
    ( _query_:"{!cust1 qf=person_name_lname_i v=$lname}"^8.3 OR
      _query_:"{!cust1 qf=person_name_lname_phonetic_i v=$lname}"^8.6 )
    ( _query_:"{!cust df='addr_location_clean_i' qs=1 v=$where}"^6.2 OR
      _query_:"{!cust df='addr_location_i' qs=1 v=$where}"^6.2 )
  </str>
</lst>
<lst name="appends">
  <str name="fq">{!switch case='*:*' default=$fq_bbox v=$latlong}</str>
</lst>
<lst name="invariants">
  <str name="fq_bbox">_query_:"{!bbox pt=$latlong sfield=geo d=$dist}"^0.2</str>
</lst>

-- View this message in context: http://lucene.472066.n3.nabble.com/Question-on-forming-query-when-using-switch-parser-plugin-tp4073847.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Configuring Solr to retrieve documents?
On 27 June 2013 21:13, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi, I haven't used it yet, but I believe you can do this using the FileDataSource feature of DataImportHandler: http://wiki.apache.org/solr/DataImportHandler#FileDataSource [...] Please see other recent threads on similar topics in this list: A FileDataSource is probably the way to go, along with something like the PlainTextEntityProcessor for text files, or TikaEntityProcessor for PDF/other rich-text documents. Regards, Gora
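A data-config.xml sketch along the lines Gora describes, combining FileListEntityProcessor (to walk a directory) with PlainTextEntityProcessor (to read each file). The directory, file pattern, and target field name are assumptions, and note that DIH still has to be triggered (e.g. a scheduled full-import or delta-import request) - it does not watch the directory on its own:

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8"/>
      <document>
        <entity name="files" processor="FileListEntityProcessor"
                baseDir="/path/to/drop/dir" fileName=".*\.txt"
                recursive="false" rootEntity="false">
          <entity name="file" processor="PlainTextEntityProcessor"
                  url="${files.fileAbsolutePath}">
            <!-- PlainTextEntityProcessor exposes the file contents as plainText -->
            <field column="plainText" name="body"/>
          </entity>
        </entity>
      </document>
    </dataConfig>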