Re: solr spell correction help
OK thanks Jack, but then why is cattle not giving kettle as a suggestion?

On Fri, Apr 12, 2013 at 6:46 PM, Jack Krupansky j...@basetechnology.com wrote:

blandars its not giving correction as blender

They have an edit distance of 3. Direct Spell is limited to a maximum ED of 2.

-- Jack Krupansky

-----Original Message----- From: Rohan Thakur Sent: Friday, April 12, 2013 8:45 AM To: solr-user@lucene.apache.org Subject: solr spell correction help

hi all

I have configured solr direct spell correction on the spell field. For most words solr is correcting and giving suggestions, but for some words like the ones mentioned below it is giving absurd results:

1) blender (indexed)
2) kettle (indexed)
3) electric (indexed)

problems:

1) when I search for blandar it gives the correct suggestion blender, but when I search for blandars it does not suggest blender.

2) when I search for kettle (the correct spelling) it still flags the spelling as incorrect but gives no suggestions, even though the result documents show up. And when I search for cettle it correctly suggests kettle, but when I search for cattle it gives no suggestions at all.

3) likewise, when I search for electric (the correct spelling) the suggestions section flags it as incorrect but gives no suggestions, and documents are returned since the spelling is correct.

Also, how would I configure solr to return samsung as a spelling suggestion when I search for sam, and what could be the solution for the above problems?

please help. thanks in advance

regards Rohan
Re: solr spell correction help
But Jack, I'm not using the Levenshtein distance measure, I'm using the JaroWinkler distance.

On Mon, Apr 15, 2013 at 11:50 AM, Rohan Thakur rohan.i...@gmail.com wrote:

OK thanks Jack, but then why is cattle not giving kettle as a suggestion?

On Fri, Apr 12, 2013 at 6:46 PM, Jack Krupansky j...@basetechnology.com wrote:

blandars its not giving correction as blender

They have an edit distance of 3. Direct Spell is limited to a maximum ED of 2.

-- Jack Krupansky
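For reference, a DirectSolrSpellChecker definition in solrconfig.xml looks roughly like the sketch below (the field and component names are placeholders, not Rohan's actual config). As far as I understand it, the distanceMeasure only changes how candidate suggestions are scored; candidate generation is still bounded by maxEdits, which DirectSolrSpellChecker caps at 2, so words further than two edits away will never be suggested regardless of the measure.

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <!-- candidate generation: only edit distances of 1 or 2 are supported -->
      <int name="maxEdits">2</int>
      <!-- scoring of candidates; "internal" is the Levenshtein-based default -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <!-- minimum score a suggestion must reach -->
      <float name="accuracy">0.5</float>
    </lst>
  </searchComponent>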
Re: Does solr cloud support rename or swap function for collection?
I added a brief description on CREATEALIAS here, feel free to tweak:
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API

Tim

On 07/04/13 05:29 PM, Mark Miller wrote:

It's pretty simple - just as Brad said, it's just
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias&collections=collection1,collection2,…

You also have action=DELETEALIAS

CREATEALIAS will create and update. For update requests, you only want a 1-to-1 alias. For read requests, you can map 1-to-1 or 1-to-N.

I've also started work on shard level aliases, but I've yet to get back to finishing it.

- Mark

On Apr 7, 2013, at 5:10 PM, Tim Vaillancourt t...@elementspace.com wrote:

I aim to use this feature more in testing soon. I'll be sure to doc what I can.

Cheers,

Tim

On 07/04/13 12:28 PM, Mark Miller wrote:

On Apr 7, 2013, at 9:44 AM, bradhill99 bradhil...@yahoo.com wrote:

Thanks Mark for this great feature but I suggest you can update the wiki too.

Yeah, I've stopped updating the wiki for a while now looking back - paralysis on how to handle versions (I didn't want to do the std 'this applies to 4.1', 'this applied to 4.0' all over the page) and the current likely move to a new Confluence wiki with Docs based on documentation LucidWorks recently donated to the project. That's all a lot of work away still I guess.

I'll try and add some basic doc for this to the SolrCloud wiki page soon.

- Mark
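Concretely, creating and removing an alias are plain HTTP calls against any node in the cluster (a sketch; the alias and collection names are made up):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=collection1,collection2'
  curl 'http://localhost:8983/solr/admin/collections?action=DELETEALIAS&name=products'

Re-running CREATEALIAS with the same name simply repoints the alias, which is what makes it usable as a swap.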
how to update document with DIH (FileDataSource)
Hi all,

I am trying to index from both a DB and files, and the information from the DB and the file together make up one document. So I decided to update the documents which I have already indexed from the DB. I would like to use DIH (because there are millions of files), if I can find out how to update a document with DIH.

I need your help. Thanks in advance.

regards
Re: Solr using a ridiculous amount of memory
On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote: Our memory requirements are running amok. We have less than a quarter of our customers running now and even though we have allocated 25GB to the JVM already, we are still seeing daily OOM crashes. Out of curiosity: Did you manage to pinpoint the memory eater in your setup? - Toke Eskildsen
Re: Solr using a ridiculous amount of memory
Yes and no.

The FieldCache is the big culprit. We do a huge amount of faceting so it seems right. Unfortunately I am super swamped at work so I have precious little time to work on this, which is what explains my silence.

Out of desperation, I added another 32G of memory to each server and increased the JVM size to 64G from 25G. The servers are running with 96G memory right now (this is the max amount supported by the hardware), which leaves solr somewhat starved for memory. I am aware of the performance implications of doing this but I have little choice. The extra memory helped a lot, but it still OOMs with about 180 clients using it. Unfortunately I need to support at least double that. After upgrading the RAM, I ran for almost two weeks with the same workload that used to OOM a couple of times a day, so it doesn't look like a leak.

Today I finally managed to set up a test core so I can begin to play around with docValues. I actually have a couple of questions regarding docValues:

1) If I facet on multiple fields and only some of those fields are using docValues, will I still get the memory saving benefit of docValues? (one of the facet fields uses null values and will require a lot of work in our product to fix)

2) If I just use docValues on one small core with very limited traffic at first for testing purposes, how can I test that it is actually using the disk for caching?

I really appreciate all the help I have received on this list so far. I do feel confident that I will be able to solve this issue eventually.

On Mon, Apr 15, 2013 at 9:00 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote:

On Sun, 2013-03-24 at 09:19 +0100, John Nielsen wrote: Our memory requirements are running amok. We have less than a quarter of our customers running now and even though we have allocated 25GB to the JVM already, we are still seeing daily OOM crashes.

Out of curiosity: Did you manage to pinpoint the memory eater in your setup?

- Toke Eskildsen

-- Med venlig hilsen / Best regards

*John Nielsen* Programmer

*MCB A/S* Enghaven 15 DK-7500 Holstebro Kundeservice: +45 9610 2824 p...@mcb.dk www.mcb.dk
Re: SolrCloud vs Solr master-slave replication
Hi Shawn,

thank you for your reply. I'll check if the network card drivers are ok.

About the RAM: the JVM max heap size is currently 6GB, but it never reaches the maximum; typically the used RAM is not more than 5GB. Should I assign more RAM? I've read that assigning excess RAM can also have a bad effect on performance. Apart from the RAM used by the JVM, the server has more than 10GB of unused RAM, which should be enough to cache the index.

About SolrCloud, I know it doesn't use master-slave replication, but incremental updates, item by item. That's why I thought it could work for us, since our bottleneck appears to be the replication cycles. But another question is: if the indexing occurs on all servers, could 1200 updates/min also overload the servers, and therefore give worse performance than with master-slave replication?

Regards,
Victor

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-vs-Solr-master-slave-replication-tp4055541p4055995.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to migrate solr 1.4 index to solr 4.2 index
hi right now we have just moved 1.4 indexes to 4.2.1 and apply the test on that Thanks Regards Montu v Boda -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-migrate-solr-1-4-index-to-solr-4-2-index-tp4055531p4055997.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Which tokenizer or analizer should use and field type
try executing these with debug=all and examine the resulting parsed query, that'll show you exactly how the query is parsed. Also, the query language is not strictly boolean, see: http://searchhub.org/2011/12/28/why-not-and-or-and-not/ The first thing I would try would be to parenthesize explicitly as keyword:((assistant AND coach) OR (iit AND kanpur)) Best Erick On Sat, Apr 13, 2013 at 7:06 PM, anurag.jain anurag.k...@gmail.com wrote: Hi, If you can help me in. It will solve my problem. keyword:(*assistant AND coach*) giving me 1 result. keyword:(*iit AND kanpur*) giving me 2 result. But query:- keyword:(*assistant AND coach* OR (*iit AND kanpur*)) giving me only 1 result. Also i tried. keyword:(*assistant AND coach* OR (*:* *iit AND kanpur*)) giving me only 1 result. Don't know why. How query should look like ?? please help me to find out solution. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Which-tokenizer-or-analizer-should-use-and-field-type-tp4055591p4055837.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Indexing My SQL Timestamp or Date Time field
Solr requires precise date formats, see:
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/schema/DateField.html

Best
Erick

On Sun, Apr 14, 2013 at 11:43 AM, ursswak...@gmail.com ursswak...@gmail.com wrote:

Hi,

To index a Date in Solr, the Date should be in ISO format. Can we index a MySQL Timestamp or Date Time field without modifying the SQL Select Statement?

I have used

  <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
  <field name="CreatedDate" type="tdate" indexed="true" stored="true" />

CreatedDate is of type Date Time in MySQL. I am getting the following exception:

  11:23:39,117 WARN [org.apache.solr.handler.dataimport.DateFormatTransformer] (Thread-72) Could not parse a Date field : java.text.ParseException: Unparseable date: 2013-04-14 11:22:48.0

Any help in fixing this issue is really appreciated.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Indexing-My-SQL-Timestamp-or-Date-Time-field-tp4055894.html
Sent from the Solr - User mailing list archive at Nabble.com.
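Since the warning comes from DIH's DateFormatTransformer, one way to do the conversion inside the DIH config itself is to tell the transformer what the incoming MySQL format looks like (a sketch; the entity name and query are made up, only the dateTimeFormat matters here):

  <entity name="item" transformer="DateFormatTransformer"
          query="select id, CreatedDate from items">
    <!-- parse the MySQL DATETIME string (e.g. 2013-04-14 11:22:48.0) into a Date
         before it is handed to the tdate field -->
    <field column="CreatedDate" dateTimeFormat="yyyy-MM-dd HH:mm:ss.S" />
  </entity>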
Test harness can not load existing index data in Solr 4.2
I'm extending Solr's *AbstractSolrTestCase* for unit testing. I have existing 'schema.xml', 'solrconfig.xml' and index data. I want to start an embedded solr server to load existing collection and its data. Then test searching doc in solr. This way works well in Solr 3.6. However it does not work any more after adapting to Solr 4.2.1. After some investigating, I found it looks like the index data is not loaded by SolrCore created by Test harness. This also can be reproduced when using index of example doc of Solr, I posted the detail test class in my stackoverflow question[1]. Is it a bug of test harness? Or is there better way to load existing index data in unit test? Thanks. [1] http://stackoverflow.com/questions/15947116/solr-4-2-test-harness-can-not-load-existing-index-data Mengxin Zhu
Re: Solr using a ridiculous amount of memory
On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:

The FieldCache is the big culprit. We do a huge amount of faceting so it seems right.

Yes, you wrote that earlier. The mystery is that the math does not check out with the description you have given us.

Unfortunately I am super swamped at work so I have precious little time to work on this, which is what explains my silence.

No problem, we've all been there.

[Band aid: More memory]

The extra memory helped a lot, but it still OOM with about 180 clients using it.

You stated earlier that you had a solr "cluster" and your total(?) index size was 35GB, with each "register" being between 15k and 30k. I am using the quotes to signify that it is unclear what you mean. Is your cluster multiple machines (I'm guessing no), multiple Solr's, cores, shards or maybe just a single instance prepared for later distribution? Is a register a core, a shard or simply a logical part (one client's data) of the index?

If each client has their own core or shard, that would mean that each client uses more than 25GB/180 ~= 142MB of heap to access 35GB/180 ~= 200MB of index. That sounds quite high and you would need a very heavy facet to reach that. If you could grep "UnInverted" from the Solr log file and paste the entries here, that would help to clarify things.

Another explanation for the large amount of memory presents itself if you use a single index: If each of your clients facet on at least one field specific to the client (client123_persons or something like that), then your memory usage goes through the roof.

Assuming an index with 10M documents, each with 5 references to a modest 10K unique values in a facet field, the simplified formula

  #documents*log2(#references) + #references*log2(#unique_values) bits

tells us that this takes at least 110MB with field cache based faceting.

180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at least double that. This fits neatly with your new heap of 64GB.

If my guessing is correct, you can solve your memory problems very easily by sharing _all_ the facet fields between your clients. This should bring your memory usage down to a few GB. You are probably already restricting their searches to their own data by filtering, so this should not influence the returned facet values and counts, as compared to separate fields. This is very similar to the "Facets with 5000 facet fields" thread, BTW.

Today I finally managed to set up a test core so I can begin to play around with docValues.

If you are using a single index with the individual-facet-fields for each client approach, the DocValues will also have scaling issues, as the amount of values (of which the majority will be null) will be

  #clients*#documents*#facet_fields

This means that adding a new client will be progressively more expensive. On the other hand, if you use a lot of small shards, DocValues should work for you.

Regards, Toke Eskildsen
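To spell out the arithmetic in Toke's example: 10M documents with 5 references each gives 50M references, so the formula works out to roughly

  10,000,000 * log2(50,000,000) ≈ 10,000,000 * 25.6 bits ≈ 256 Mbit ≈ 32 MB
  50,000,000 * log2(10,000)     ≈ 50,000,000 * 13.3 bits ≈ 665 Mbit ≈ 83 MB

for a total of about 115 MB per client-specific facet field, which is where the "at least 110MB" figure comes from.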
Re: Some Questions About Using Solr as Cloud
Hi Jack; I see that SolrCloud makes everything automated. When I use SolrCloud is it true that: there may be more than one computer responsible for indexing at any time? 2013/4/15 Jack Krupansky j...@basetechnology.com There are no masters or slaves in SolrCloud - it's fully distributed. Some cluster nodes will be leaders (of the shard on that node) at a given point in time, but different nodes may be leaders at different points in time as they become elected. In a distributed cluster you would never want to store documents only on one node. Sure, you can do that by setting the replication factor to 1, but that defeats half the purpose for SolrCloud. Index transfer is automatic - SolrCloud supports fully distributed update. You might be getting confused with the old Master-Slave-Replication model that Solr had (and still has) which is distinct from SolrCloud. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Sunday, April 14, 2013 7:45 PM To: solr-user@lucene.apache.org Subject: Some Questions About Using Solr as Cloud I read wiki and reading SolrGuide of Lucidworks. However I want to clear something in my mind. Here are my questions: 1) Does SolrCloud lets a multi master design (is there any document that I can read about it)? 2) Let's assume that I use multiple cores i.e. core A and core B. Let's assume that there is a document just indexed at core B. If I send a search request to core A can I get result? 3) When I use multi master design (if exists) can I transfer one master's index data into another (with its slaves or not)? 4) When I use multi core design can I transfer one index data into another core or anywhere else? By the way thanks for the quick responses and kindness at mail list.
Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms
Hi,

Does it work well if you remove synonyms with spaces in them, like "eighty six"?

Dmitry

On Fri, Apr 5, 2013 at 3:43 AM, juancesarvillalba juancesarvilla...@gmail.com wrote:

Hi,

I saw some similar problems in other threads but I think that this is a little different and I couldn't get any solution.

I get the exception:

  org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token eightysix exceeds length of provided text sized 80

This happens for example when I make a query for a word that has synonyms, with highlighting enabled. For example I made a query for "86"; I have an "eightysix" synonym for this, and with highlighting I got the previous exception.

The relevant conf is:

Field Type: (XML markup stripped when the message was posted)

Synonyms.txt:
  Brand 86, 86, eightysix, eight six, eighty six, eighty-six

Default Highlighting Component: (XML markup stripped when posting; surviving values: 100 70 0.5 [-\w ,/\n\"']{20,200} 10 .,!? &#9;&#10;&#13; WORD en US)

Also I saw that when I removed some words from the synonyms list, it works right.

Does anyone have any idea about what is wrong? Best Regards.

--
View this message in context: http://lucene.472066.n3.nabble.com/SolR-InvalidTokenOffsetsException-with-Highlighter-and-Synonyms-tp4053988.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I recover the position and offset a highlight for solr (4.1/4.2)?
Hi, They are available in the HighlighterComponent. You will need to read the source code. Dmitry On Wed, Mar 27, 2013 at 4:28 PM, Skealler Nametic bchaillou...@gmail.comwrote: Hi, I would like to retrieve the position and offset of each highlighting found. I searched on the internet, but I have not found the exact solution to my problem...
Re: Solr using a ridiculous amount of memory
I did a search. I have no occurrence of "UnInverted" in the solr logs.

Another explanation for the large amount of memory presents itself if you use a single index: If each of your clients facet on at least one field specific to the client (client123_persons or something like that), then your memory usage goes through the roof.

This is exactly how we facet right now! I will definitely rewrite the relevant parts of our product to test this out before moving further down the docValues path. I will let you know as soon as I know one way or the other.

-- Med venlig hilsen / Best regards

*John Nielsen* Programmer

*MCB A/S* Enghaven 15 DK-7500 Holstebro Kundeservice: +45 9610 2824 p...@mcb.dk www.mcb.dk
SolrCloud Leaders
Is the number of leaders in a SolrCloud cluster equal to the number of shards?
Re: Solr using a ridiculous amount of memory
Might be obvious, but just in case - remember that you'll need to re-index your content once you've added docValues to your schema, in order to get the on-disk files to be created. Upayavira On Mon, Mar 25, 2013, at 03:16 PM, John Nielsen wrote: I apologize for the slow reply. Today has been killer. I will reply to everyone as soon as I get the time. I am having difficulties understanding how docValues work. Should I only add docValues to the fields that I actually use for sorting and faceting or on all fields? Will the docValues magic apply to the fields i activate docValues on or on the entire document when sorting/faceting on a field that has docValues activated? I'm not even sure which question to ask. I am struggling to understand this on a conceptual level. On Sun, Mar 24, 2013 at 7:11 PM, Robert Muir rcm...@gmail.com wrote: On Sun, Mar 24, 2013 at 4:19 AM, John Nielsen j...@mcb.dk wrote: Schema with DocValues attempt at solving problem: http://pastebin.com/Ne23NnW4 Config: http://pastebin.com/x1qykyXW This schema isn't using docvalues, due to a typo in your config. it should not be DocValues=true but docValues=true. Are you not getting an error? Solr needs to throw exception if you provide invalid attributes to the field. Nothing is more frustrating than having a typo or something in your configuration and solr just ignores this, reports no error, and doesnt work the way you want. I'll look into this (I already intend to add these checks to analysis factories for the same reason). Separately, if you really want the terms data and so on to remain on disk, it is not enough to just enable docvalues for the field. The default implementation uses the heap. So if you want that, you need to set docValuesFormat=Disk on the fieldtype. This will keep the majority of the data on disk, and only some key datastructures in heap memory. This might have significant performance impact depending upon what you are doing so you need to test that. -- Med venlig hilsen / Best regards *John Nielsen* Programmer *MCB A/S* Enghaven 15 DK-7500 Holstebro Kundeservice: +45 9610 2824 p...@mcb.dk www.mcb.dk
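To make that concrete, the two knobs Robert mentions end up in schema.xml roughly like this (a sketch with made-up field names; as Upayavira notes, the content has to be re-indexed after adding them):

  <!-- docValuesFormat is set on the field type; "Disk" keeps most of the data off-heap -->
  <fieldType name="string_dv" class="solr.StrField" docValuesFormat="Disk"/>

  <!-- docValues (lowercase d) is enabled per field; only fields used for
       sorting and faceting benefit from it -->
  <field name="item_group_facet" type="string_dv" indexed="true" stored="false" docValues="true"/>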
Re: SolrCloud Leaders
It is supposed to be one leader per shard, yes. Upayavira On Mon, Apr 15, 2013, at 01:21 PM, Furkan KAMACI wrote: Does number of leaders at a SolrCloud is equal to number of shards?
Re: SolrCloud Leaders
When the cluster is fully operational, yes. But if part of the cluster is down or split and unable to communicate, or leader election is in progress, the actual count of leaders will not be indicative of the number of shards. Leaders and shards are apples and oranges. If you take down a cluster, by definition it would have no leaders (because leaders are running code), but shards are the files in the index on disk that continue to exist even if the code is not running. So, in the extreme, the number of leaders can be zero while the number of shards is non-zero on disk. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Monday, April 15, 2013 8:21 AM To: solr-user@lucene.apache.org Subject: SolrCloud Leaders Does number of leaders at a SolrCloud is equal to number of shards?
Re: SolrCloud Leaders
Do leaders respond to search requests (I mean, do they store indexes) both when I first start SolrCloud and after it has been running for a while?

2013/4/15 Jack Krupansky j...@basetechnology.com

When the cluster is fully operational, yes. But if part of the cluster is down or split and unable to communicate, or leader election is in progress, the actual count of leaders will not be indicative of the number of shards.

Leaders and shards are apples and oranges. If you take down a cluster, by definition it would have no leaders (because leaders are running code), but shards are the files in the index on disk that continue to exist even if the code is not running. So, in the extreme, the number of leaders can be zero while the number of shards is non-zero on disk.

-- Jack Krupansky

-----Original Message----- From: Furkan KAMACI Sent: Monday, April 15, 2013 8:21 AM To: solr-user@lucene.apache.org Subject: SolrCloud Leaders

Does number of leaders at a SolrCloud is equal to number of shards?
Number of unique terms in a field
Hi, in previous versions of Solr (at least with 1.4.1) the admin page displayed the number of unique terms in the index / in a field. I cannot find this on the new admin page anymore (Solr 4.0.0). Can somebody please give me a pointer or is this info not available anymore? Thank you, Andreas
Re: SolrCloud Leaders
Something is written here:
https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-
and it says: "Both leaders and replicas index items and perform searches." How do replicas index items?

2013/4/15 Furkan KAMACI furkankam...@gmail.com

Do leaders respond to search requests (I mean, do they store indexes) both when I first start SolrCloud and after it has been running for a while?

2013/4/15 Jack Krupansky j...@basetechnology.com

When the cluster is fully operational, yes. But if part of the cluster is down or split and unable to communicate, or leader election is in progress, the actual count of leaders will not be indicative of the number of shards.

Leaders and shards are apples and oranges. If you take down a cluster, by definition it would have no leaders (because leaders are running code), but shards are the files in the index on disk that continue to exist even if the code is not running. So, in the extreme, the number of leaders can be zero while the number of shards is non-zero on disk.

-- Jack Krupansky

-----Original Message----- From: Furkan KAMACI Sent: Monday, April 15, 2013 8:21 AM To: solr-user@lucene.apache.org Subject: SolrCloud Leaders

Does number of leaders at a SolrCloud is equal to number of shards?
Re: Number of unique terms in a field
Andreas It's still there :) Open the UI, select a core, go to the Schema Browser, select the field from the drop down and click on the Load Term Info Button (right side, below properties analyzer). Then there's a [10] / 20315 Top-Terms row - right hand of the button you've actually clicked on :) HTH? Stefan On Monday, April 15, 2013 at 3:33 PM, Andreas Hubold wrote: Hi, in previous versions of Solr (at least with 1.4.1) the admin page displayed the number of unique terms in the index / in a field. I cannot find this on the new admin page anymore (Solr 4.0.0). Can somebody please give me a pointer or is this info not available anymore? Thank you, Andreas
RE: Getting page number of result with tika
Thanks a lot, I'm curious if anyone has this kind of need and tried that old patch to Solr 4+ and got it working. Gian Maria. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, April 13, 2013 3:40 PM To: solr-user@lucene.apache.org; Gian Maria Ricci Subject: Re: Getting page number of result with tika You can't assume that Fix Version/s 4.3 means anybody is actively working on it, and the age of the patches suggests nobody is. The Fix Version/s gets updated when releases are made, otherwise you'd have open JIRAs for, say, Solr 1.4.1. Near as I can tell, that JIRA is dead, don't look for it unless someone picks it up again. Best Erick On Thu, Apr 11, 2013 at 11:55 AM, Gian Maria Ricci alkamp...@nablasoft.com wrote: As far as I know SOLR-380 https://issues.apache.org/jira/browse/SOLR-380 deal with the problem of kowing page number with tika indexing. The issue contains a patch but it is really old, and I'm curious how is the status of this issue (since I see Fix Version/s 4.3, so it seems that it will be implemented in the next version). Anyone has a good workaround/patch/solution to search into tika indexed documents and having the list of pages where match was found? Thanks in advance. Gian Maria.
Re: Number of unique terms in a field
Hi Stefan, with Solr 4.0.0 I just get 10 / -1. I just tried it with Solr 4.2.1 and the example application and it seems to work there. Maybe this has been fixed/improved since 4.0.0. Thanks, Andreas Stefan Matheis wrote on 15.04.2013 15:49: Andreas It's still there :) Open the UI, select a core, go to the Schema Browser, select the field from the drop down and click on the Load Term Info Button (right side, below properties analyzer). Then there's a [10] / 20315 Top-Terms row - right hand of the button you've actually clicked on :) HTH? Stefan On Monday, April 15, 2013 at 3:33 PM, Andreas Hubold wrote: Hi, in previous versions of Solr (at least with 1.4.1) the admin page displayed the number of unique terms in the index / in a field. I cannot find this on the new admin page anymore (Solr 4.0.0). Can somebody please give me a pointer or is this info not available anymore? Thank you, Andreas
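If the UI keeps showing -1, the same information can be pulled from the Luke request handler, which is what the Schema Browser reads from behind the scenes (a sketch; adjust host, core and field name):

  http://localhost:8983/solr/collection1/admin/luke?fl=name_of_field&numTerms=10

If I remember correctly, the per-field section of the response contains a "distinct" entry with the unique term count alongside the top terms.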
Usage of CloudSolrServer?
I am reading Lucidworks Solr Guide it says at SolrCloud section: *Read Side Fault Tolerance* With earlier versions of Solr, you had to set up your own load balancer. Now each individual node load balances requests across the replicas in a cluster. You still need a load balancer on the 'outside' that talks to the cluster, or you need a smart client. (Solr provides a smart Java Solrj client called CloudSolrServer.) My system is as follows: I crawl data with Nutch and send them into SolrCloud. Users will search at Solr. What is that CloudSolrServer, should I use it for load balancing or is it something else different?
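For the record, CloudSolrServer is used in SolrJ code in place of HttpSolrServer: it reads the cluster state from ZooKeeper and spreads requests over the live replicas itself, so clients written this way don't need a separate load balancer. A minimal sketch (assuming SolrJ 4.x, ZooKeeper on localhost:2181 and a collection named collection1):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class CloudQueryExample {
      public static void main(String[] args) throws Exception {
          // talks to ZooKeeper rather than to one fixed Solr node
          CloudSolrServer server = new CloudSolrServer("localhost:2181");
          server.setDefaultCollection("collection1");

          QueryResponse rsp = server.query(new SolrQuery("*:*"));
          System.out.println("hits: " + rsp.getResults().getNumFound());

          server.shutdown();
      }
  }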
Re: Tokenize on paragraphs and sentences
Technically, yes, but you would have to do a lot of work yourself. Like, a sentence/paragraph recognizer that inserted sentence and paragraph markers, and a query parser that allows you to do SpanNear and SpanNot (to selectively exclude sentence or paragraph marks based on your granularity of search.) The LucidWorks Search query parser has SpanNot support (or at least did at one point in time), but no sentence/paragraph marking. You could come up with some heuristic regular expressions for sentence and paragraph marks, like consecutive newlines for a paragraph and dot followed by white space for sentence (with some more heuristics for abbreviations.) Or you could have an update processor do the marking. -- Jack Krupansky -Original Message- From: Alex Cougarman Sent: Monday, April 15, 2013 9:48 AM To: solr-user@lucene.apache.org Subject: Tokenize on paragraphs and sentences Hi. Is it possible to search within paragraphs or sentences in Solr? The PatternTokenizerFactory uses regular expressions, but how can this be done with plain ASCII docs that don't have p tags (HTML), yet they're broken into paragraphs? Thanks. Warm regards, Alex
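As a rough illustration of the heuristic marking described above (only a sketch; the marker tokens, the regexes and the missing abbreviation handling are all made up and would need real work, e.g. inside an update processor):

  import java.util.regex.Pattern;

  public class BoundaryMarker {
      // blank line => paragraph boundary
      private static final Pattern PARA = Pattern.compile("\\n\\s*\\n");
      // ./!/? followed by horizontal whitespace and a capital => sentence boundary (naive)
      private static final Pattern SENT = Pattern.compile("(?<=[.!?])[ \\t]+(?=[A-Z])");

      // Insert artificial boundary tokens before indexing so that span queries
      // (SpanNot) can later exclude matches that cross a boundary.
      public static String mark(String text) {
          String withSentences = SENT.matcher(text).replaceAll(" SENTENCESEP ");
          return PARA.matcher(withSentences).replaceAll(" PARAGRAPHSEP ");
      }

      public static void main(String[] args) {
          System.out.println(mark("First sentence. Second one.\n\nNew paragraph here."));
      }
  }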
Re: SolrCloud Leaders
All nodes are replicas in SolrCloud since there are no masters. It's a fully distributed model. A leader is also a replica. A leader is simply a replica which was elected to be a leader, for now. An hour from now some other replica may be the leader. It is indeed misleading and inaccurate to suggest that leader and replicas are disjoint. Once again, I think you are confusing SolrCloud with the older Solr master/slave/replication. Every node in SolrCloud can do indexing. That's the same as saying that every replica in SolrCloud can do indexing. Although we do need to be clear that a given replica will only index documents for the shard(s) to which it belongs. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Monday, April 15, 2013 9:38 AM To: solr-user@lucene.apache.org Subject: Re: SolrCloud Leaders Here writes something: https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and says: Both leaders and replicas index items and perform searches. How replicas index items? 2013/4/15 Furkan KAMACI furkankam...@gmail.com Does leaders may response search requests (I mean do they store indexes) at when I run SolrCloud at first and after a time later? 2013/4/15 Jack Krupansky j...@basetechnology.com When the cluster is fully operational, yes. But if part of the cluster is down or split and unable to communicate, or leader election is in progress, the actual count of leaders will not be indicative of the number of shards. Leaders and shards are apples and oranges. If you take down a cluster, by definition it would have no leaders (because leaders are running code), but shards are the files in the index on disk that continue to exist even if the code is not running. So, in the extreme, the number of leaders can be zero while the number of shards is non-zero on disk. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Monday, April 15, 2013 8:21 AM To: solr-user@lucene.apache.org Subject: SolrCloud Leaders Does number of leaders at a SolrCloud is equal to number of shards?
Dynamic data model design questions
I'm implementing a backend service that stores data in JSON format and I'd like to provide a search operation in the service. The data model is dynamic and will contain arbitrarily complex object graphs. How do I index object graphs with Solr? Does the data need to be flattened before indexing? Apparently the service needs to deliver new data and updates to Solr, but which one should be responsible for converting the data model to adhere to Solr schema? The service or Solr? Should the service deliver data to Solr in a form that adheres to Solr schema or should Solr be extended to digest data provided by the service? How does Solr handle dynamic data models? Solr seems to support dynamic data models with the dynamic fields feature in schemas. How are data types inferred when using dynamic fields? An alternative to using dynamic fields seems to be to change the schema when the data model changes. How easy is it to modify an existing schema? Do I need to reindex all the data? Can you do it online using an API? I'm planning on using Solr 4.2. marko
solr tdate field
Hi, I have date field being indexed into solr. in my schema i have the following code for it, field name=createdDate type=date indexed=true stored=true required=true / but in java, i get the following error when i search using solr: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Date Why is solr returning me String back where i have type=date in schema.xml? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr tdate field
Check your date field type to make sure it really is solr.DateField or solr.TrieDateField Then check whether you have a function query with an ms function that references a non-TrieDateField. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Monday, April 15, 2013 10:54 AM To: solr-user@lucene.apache.org Subject: solr tdate field Hi, I have date field being indexed into solr. in my schema i have the following code for it, field name=createdDate type=date indexed=true stored=true required=true / but in java, i get the following error when i search using solr: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Date Why is solr returning me String back where i have type=date in schema.xml? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrException parsing error
Hi All, I'm using Solr 4.1 and am receiving an org.apache.solr.common.SolrException parsing error with root cause java.io.EOFException (see below for stack trace). The query I'm performing is long/complex and I wonder if its size is causing the issue? I am querying via POST through SolrJ. The query (fq) itself is ~20,000 characters long in the form of: fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR + mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR + mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ... In short, I am querying for an ID throughout multiple dynamically created fields (mutation_prot_mt_#_#). Any thoughts on how to further debug? Thanks in advance, Luis -- SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw exception [Request processing failed; nested exception is org.apache.solr.common.SolrException: parsing error] with root cause java.io.EOFException at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at x.x.x.x.x.x.someMethod(x.java:111) at x.x.x.x.x.x.otherMethod(x.java:222) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213) at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126) at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578) at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778) at javax.servlet.http.HttpServlet.service(HttpServlet.java:621) at javax.servlet.http.HttpServlet.service(HttpServlet.java:722) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330) at x.x.x.x.x.yetAnotherMethod(x.java:333) at 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:118) at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:84) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:146) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54) at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:342) at
3 general questions about SolrCloud
Dear list,

Sorry for these general questions, I'm really in a mess now.

1. What is the model between the master and the replicas in one shard? If the replicas are able to catch up with the master, then when the master receives an update request it will scatter the request to all the active replicas and expect responses before the request gets executed by itself. This is called the push model, right? When a new replica comes up it will download the whole index from the master; can this be called the pull model? But when the master pushes updates to it, how does the replica behave - does it continue to download the whole index while keeping track of the updates in a log (tlog)?

2. What is the main use of the transaction log? Is it only used to serve the NRT get requests, and not related to data sync between master and replica?

3. Is leader election related to the index version? If a shard has 3 replicas and the master goes down, how is the new master chosen - do they compare the latest index version of each one, or only consider the earliest presence time?

Regards
Thanks
Query Parser OR AND and NOT
Hallo,

I do not really understand the query language of the SOLR query parser. I use SOLR 4.2 and I have nearly 20 sample address records in the SOLR database. I only use the q field in the SOLR Admin Web GUI and all other controls on this page are at their defaults.

First category:

  zip:30*                  numFound=2896
  city:H* OR zip:30*       numFound=12519
  city:H* AND zip:30*      numFound=376

These results seem correct to me.

Now I tried with negations:

  !city:H*                 numFound=194577   (seems to be correct)
  !city:H* AND zip:30*     numFound=2520     (seems to be correct)
  !city:H* OR zip:30*      numFound=2520     (!! this is wrong !!)

Or do I not understand something?

  (!city:H*) OR zip:30*    numFound=2896

This is also wrong.

Thanks for any hint to understand the negation handling of the query language.

Ciao
Peter Schütt
Re: Query Parser OR AND and NOT
should be: -city:H* OR zip:30* On Mon, Apr 15, 2013 at 12:03 PM, Peter Schütt newsgro...@pstt.de wrote: Hallo, I do not really understand the query language of the SOLR-Queryparser. I use SOLR 4.2 und I have nearly 20 sample address records in the SOLR-Database. I only use the q field in the SOLR Admin Web GUI and every other controls on this website is on default. First category: zip:30* numFound=2896 city:H* OR zip:30* numFound=12519 city:H* AND zip:30* numFound=376 These results seems to me correct. Now I tried with negations: !city:H*numFound:194577(seems to be correct) !city:H* AND zip:30*numFound:2520(seems to be correct) !city:H* OR zip:30* numFound:2520(!! this is wrong !!) Or do I do not understand something? (!city:H*) OR zip:30*numFound: 2896 This is also wrong. Thanks for any hint to understand the negation handling of the query language. Ciao Peter Schütt
Re: solr tdate field
fieldType name=date class=solr.TrieDateField precisionStep=0 positionIncrementGap=0/ this is the date field in my schema.xml and i do not get the second point; how reference a non-TrieDateField. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069p4056088.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr tdate field
Show us the full query URL (at least all the parameters) and the defaults from the request handler in solrconfig. -- Jack Krupansky -Original Message- From: hassancrowdc Sent: Monday, April 15, 2013 12:17 PM To: solr-user@lucene.apache.org Subject: Re: solr tdate field fieldType name=date class=solr.TrieDateField precisionStep=0 positionIncrementGap=0/ this is the date field in my schema.xml and i do not get the second point; how reference a non-TrieDateField. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069p4056088.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query Parser OR AND and NOT
Hallo, Roman Chyla roman.ch...@gmail.com wrote in news:caen8dywjrl+e3b0hpc9ntlmjtrkasrqlvkzhkqxopmlhhfn...@mail.gmail.com: should be: -city:H* OR zip:30* -city:H* OR zip:30* numFound:2520 gives the same wrong result. Another Idea? Ciao Peter Schütt
Re: solr tdate field
query is as following:

  localhost:8080/solr/collection1/select?wt=json&omitHeader=true&defType=dismax&rows=11&qf=manufacturer%20model%20displayName&fl=id&q=samsung

and requesthandler:

  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-tdate-field-tp4056069p4056100.html
Sent from the Solr - User mailing list archive at Nabble.com.
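For what it's worth, one common source of a String-vs-Date mismatch is the response format rather than the schema: with SolrJ's default javabin responses a solr.TrieDateField is deserialized into a java.util.Date, while wt=json obviously gives the date back as a string that has to be parsed by hand. A small sketch of the SolrJ side, using the field from the schema above (only an illustration, not a diagnosis of this particular setup):

  import java.util.Date;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;

  public class DateFieldExample {
      public static void main(String[] args) throws Exception {
          HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");

          SolrQuery q = new SolrQuery("samsung");
          q.setFields("id", "createdDate");
          SolrDocument doc = server.query(q).getResults().get(0);

          // with the binary (javabin) response parser this really is a java.util.Date
          Date created = (Date) doc.getFieldValue("createdDate");
          System.out.println(created);

          server.shutdown();
      }
  }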
Re: Query Parser OR AND and NOT
: Hallo,
: I do not really understand the query language of the SOLR query parser.

http://www.lucidimagination.com/blog/2011/12/28/why-not-and-or-and-not/

The one comment I would add regarding your specific examples...

: (!city:H*) OR zip:30*    numFound=2896

...you can't have a boolean query -- the parens -- containing purely negative clauses like that. That boolean query doesn't match anything, it just excludes things. If the *entire* query is negative, then solr helps you out by implicitly making the negation relative to a query that matches all documents, but if you are creating boolean sub-queries with parens, then you need something positive in that sub-query to match some criteria X, and then your negations provide exclusions from that criteria.

-Hoss
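Applied to the example above, the purely negative clause has to be anchored to something positive, e.g.:

  (*:* -city:H*) OR zip:30*

i.e. "all documents except city:H*", OR'ed with zip:30*.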
Storing Solr Index on NFS
Greetings, Are there any issues with storing Solr Indexes on a NFS share? Also any recommendations for using NFS for Solr indexes? Thanks, Saqib
Re: Query Parser OR AND and NOT
What if you try city:(*:* -H*) OR zip:30* Sometimes Solr requires a list of documents to subtract from (think of *:* -someQuery converts to all documents without someQuery). You can also try looking at your query with debugQuery = true. -Luis On Mon, Apr 15, 2013 at 12:25 PM, Peter Schütt newsgro...@pstt.de wrote: Hallo, Roman Chyla roman.ch...@gmail.com wrote in news:caen8dywjrl+e3b0hpc9ntlmjtrkasrqlvkzhkqxopmlhhfn...@mail.gmail.com: should be: -city:H* OR zip:30* -city:H* OR zip:30* numFound:2520 gives the same wrong result. Another Idea? Ciao Peter Schütt
Re: updateLog in Solr 4.2
On 4/12/2013 7:17 AM, vicky desai wrote: and solr fails to start . However if i add updatelog in my solrconfig.xml it starts. Is the update log parameter mandatory for solr4.2 You are using SolrCloud. SolrCloud requires both updateLog and replication to be enabled. As you probably know, updateLog requires the presence of a _version_ field, see the example schema for the full definition of that field. If you are using Solr without SolrCloud, these features are not required. The updateLog is recommended for all 4.x installs because NRTCachingDirectoryFactory is now the default. The way I understand it, with that Directory implementation, part of an update may not be persisted to disk even with a hard commit, so the updateLog is needed to ensure the data won't be lost. Thanks, Shawn
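For reference, the two pieces the 4.x example configs use for this look roughly like the following (paths and types may differ in your setup):

  <!-- solrconfig.xml -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

  <!-- schema.xml: required whenever the updateLog is enabled -->
  <field name="_version_" type="long" indexed="true" stored="true"/>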
Re: Storing Solr Index on NFS
On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote: Greetings, Are there any issues with storing Solr Indexes on a NFS share? Also any recommendations for using NFS for Solr indexes? I recommend that you do not put Solr indexes on NFS. It can be very slow, I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS. wunder -- Walter Underwood wun...@wunderwood.org
Re: Storing Solr Index on NFS
Hello Walter,

Thanks for the response. That has been my experience in the past as well. But I was wondering if there are new things in Solr 4 and NFS 4.1 that make storing indexes on an NFS mount feasible.

Thanks,
Saqib

On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote:

On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:

Greetings, Are there any issues with storing Solr Indexes on a NFS share? Also any recommendations for using NFS for Solr indexes?

I recommend that you do not put Solr indexes on NFS. It can be very slow, I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS.

wunder
--
Walter Underwood
wun...@wunderwood.org
Re: Query Parser OR AND and NOT
Oh, sorry, I have assumed lucene query parser. I think SOLR qp must be different then, because for me it works as expected (our qp parser is identical with lucene in the way it treats modifiers +/- and operators AND/OR/NOT -- NOT must be joining two clauses: a NOT b, the first cannot be negative, as Chris points out; the modifier however can be first - but it cannot be alone, there must be at least one positive clause). Otherwise, -field:x it is changed into field:x http://labs.adsabs.harvard.edu/adsabs/search/?q=%28*+-abstract%3Ablack%29+AND+abstract%3Ahole*db_key=ASTRONOMYsort_type=DATE http://labs.adsabs.harvard.edu/adsabs/search/?q=%28-abstract%3Ablack%29+AND+abstract%3Ahole*db_key=ASTRONOMYsort_type=DATE roman On Mon, Apr 15, 2013 at 12:25 PM, Peter Schütt newsgro...@pstt.de wrote: Hallo, Roman Chyla roman.ch...@gmail.com wrote in news:caen8dywjrl+e3b0hpc9ntlmjtrkasrqlvkzhkqxopmlhhfn...@mail.gmail.com: should be: -city:H* OR zip:30* -city:H* OR zip:30* numFound:2520 gives the same wrong result. Another Idea? Ciao Peter Schütt
Re: SolrCloud vs Solr master-slave replication
On 4/15/2013 3:38 AM, Victor Ruiz wrote: About SolrCloud, I know it doesn't use master-slave replication, but incremental updates, item by item. That's why I thought it could work for us, since our bottleneck appear to be the replication cycles. But another point is, if the indexing occurs in all servers, 1200 updates/min could also overload the servers? and therefore have a worst performance than with master-slave replication? One version (4.1, I think) has a problem that results in the entire index being replicated every time. The I/O required for that makes everything slow down on both master and slave. There are reports of new master/slave replication problems with 4.2 and 4.2.1, but I'm not entirely clear on whether those are just cosmetic problems with index version reporting or whether some people are having actual real problems. In 3.x and older, replication was generally the best option for multiple copies of your index, because there was no NRT indexing capability. Updating the index was a resource-intensive process with a high impact on searching, loading a replicated index was better. Version 4.x adds NRT capabilities, so indexing impacts searches far less than it used to. SolrCloud with NRT features (frequent soft commits, less frequent hard commits) is the recommended configuration path now. Thanks, Shawn
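A typical NRT-style commit setup in solrconfig.xml looks roughly like this (the intervals are placeholders to tune against your update rate; both elements live inside the updateHandler):

  <autoCommit>
    <!-- hard commit: flush and fsync segments, but don't open a new searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <!-- soft commit: make newly indexed documents visible to searches -->
    <maxTime>2000</maxTime>
  </autoSoftCommit>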
Re: Storing Solr Index on NFS
Solr 4.2 does have field compression which makes smaller indexes. That will reduce the amount of network traffic. That probably does not help much, because I think the latency of NFS is what causes problems. wunder On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote: Hello Walter, Thanks for the response. That has been my experience in the past as well. But I was wondering if there new are things in Solr 4 and NFS 4.1 that make the storing of indexes on a NFS mount feasible. Thanks, Saqib On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.orgwrote: On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote: Greetings, Are there any issues with storing Solr Indexes on a NFS share? Also any recommendations for using NFS for Solr indexes? I recommend that you do not put Solr indexes on NFS. It can be very slow, I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS. wunder -- Walter Underwood wun...@wunderwood.org -- Walter Underwood wun...@wunderwood.org
Re: Grouping performance problem
Agnieszka, Did you find a good solution to your performance problem with grouping? I have an index with 45m records and am using grouping and the performance is atrocious. Any advice would be very welcome! Thanks in advance, David -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-performance-problem-tp3995245p4056113.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Usage of CloudSolrServer?
On 4/15/2013 8:05 AM, Furkan KAMACI wrote: My system is as follows: I crawl data with Nutch and send them into SolrCloud. Users will search at Solr. What is that CloudSolrServer, should I use it for load balancing or is it something else different? It appears that the Solr integration in Nutch currently does not use CloudSolrServer. There is an issue to add it. The mutual dependency on HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses HttpClient 4. https://issues.apache.org/jira/browse/NUTCH-1377 Until that is fixed, a load balancer would be required for full redundancy for updates with SolrCloud. You don't have to use a load balancer for it to work, but if the Solr server that Nutch is using goes down, then indexing will stop unless you reconfigure Nutch or bring the Solr server back up. Thanks, Shawn
Re: SolrException parsing error [Solved]
Sorry, spoke to soon. Turns out I was not sending the query via POST. Changing the method to POST solved the issue. Apologies for the spam! -Luis On Mon, Apr 15, 2013 at 11:47 AM, Luis Lebolo luis.leb...@gmail.com wrote: Hi All, I'm using Solr 4.1 and am receiving an org.apache.solr.common.SolrException parsing error with root cause java.io.EOFException (see below for stack trace). The query I'm performing is long/complex and I wonder if its size is causing the issue? I am querying via POST through SolrJ. The query (fq) itself is ~20,000 characters long in the form of: fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR + mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR + mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ... In short, I am querying for an ID throughout multiple dynamically created fields (mutation_prot_mt_#_#). Any thoughts on how to further debug? Thanks in advance, Luis -- SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw exception [Request processing failed; nested exception is org.apache.solr.common.SolrException: parsing error] with root cause java.io.EOFException at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) at x.x.x.x.x.x.someMethod(x.java:111) at x.x.x.x.x.x.otherMethod(x.java:222) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213) at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126) at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617) at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578) at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778) at javax.servlet.http.HttpServlet.service(HttpServlet.java:621) at javax.servlet.http.HttpServlet.service(HttpServlet.java:722) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at 
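For reference, a minimal SolrJ sketch of the fix Luis describes: passing METHOD.POST to query() sends the parameters in the request body instead of the URL, so URL-length limits no longer apply to a ~20,000-character fq. The core URL and the filter query below are placeholders, not taken from his actual setup.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostQueryExample {
    public static void main(String[] args) throws SolrServerException {
        // Placeholder core URL.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery query = new SolrQuery("*:*");
        // A very long filter query; sent via GET it can overflow the servlet
        // container's URL length limit, so pass the HTTP method explicitly.
        query.addFilterQuery("(mutation_prot_mt_1_1:2374 OR mutation_prot_mt_2_1:2374)");
        QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
        System.out.println("Found " + rsp.getResults().getNumFound() + " docs");
    }
}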
Re: Dynamic data model design questions
On 4/15/2013 8:40 AM, Marko Asplund wrote: I'm implementing a backend service that stores data in JSON format and I'd like to provide a search operation in the service. The data model is dynamic and will contain arbitrarily complex object graphs. How do I index object graphs with Solr? Does the data need to be flattened before indexing? Solr does have some *very* limited capability for doing joins between indexes, but generally speaking, you need to flatten the data. Apparently the service needs to deliver new data and updates to Solr, but which one should be responsible for converting the data model to adhere to Solr schema? The service or Solr? Should the service deliver data to Solr in a form that adheres to Solr schema or should Solr be extended to digest data provided by the service? Solr's ability to change your data after receiving it is fairly limited. The schema has some ability in this regard for indexed values, but the stored data is 100% verbatim as Solr receives it. If you will be using the dataimport handler, it does have some transform capability before sending to Solr. Most of the time, the rule of thumb is that changing the data on the Solr side will require contrib/custom plugins, so it may be easier to do it before Solr receives it. How does Solr handle dynamic data models? Solr seems to support dynamic data models with the dynamic fields feature in schemas. How are data types inferred when using dynamic fields? A wildcard field name is used, like i_* or *_int and that definition includes the data type. An alternative to using dynamic fields seems to be to change the schema when the data model changes. How easy is it to modify an existing schema? Do I need to reindex all the data? Can you do it online using an API? Changing the schema is as simple as modifying schema.xml and reloading the core or restarting Solr. An API for online schema changes is coming, I don't know if it will be ready in time for 4.3 or if it will get pushed back to 4.4. No matter how you make the change, the following applies: If you add fields, reindexing is not necessary, but existing documents will not have the new fields until you do. If you change the query analyzer chain, no reindex is required. If you change the index analyzer chain or options that affect indexing, reindexing IS required. Thanks, Shawn
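To make the wildcard idea concrete: assuming schema.xml declares dynamic field patterns such as *_s (string) and *_i (integer), as the stock example schema does, a client can invent field names at index time and the matching pattern supplies the type. A rough SolrJ sketch with made-up field names:

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class DynamicFieldExample {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Placeholder core URL.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        // Neither field is declared explicitly in schema.xml; "author_s" matches
        // the *_s pattern and "revision_i" matches *_i, which decide their types.
        doc.addField("author_s", "Marko");
        doc.addField("revision_i", 42);
        server.add(doc);
        server.commit();
    }
}

Nested object graphs would still have to be flattened into fields like these (or serialized into a stored-only field) before indexing.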
Re: SolrException parsing error
On 4/15/2013 9:47 AM, Luis Lebolo wrote: Hi All, I'm using Solr 4.1 and am receiving an org.apache.solr.common.SolrException parsing error with root cause java.io.EOFException (see below for stack trace). The query I'm performing is long/complex and I wonder if its size is causing the issue? I am querying via POST through SolrJ. The query (fq) itself is ~20,000 characters long in the form of: fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR + mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR + mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ... In short, I am querying for an ID throughout multiple dynamically created fields (mutation_prot_mt_#_#). Any thoughts on how to further debug? Thanks in advance, Luis -- SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw exception [Request processing failed; nested exception is org.apache.solr.common.SolrException: parsing error] with root cause java.io.EOFException at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) I am guessing that this log is coming from your SolrJ client, but that is not completely clear. Is it SolrJ or Solr that is logging this error? If it's SolrJ, do you see anything in the Solr log, and vice versa? This looks to me like a network problem, where something is dropping the connection before transfer is complete. It could be an unusual server-side config, OS problems, timeout settings in the SolrJ code, NIC drivers/firmware, bad cables, bad network hardware, etc. Thanks, Shawn
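If it does turn out to be a client-side timeout, the SolrJ connect and read timeouts are easy to raise; a small sketch, with the URL and values picked purely for illustration:

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TimeoutExample {
    public static void main(String[] args) {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        server.setConnectionTimeout(5000); // ms allowed to establish the TCP connection
        server.setSoTimeout(60000);        // socket read timeout in ms
        // ... issue queries as usual; a slow, complex query now gets up to 60s to respond.
    }
}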
Re: 3 general questions about SolrCloud
On 4/15/2013 9:58 AM, SuoNayi wrote: 1. What's the model between the master and replicas in one shard? If the replicas are able to catch up with the master when the master receives a update request it will scatter the request to all the active replicas and expect responses before the request get executed by itself.This is called push model,right? When a new replica is present it will download the whole index from the master can this be called pull model?but when the master pushes updates to it how the replica behaves,continuing to download the whole index while keeping a track of the updates in a log(tlog) ? There is no master. SolrCloud is fully distributed. One replica on each shard is elected leader, but that is not a permanent designation. 2.What's the main use of the transaction log? Is it only used to serve the NRT get requests and not related with data sync between the master and replica? The transaction log is used to replay transactions when a node starts up. If the differences between the leader replica and a replica that just started are small enough, the transaction log will be used to bring them back into sync. If they are too different, the one that just started will replicate the full index from the leader. I am pretty sure that the _version_ field present on every document is used to determine whether replicas are in sync, not the index version. 3.Will the leader election be related with index version? If a shard has 3 replicas and when the master goes down how to choose the master, do they compare the lastest index version for each one or only consider the earliest presence time? Here is my understanding about leader elections, I hope it's right! Leader elections only take place when the leader goes down. Once a replica is elected leader, it will remain leader unless it goes down. The other replicas can go up and down and the leader will retain that role. I do not think the index version is used at all, the _version_ field in the index is probably used instead. Thanks, Shawn
Re: Spellchecker not working for Solr 4.1
I am using spellcheck=true when I post the search, e.g. solr/productindex/productQuery?q=fuacet&spellcheck=true
Trigger documents update in a collection
Hi all, I want to use Solr4 as a NoSQL store. My 'ideal' workflow is to add/update documents in one collection (the NoSQL store) and have the changes automatically propagated to another collection with more specific search capabilities. The NoSQL collection will contain all my documents (750M docs). The 'searchable' collection will only contain a subset of this collection (active documents, based on a field). Is it possible? Thank you
Document adds, deletes, and commits ... a question about visibility.
Simple question first: Is there anything in SolrJ that prevents indexing more than 500 documents in one request? I'm not aware of anything myself, but a co-worker remembers running into something, so his code is restricting them to 490 docs. The only related limit I'm aware of is the POST buffer size limit, which defaults in recent Solr versions to 2MiB. A more complex question: If I am doing both deletes and adds in separate update requests, and I want to ensure that a delete in the next request can delete a document that I am adding in the current one, do I need to commit between the two requests? This is probably more of a Lucene question than Solr, but Solr is what I'm using. To simplify: Let's say I start with an empty index. I add documents a and b in one request ... then I send a deleteByQuery request for a, c, and e. If I don't do a commit between these two requests, will a still be in the index when I commit after the second request? If so, would there be an easy fix? Thanks, Shawn
Re: Document adds, deletes, and commits ... a question about visibility.
At the Lucene level, you don't have to commit before doing the deleteByQuery, i.e. 'a' will be correctly deleted without any intervening commit. Mike McCandless http://blog.mikemccandless.com On Mon, Apr 15, 2013 at 3:57 PM, Shawn Heisey s...@elyograg.org wrote: Simple question first: Is there anything in SolrJ that prevents indexing more than 500 documents in one request? I'm not aware of anything myself, but a co-worker remembers running into something, so his code is restricting them to 490 docs. The only related limit I'm aware of is the POST buffer size limit, which defaults in recent Solr versions to 2MiB. A more complex question: If I am doing both deletes and adds in separate update requests, and I want to ensure that a delete in the next request can delete a document that I am adding in the current one, do I need to commit between the two requests? This is probably more of a Lucene question than Solr, but Solr is what I'm using. To simplify: Let's say I start with an empty index. I add documents a and b in one request ... then I send a deleteByQuery request for a c and e. If I don't do a commit between these two requests, will a still be in the index when I commit after the second request? If so, would there be an easy fix? Thanks, Shawn
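For reference, the scenario expressed in SolrJ looks roughly like this; the core URL and the document ids are just the placeholders from the example, and error handling is omitted:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddThenDeleteExample {
    public static void main(String[] args) throws SolrServerException, IOException {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Request 1: add documents a and b, no commit yet.
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        for (String id : new String[] { "a", "b" }) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            docs.add(doc);
        }
        server.add(docs);

        // Request 2: delete a, c and e. Per Mike's answer, 'a' is deleted even
        // though no commit happened between the two requests.
        server.deleteByQuery("id:(a OR c OR e)");

        // A single commit at the end; only 'b' becomes visible to searches.
        server.commit();
    }
}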
Re: Trigger documents update in a collection
Hi, Doable with a custom Update Request Processor, yes. Otis Solr ElasticSearch Support http://sematext.com/ On Apr 15, 2013 3:14 PM, Francois Perron francois.per...@wantedanalytics.com wrote: Hi all, I want to use Solr4 as a NoSQL. My 'ideal' workflow is to add/update documents in a collection (NoSQL) and automatically update changes in another collection with more specific search capabilities. The nosql collection will contains all my documents (750M docs). The 'searchable' collection will only contain a subset of this collection (active documents based on a field). Is it possible ? Thank you
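A very rough sketch of what such a processor could look like, assuming it forwards "active" documents to the second collection with an embedded SolrJ client. The class names, the "active" flag field, and the target URL are all invented for illustration, and this ignores error handling, batching, and deletes:

import java.io.IOException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ForwardActiveDocsProcessorFactory extends UpdateRequestProcessorFactory {
    // Target "searchable" collection; the URL is a placeholder.
    private final HttpSolrServer searchable =
            new HttpSolrServer("http://localhost:8983/solr/searchable");

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new ForwardActiveDocsProcessor(searchable, next);
    }

    static class ForwardActiveDocsProcessor extends UpdateRequestProcessor {
        private final HttpSolrServer searchable;

        ForwardActiveDocsProcessor(HttpSolrServer searchable, UpdateRequestProcessor next) {
            super(next);
            this.searchable = searchable;
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            // "active" is the hypothetical flag field mentioned in the question.
            Object active = doc.getFieldValue("active");
            if (Boolean.TRUE.equals(active) || "true".equals(String.valueOf(active))) {
                try {
                    searchable.add(doc); // copy the doc into the searchable collection
                } catch (Exception e) {
                    throw new IOException(e);
                }
            }
            super.processAdd(cmd); // continue the normal chain for this (NoSQL) collection
        }
    }
}

It would then be registered in an updateRequestProcessorChain in solrconfig.xml for the NoSQL collection; whether this beats simply having the client application send each document to both collections is a judgment call.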
Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms
Hi, Before, I had a different configuration that was working, but with synonyms at query time. Now I have a requirement to add multi-word synonyms, which is why I am testing this configuration. It doesn't work with this configuration even without multi-word synonyms. The problem happens only with highlighting ON.
Re: Storing Solr Index on NFS
If centralization of storage is your goal by choosing NFS, iSCSI works reasonably well with SOLR indexes, although good local storage will always be the overall winner. I noticed a nearly 5% degradation in overall search performance (casual testing, nothing scientific) when moving 40-50GB indexes to iSCSI (10GbE network) from a 4x7200rpm RAID 10 local SATA disk setup. Tim On 15/04/13 09:59 AM, Walter Underwood wrote: Solr 4.2 does have field compression which makes smaller indexes. That will reduce the amount of network traffic. That probably does not help much, because I think the latency of NFS is what causes problems. wunder On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote: Hello Walter, Thanks for the response. That has been my experience in the past as well. But I was wondering if there are new things in Solr 4 and NFS 4.1 that make storing indexes on an NFS mount feasible. Thanks, Saqib On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote: On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote: Greetings, Are there any issues with storing Solr indexes on an NFS share? Also, any recommendations for using NFS for Solr indexes? I recommend that you do not put Solr indexes on NFS. It can be very slow; I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS. wunder -- Walter Underwood wun...@wunderwood.org -- Walter Underwood wun...@wunderwood.org
Re: using maven to deploy solr on tomcat
On 4/15/2013 2:33 PM, Adeel Qureshi wrote: <Environment name="solr/home" override="true" type="java.lang.String" value="src/main/resources/solr-dev/" /> but this leads to absolute path of INFO: Using JNDI solr.home: src/main/resources/solr-dev INFO: looking for solr.xml: C:\springsource\sts-2.8.1.RELEASE\src\main\resources\solr-dev\solr.xml If you use a relative path for the solr home as you have done, it will be relative to the current working directory. The CWD can vary depending on how tomcat gets started. In your case, the CWD seems to be C:\springsource\sts-2.8.1.RELEASE. If you change the CWD in the tomcat startup, you will probably have to set the TOMCAT_HOME environment variable for tomcat to start correctly, so I don't recommend doing that. It is usually best to choose an absolute path for the solr home. Solr will find solr.xml there, which it will use to find the rest of your config(s). All paths in solr.xml and other solr config files can be relative. What you are seeing as an absolute path is likely the current working directory plus your solr home setting. Thanks, Shawn
Re:Re: 3 general questions about SolrCloud
Thanks for clarification and I think I did make it clear. At 2013-04-16 01:59:59,Shawn Heisey s...@elyograg.org wrote: On 4/15/2013 9:58 AM, SuoNayi wrote: 1. What's the model between the master and replicas in one shard? If the replicas are able to catch up with the master when the master receives a update request it will scatter the request to all the active replicas and expect responses before the request get executed by itself.This is called push model,right? When a new replica is present it will download the whole index from the master can this be called pull model?but when the master pushes updates to it how the replica behaves,continuing to download the whole index while keeping a track of the updates in a log(tlog) ? There is no master. SolrCloud is fully distributed. One replica on each shard is elected leader, but that is not a permanent designation. 2.What's the main use of the transaction log? Is it only used to serve the NRT get requests and not related with data sync between the master and replica? The transaction log is used to replay transactions when a node starts up. If the differences between the leader replica and a replica that just started are small enough, the transaction log will be used to bring them back into sync. If they are too different, the one that just started will replicate the full index from the leader. I am pretty sure that the _version_ field present on every document is used to determine whether replicas are in sync, not the index version. 3.Will the leader election be related with index version? If a shard has 3 replicas and when the master goes down how to choose the master, do they compare the lastest index version for each one or only consider the earliest presence time? Here is my understanding about leader elections, I hope it's right! Leader elections only take place when the leader goes down. Once a replica is elected leader, it will remain leader unless it goes down. The other replicas can go up and down and the leader will retain that role. I do not think the index version is used at all, the _version_ field in the index is probably used instead. Thanks, Shawn
Push/pull model between leader and replica in one shard
Hi, can someone explain in more detail what model is used to sync docs between the leader and the replicas in a shard? The model could be push or pull. Supposing I have only one shard with 1 leader and 2 replicas: when the leader receives an update request, will it scatter the request to each available and active replica first and then process the request locally at the end? In that case, if the replicas are able to keep up with the leader, can I think of this as a push model where the leader pushes updates to its replicas? What happens if a replica is behind the leader? Will the replica pull docs from the leader while keeping track of the incoming updates from the leader in a log (the tlog)? If so, when it finishes pulling docs, will it replay the updates in the tlog at the end? regards
Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms
Do you use the standard highlighter or the FastVectorHighlighter / PhraseHighlighter? Do you use the hl.highlightMultiTerm option (http://wiki.apache.org/solr/HighlightingParameters#hl.highlightMultiTerm)? On Tue, Apr 16, 2013 at 2:51 AM, juancesarvillalba juancesarvilla...@gmail.com wrote: Hi, Before, I had a different configuration that was working, but with synonyms at query time. Now I have a requirement to add multi-word synonyms, which is why I am testing this configuration. It doesn't work with this configuration even without multi-word synonyms. The problem happens only with highlighting ON.