replication problems with solr4.1
Hi list, after upgrading from Solr 4.0 to Solr 4.1 and running it for two weeks now, it turns out that replication has problems and unpredictable results. My installation is a single index, 41 million docs / 115 GB index size / 1 master / 3 slaves.
- the master builds a new index from scratch once a week
- a replication is started manually with the Solr admin GUI
What I see is one of these cases:
- after a replication a new searcher is opened on an index.xxx directory, the old data/index/ directory is never deleted, and besides the file replication.properties there is also a file index.properties
OR
- the replication takes place and everything looks fine, but when opening the admin GUI the statistics report:
Last Modified: a day ago
Num Docs: 42262349
Max Doc: 42262349
Deleted Docs: 0
Version: 45174
Segment Count: 1
         Version         Gen  Size
Master:  1360483635404   112  116.5 GB
Slave:   1360483806741   113  116.5 GB
In the first case, why is the replication doing that? It is an offline slave, no search activity, just there for backup! In the second case, why are the version and generation different right after a full replication? Any thoughts on this? - Bernd
Faceting on tree structure in SOLR4
Hello, I have a tree data structure like
t1
 |- t2
 |- t3
t4
 |- t5
and so on. There is no limit on tree depth, nor on the number of children of each node. What I want is that when I facet on parent node t1, the count should also include the counts of all of its children (t2 and t3 in this case). So let's say the count for t1 is 5 and for t2 and t3 it is also 5 each; then the total displayed against t1 should be 15. Please let me know how I can achieve this. I am using Solr 4, and the tree structure is dynamic and subject to addition, deletion and editing. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Faceting-on-tree-structure-in-SOLR4-tp4039650.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting on tree structure in SOLR4
Hello, is http://wiki.apache.org/solr/HierarchicalFaceting what you are talking about? On Mon, Feb 11, 2013 at 12:42 PM, Alok Bhandari alokomprakashbhand...@gmail.com wrote: Hello, I have a tree data structure like t1 |-t2 |-t3 t4 |-t5 and so on. And there is no limit on tree depth as well as number of children to each node. What I want is that when I do the faceting for parent node t1 it should also include the count of all of its children (t2 and t3 in this case). So let's say the count corresponding to t1 is 5 and for t2 and t3 it is also 5; then the total should display 15 as a count against t1. Please let me know how I can achieve this. I am using SOLR4 and the tree structure is dynamic and subject to addition, deletion and editing. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Faceting-on-tree-structure-in-SOLR4-tp4039650.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
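For the record, the approach described on that HierarchicalFaceting wiki page can be sketched as follows. The field and type names below are made up, and it assumes each document is indexed with its full path from the root (e.g. t1/t2):

```xml
<!-- schema.xml sketch: PathHierarchyTokenizerFactory emits one token per
     ancestor, so a document at t1/t2 is indexed as both "t1" and "t1/t2" -->
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="category_path" type="descendent_path" indexed="true" stored="true"/>
```

Faceting on category_path then counts every document under each of its ancestors, so the count shown for t1 automatically includes documents tagged t1/t2 and t1/t3; facet.prefix can restrict the facet to a single subtree.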
Re: Crawl Anywhere -
Have a look at Nutch 2; it is decoupled from HDFS and can store docs in e.g. HBase or another NoSQL store. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 11 Feb 2013, at 06:16, SivaKarthik sivakarthik.kpa...@gmail.com wrote: Dear Erick, Thanks for your reply. Yes, Nutch can meet my requirement, but the problem is I want to store the crawled documents in HTML or XML format instead of the MapReduce format. Not sure whether Nutch plugins are available to convert into XML files. Please share if you have any ideas. Thank you -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4039619.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Crawl Anywhere -
Yes, you can run CA on different machines. In Manage you have to set the target and engine for this to work. I've never done this, so you have to contact the developer for more details. SivaKarthik wrote: Hi All, in our project we need to download around millions of pages, so is there any support for doing the crawling in a distributed environment using the Crawl-Anywhere apps? Or what could be the alternatives? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4039674.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [Solrj 4.0] How use JOIN
Hi, thanks for the advice. But I need to use the parent condition and the child condition at the same time. Parent condition: (name:Thomas AND age:40). Child condition: (name:John AND age:17), joined with from=parent to=id. So something like: (name:Thomas AND age:40) AND {!join from=parent to=id}(name:John AND age:17) All of this within SolrJ 4.0 (or 4.1). I think there is a solution using a nested query like this: (name:Thomas AND age:40) AND _query_:"{!join from=parent to=id}(name:John AND age:17)" but I don't like this syntax, so I am looking for something else. Any idea? -- View this message in context: http://lucene.472066.n3.nabble.com/Solrj-4-0-How-use-JOIN-tp4024262p4039675.html Sent from the Solr - User mailing list archive at Nabble.com.
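For what it's worth, the nested-query form normally needs the {!join ...} clause wrapped in double quotes so the default parser treats it as one sub-query. A small sketch of building that string before handing it to SolrQuery.setQuery(); the field names are taken from the mail above, and the quoting is the only addition:

```java
public class JoinQueryBuilder {

    // Combines a parent condition with a joined child condition using the
    // _query_ nested-query hook of Solr's default query parser.
    static String nestedJoin(String parentCond, String childCond,
                             String from, String to) {
        return parentCond + " AND _query_:\"{!join from=" + from
                + " to=" + to + "}" + childCond + "\"";
    }

    public static void main(String[] args) {
        String q = nestedJoin("(name:Thomas AND age:40)",
                              "(name:John AND age:17)", "parent", "id");
        // (name:Thomas AND age:40) AND _query_:"{!join from=parent to=id}(name:John AND age:17)"
        System.out.println(q);
    }
}
```

The resulting string can be passed as the q parameter of a SolrJ query as-is.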
Re: Help! How to remove shards from SolrCloud. They keep come back...
Hi Mark, Thanks for your response. I did delete the data directory, but that didn't help. However, upon checking my Zookeeper installation I found a clusterstate.json item that contained references to core data directories that didn't exist anymore. I wiped this item and it seems to work fine now. Thanks for your help! Rene On Sat, Feb 9, 2013 at 8:47 PM, Mark Miller markrmil...@gmail.com wrote: Did you clear the data dir for all 3 zk's? If not, you will find ghosts coming back to haunt you :) It's often easier to clear zk programmatically - for example it's one call from the cmd line zkcli script. http://wiki.apache.org/solr/SolrCloud#Command_Line_Util - Mark On Feb 9, 2013, at 1:19 PM, Rene Nederhand r...@nederhand.net wrote: Hi, I am experimenting with SolrCloud (v. 4.1) and everything seems to work fine. Now I would like to restart with a clean environment, but I cannot get rid of all the collections, shards and cores I have created. What I did:
- Closed down all Zookeeper servers (I have an ensemble of 3) and Solr servers (also 3)
- I deleted the collections and configs from Zookeeper
- I deleted the data directory (version-2) from Zookeeper
- I deleted my Solr home (with all data files)
- Edited solr.xml so there is no reference to instances anymore.
When I restart, I get an error about no existing SolrCores, but after adding a new config, collection and one SolrCore, I see a graph of all the previously existing shards/cores. How can I go back to a clean state? How do I remove these collections/shards? Thanks for helping. Rene
Re: Maximum Number of Records In Index
Otis, do you run a 4bn-doc SolrCloud or ElasticSearch, or are you aware of somebody who does? On 10.02.2013 at 4:54, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Exceeding 2B is no problem. But it won't happen in a single Lucene index any time soon, so... Otis Solr & ElasticSearch Support http://sematext.com/ On Feb 7, 2013 10:08 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Actually, I have a dream to exceed those two billion. It seems possible to move to VInt in the file format and change int docnums to longs in the Lucene API. Does anyone know whether it's possible? And this question is not so esoteric if we are talking about SolrCloud, which can hold more than 2bn docs in a few smaller shards. Any experience? On Thu, Feb 7, 2013 at 5:46 PM, Rafał Kuć r@solr.pl wrote: Hello! Right, my bad - ids are still using int32. However, that still gives us 2,147,483,648 possible identifiers per single index, which is nowhere near the 13.5 million mentioned in the first mail. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Rafal, what about docnums, aren't they limited by int32? On 07.02.2013 at 15:33, Rafał Kuć r@solr.pl wrote: Hello! Practically there is no limit on how many documents can be stored in a single index. In your case, as you are using Solr from 2011, there is a limitation regarding the number of unique terms per Lucene segment ( http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/fileformats.html#Limitations ). However, I don't think you've hit that. Solr by itself doesn't remove documents unless told to do so. It's hard to guess what the reason could be and, as you said, you see updates coming to your handler. Maybe new documents have the same identifiers as ones that are already indexed? As I said, this is only a guess and we would need more information. Are there any exceptions in the logs? Do you run delete commands? Are your index files changed? How do you run commit? -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch I have searched this forum but not yet found a definitive answer; I think the answer is "there is no limit, it depends on server specification". But nevertheless I will say what I have seen and then ask the questions. From scratch (November 2011) I set up our Solr, which contains data from various sources; by March 2012 the number of indexed records (unique IDs) reached 13.5 million, which was to be expected. However, for the last 8 months the number of records in the index has not gone above 13.5 million, yet looking at the request handler outputs I can safely say at least anywhere from 50 thousand to 100 thousand records are being indexed daily. So I am assuming that earlier records are being removed, and I do not want that. Question: if there is a limit to the number of records the index can store, where do I find this and change it? Question: if there is no limit, does anyone have any idea why for the last months the number has not gone beyond 13.5 million? I can safely say that at least 90% are new records. thanks macroman -- View this message in context: http://lucene.472066.n3.nabble.com/Maximum-Number-of-Records-In-Index-tp4038961.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Problems using distributed More Like This
SOLR-788 added Distributed MLT to Solr 4.1, but I have not been able to get it to work. I don't know if it's user error, which of course is very possible. If it is user error, I'd like to know what I'm doing wrong so I can fix it. I am actually using a recent checkout of Solr 4.2, not the released 4.1. I put some extensive information on SOLR-4414, an issue filed by another user having a similar problem. If you look for the last comment from me on Feb 7 that has a code block, you'll see Solr's response when I use MoreLikeThisComponent. https://issues.apache.org/jira/browse/SOLR-4414 Only the last seven of the query parameters were included on the URL - the rest of them are in solrconfig.xml. Due to echoParams=all, the only part of the request handler definition that you can't see in the response is the fact that last-components contains spellcheck. I redacted the company domain name from the shards and the one document matching the query from the result tag, but there are no other changes to the response. If I send an identical query to the shard core that actually contains the document rather than the core with the shards parameter, I get MLT results. I have heard recently that Solr 4.x has hardcoded the unique field name for SolrCloud sharding as id ... but my uniqueKey field name is tag_id. Could this be my problem? It would be a monumental development effort to change that field name in our application. I am not using SolrCloud for this index. Thanks, Shawn
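For context, the kind of request handler configuration being described above looks roughly like the sketch below. The handler name, MLT fields, and shard hosts are placeholders; MoreLikeThisComponent is among SearchHandler's default components, so only parameters need to be supplied:

```xml
<!-- solrconfig.xml sketch; all names and values are illustrative -->
<requestHandler name="/mlttest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <str name="shards">host1:8983/solr/s1,host2:8983/solr/s2</str>
    <bool name="mlt">true</bool>
    <str name="mlt.fl">title,body</str>
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
  </lst>
</requestHandler>
```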
RE: Solr query parser, needs to call setAutoGeneratePhraseQueries(true)
Thanks very much, it worked perfectly!! Best regards, Lisheng -----Original Message----- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, February 08, 2013 1:04 PM To: solr-user@lucene.apache.org Subject: Re: Solr query parser, needs to call setAutoGeneratePhraseQueries(true) (Sorry for my split message)... See the text_en_splitting field type for an example:
<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> ...
-- Jack Krupansky -----Original Message----- From: Zhang, Lisheng Sent: Friday, February 08, 2013 3:20 PM To: solr-user@lucene.apache.org Subject: Solr query parser, needs to call setAutoGeneratePhraseQueries(true) Hi, In our application we need to call the method setAutoGeneratePhraseQueries(true) on the Lucene QueryParser; this is the way it used to work in earlier versions, and it seems to me that is the more natural way. But in the current Solr 3.6.1, the only way to do so is to set <luceneMatchVersion>LUCENE_30</luceneMatchVersion> in solrconfig.xml (if I read the source code correctly). I do not want to do that, because it would change the whole behavior of Lucene, and I only want to change this query parser behavior, not other Lucene features. Please guide me if there is a better way other than changing the Solr source code. Thanks very much for the help, Lisheng
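For anyone finding this thread later, the per-field setting mentioned above sits on the fieldType element in schema.xml. A sketch with an abbreviated, illustrative analyzer chain (trim to your own type):

```xml
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This changes phrase-query generation only for fields of this type, rather than reverting the whole luceneMatchVersion.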
Re: Can Solr analyze content and find dates and places
Hi Sujit and others who answered my question, I have been working on the UIMA path, which seems great with the available Eclipse tooling and this: http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html Now I have worked through the UIMA tutorial on the RoomNumberAnnotator: http://uima.apache.org/doc-uima-annotator.html and I am able to test it using the UIMA CAS Visual Debugger. So far so good. But now I want to use the new RoomNumberAnnotator with Solr, and it cannot find the XML file and the Java class (they are in the correct lib directories, because the WhitespaceTokenizer works fine).
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters"/>
      <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
      <bool name="ignoreErrors">false</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.tutorial.RoomNumber</str>
          <lst name="mapping">
            <str name="feature">building</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
On the Wiki (http://wiki.apache.org/solr/SolrUIMA) this is mentioned, but it fails:
- Deploy new jars inside one of the lib directories
- Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path
Is it needed to deploy the new jar (RoomAnnotator.jar)? If yes, which branch can I check out? This is the stable release I am running: Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36 Regards, Bart On 8 Feb 2013, at 22:11, SUJIT PAL wrote: Hi Bart, I did some work with UIMA, but this was to annotate the data before it goes to Lucene/Solr, i.e. not built as an UpdateRequestProcessor.
I just looked through the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you will have to set up your own aggregate analysis chain in place of the one currently configured. Writing UIMA annotators is very simple (there is a tutorial here: [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]). You provide the XML description for the annotation and let UIMA generate the annotation bean. You write Java code for the annotator and also the annotator XML descriptor. UIMA uses the annotator XML descriptor to instantiate and run your annotator. Overall, it sounds really complicated, but it's actually quite simple. The tutorial has quite a few examples that you will find useful, but in case you need more, I have some on this github repository: [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima] The dictionary and pattern annotators may be similar to what you are looking for (date and city annotators). Best regards, Sujit On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote: Hi Alex, Indeed that is exactly what I am trying to achieve using wordcities. Date will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I integrate the Java library as UIMA? The documentation about changing schema.xml and solr.xml is not very detailed. Regards, Bart On 8 Feb 2013, at 16:57, Alexandre Rafalovitch arafa...@gmail.com wrote: Hi Bart, I haven't done any UIMA work (I used other stuff for my NLP phase), so I am not sure I can help much further. But in general, you are venturing into pure research territory here. Even for dates, what do you actually mean? Just fixed expressions? Relative dates (e.g. last Tuesday)? What about times (7pm)? Same with cities. If you want it offline, you need the gazetteer and disambiguation modules.
A gazetteer for cities (worldwide) is huge and has a lot of duplicate names (Paris, Ontario is apparently a short drive from London, Ontario, eh?). Something like http://www.maxmind.com/en/worldcities? And disambiguation usually requires a training corpus that is similar to what your text will look like. Online services like OpenCalais are backed by gigantic databases and some serious corpus-trained machine-learning disambiguation algorithms. So, no plug-and-play solution here. If you really need to get this done, I would recommend narrowing down the specification of exactly what you will settle for and looking for software that can do it. Once you have that, integration with Solr is your next - and smaller - concern. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature
Re: Do I have to reindex when upgrading from solr 4.0 to 4.1?
Arkadi, That's the answer I received at Solr Bootcamp, yes. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Mon, Feb 11, 2013 at 2:23 AM, Arkadi Colson ark...@smartbit.be wrote: Does it mean that if you redo indexing after the upgrade to 4.1, shard splitting will work in 4.2? Kind regards Arkadi Colson Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen T +32 11 64 08 80 • F +32 11 64 08 81 On 02/10/2013 05:21 PM, Michael Della Bitta wrote: No. You can just update Solr in place. But... if you're using SolrCloud, your documents won't be hashed in a way that lets you do shard splitting in 4.2. That seemed to be the consensus during Solr Boot Camp. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn't a Game On Sun, Feb 10, 2013 at 10:46 AM, adfel70 adfe...@gmail.com wrote: Do I have to recreate the collections/cores? Do I have to reindex? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Do-I-have-to-reindex-when-upgrading-from-solr-4-0-to-4-1-tp4039560.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Maximum Number of Records In Index
We don't run one ourselves at Sematext, but we know of people who do have large ES clusters, one with 10B docs. Otis -- Solr & ElasticSearch Support http://sematext.com/ On Mon, Feb 11, 2013 at 8:41 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Otis, do you run a 4bn-doc SolrCloud or ElasticSearch, or are you aware of somebody who does? On 10.02.2013 at 4:54, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Exceeding 2B is no problem. But it won't happen in a single Lucene index any time soon, so... Otis Solr & ElasticSearch Support http://sematext.com/ On Feb 7, 2013 10:08 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Actually, I have a dream to exceed those two billion. It seems possible to move to VInt in the file format and change int docnums to longs in the Lucene API. Does anyone know whether it's possible? And this question is not so esoteric if we are talking about SolrCloud, which can hold more than 2bn docs in a few smaller shards. Any experience? On Thu, Feb 7, 2013 at 5:46 PM, Rafał Kuć r@solr.pl wrote: Hello! Right, my bad - ids are still using int32. However, that still gives us 2,147,483,648 possible identifiers per single index, which is nowhere near the 13.5 million mentioned in the first mail. -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Rafal, what about docnums, aren't they limited by int32? On 07.02.2013 at 15:33, Rafał Kuć r@solr.pl wrote: Hello! Practically there is no limit on how many documents can be stored in a single index. In your case, as you are using Solr from 2011, there is a limitation regarding the number of unique terms per Lucene segment ( http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/fileformats.html#Limitations ). However, I don't think you've hit that. Solr by itself doesn't remove documents unless told to do so. It's hard to guess what the reason could be and, as you said, you see updates coming to your handler. Maybe new documents have the same identifiers as ones that are already indexed? As I said, this is only a guess and we would need more information. Are there any exceptions in the logs? Do you run delete commands? Are your index files changed? How do you run commit? -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch I have searched this forum but not yet found a definitive answer; I think the answer is "there is no limit, it depends on server specification". But nevertheless I will say what I have seen and then ask the questions. From scratch (November 2011) I set up our Solr, which contains data from various sources; by March 2012 the number of indexed records (unique IDs) reached 13.5 million, which was to be expected. However, for the last 8 months the number of records in the index has not gone above 13.5 million, yet looking at the request handler outputs I can safely say at least anywhere from 50 thousand to 100 thousand records are being indexed daily. So I am assuming that earlier records are being removed, and I do not want that. Question: if there is a limit to the number of records the index can store, where do I find this and change it? Question: if there is no limit, does anyone have any idea why for the last months the number has not gone beyond 13.5 million? I can safely say that at least 90% are new records. thanks macroman -- View this message in context: http://lucene.472066.n3.nabble.com/Maximum-Number-of-Records-In-Index-tp4038961.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
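The int32 ceiling discussed in this thread can be made concrete with a couple of lines; the shard count below is an arbitrary example:

```java
public class DocIdLimit {
    public static void main(String[] args) {
        // Lucene addresses documents with Java ints, so a single index
        // is capped at Integer.MAX_VALUE documents
        int perIndexMax = Integer.MAX_VALUE;   // 2,147,483,647
        System.out.println(perIndexMax);

        // a SolrCloud collection sharded 4 ways raises the ceiling to ~8.6bn
        long fourShards = 4L * perIndexMax;
        System.out.println(fourShards);        // 8589934588
    }
}
```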
SolrCloud upgrade from 4.0 to 4.1
I'm trying to help someone in #solr on IRC. Early in the 4.1 release vote process over on the dev@l.a.o mailing list, Mark Miller mentioned that upgrading SolrCloud from 4.0 to 4.1 may not be as straightforward as the usual Solr upgrade process. Providing some detailed instructions was mentioned, but I cannot find any such thing. Is there a documented procedure somewhere, or is it as simple as dropping in the new .war, massaging the config, and restarting? Thanks, Shawn
Re: SolrCloud upgrade from 4.0 to 4.1
Yonik looked into it and said the process was actually fine in his testing. After the release, we did find one issue - if you don't explicitly set the host, the host 'guess' feature has changed and may guess a different address. - Mark On Feb 11, 2013, at 1:16 PM, Shawn Heisey s...@elyograg.org wrote: I'm trying to help someone in #solr on IRC. Early in the 4.1 release vote process over on the dev@l.a.o mailing list, Mark Miller mentioned that upgrading SolrCloud from 4.0 to 4.1 may not be as straightforward as the usual Solr upgrade process. Providing some detailed instructions was mentioned, but I cannot find any such thing. Is there a documented procedure somewhere, or is it as simple as dropping in the new .war, massaging the config, and restarting? Thanks, Shawn
Re: SolrCloud upgrade from 4.0 to 4.1
Hey, does an upgrade guide exist? Or do you simply copy all the files over? If yes, how do you verify that everything is in place? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-upgrade-from-4-0-to-4-1-tp4039757p4039775.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can Solr analyze content and find dates and places
Hi Bart, Like I said, I didn't actually hook my UIMA stuff into Solr; content and queries are annotated before they reach Solr. What you describe sounds like a classpath problem (but of course you already knew that :-)). Since I haven't actually done what you are trying to do, here are some suggestions; they may or may not work... 1) Package up the XML files into your custom JAR at the top level; that way you don't need to specify it as /RoomNumberAnnotator.xml. 2) If you are using Solr 4, then you should drop your custom JAR into $SOLR_HOME/collection1/lib, not $SOLR_HOME/lib. -sujit On Feb 11, 2013, at 9:40 AM, jazz wrote: Hi Sujit and others who answered my question, I have been working on the UIMA path, which seems great with the available Eclipse tooling and this: http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html Now I have worked through the UIMA tutorial on the RoomNumberAnnotator: http://uima.apache.org/doc-uima-annotator.html and I am able to test it using the UIMA CAS Visual Debugger. So far so good. But now I want to use the new RoomNumberAnnotator with Solr, and it cannot find the XML file and the Java class (they are in the correct lib directories, because the WhitespaceTokenizer works fine).
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters"/>
      <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
      <bool name="ignoreErrors">false</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.tutorial.RoomNumber</str>
          <lst name="mapping">
            <str name="feature">building</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
On the Wiki (http://wiki.apache.org/solr/SolrUIMA) this is mentioned, but it fails:
- Deploy new jars inside one of the lib directories
- Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path
Is it needed to deploy the new jar (RoomAnnotator.jar)? If yes, which branch can I check out? This is the stable release I am running: Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36 Regards, Bart On 8 Feb 2013, at 22:11, SUJIT PAL wrote: Hi Bart, I did some work with UIMA, but this was to annotate the data before it goes to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked through the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you will have to set up your own aggregate analysis chain in place of the one currently configured. Writing UIMA annotators is very simple (there is a tutorial here: [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]). You provide the XML description for the annotation and let UIMA generate the annotation bean. You write Java code for the annotator and also the annotator XML descriptor. UIMA uses the annotator XML descriptor to instantiate and run your annotator. Overall, it sounds really complicated, but it's actually quite simple.
The tutorial has quite a few examples that you will find useful, but in case you need more, I have some on this github repository: [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima] The dictionary and pattern annotators may be similar to what you are looking for (date and city annotators). Best regards, Sujit On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote: Hi Alex, Indeed that is exactly what I am trying to achieve using wordcities. Date will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I integrate the Java library as UIMA? The documentation about changing schema.xml and solr.xml is not very detailed. Regards, Bart On 8 Feb 2013, at 16:57, Alexandre Rafalovitch arafa...@gmail.com wrote: Hi Bart, I haven't done any UIMA work (I used other stuff for my NLP phase), so not sure I can help much further. But in general, you are venturing into pure research territory here. Even for dates, what do you actually mean? Just fixed expression? Relative dates (e.g. last tuesday?). What about times (7pm?). Same with cities. If you want it offline, you need the gazetteer and disambiguation modules. Gazetteer for cities (worldwide) is huge and has a lot of duplicate names (Paris, Ontario is apparently a short drive from London, Ontario eh?). Something like http://www.maxmind.com/en/worldcities? And disambiguation
How to limit queries to specific IDs
Hi everyone. I have queries that should be bounded to a set of IDs (the uniqueKey field of my schema). My client front-end sends two Solr requests: In the first one, it wants to get the top X IDs. This result should return very fast; no time to waste on highlighting. This is a very standard query. In the second one, it wants to get the highlighting info (corresponding to the queried fields and terms, of course) on those documents (maybe as some sequential requests, on small bulks of the full list). These two requests are implemented as almost identical calls, to different requestHandlers. I thought to append a filter query to the second request, id:(1 2 3 4 5). Is this idea good for Solr? If so, my problem is that I don't want these filters to flood my filterCache... Is there any way (even if it involves some coding...) to add a filter query which won't be added to the filterCache (at least, not instead of standard filters)? Notes: 1. It can't be assured that the first query will remain in the queryResultsCache... 2. Consider an index size of 50M documents...
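One possibility worth checking (I believe the cache local param has been available since Solr 3.4): prefixing the filter with {!cache=false} asks Solr not to store that particular fq in the filterCache. A small sketch of building such a request URL; the host, handler, and field names are placeholders:

```java
import java.net.URLEncoder;

public class NonCachedFilter {
    public static void main(String[] args) throws Exception {
        // {!cache=false} marks this fq as non-caching, so the per-request
        // ID list should not evict the standard filters from the filterCache
        String fq = "{!cache=false}id:(1 2 3 4 5)";
        String url = "http://localhost:8983/solr/select?q=body:foo&hl=true&fq="
                + URLEncoder.encode(fq, "UTF-8");
        System.out.println(url);
    }
}
```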
Re: SolrCloud new zookeper node on different ip/ replicate between two clasters
This is a good solution. One thing here is really annoying: the double indexing. Is there a way to replicate to another DC? It seems SolrCloud can't use its earlier replication. It would be nice if I could somehow replicate between two SolrClouds. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-new-zookeper-node-on-different-ip-replicate-between-two-clasters-tp4039101p4039791.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud new zookeper node on different ip/ replicate between two clasters
The replication handler can be set up to replicate to another DC. You can also put nodes in both DCs. Both have pluses and minuses vs just sending the same data to both DCs with separate clusters. Where it immediately gets difficult is that you need a quorum of ZK nodes to survive if you want to continue handling updates. I have not yet found the multi-DC ZK solution. I know other systems use something like having a tie-breaker node in Europe or something, but I don't know that ZK yet supports something like this. In most situations, I think the current best solution is to send data to both DCs. - Mark On Feb 11, 2013, at 2:43 PM, mizayah miza...@gmail.com wrote: This is a good solution. One thing here is really annoying: the double indexing. Is there a way to replicate to another DC? It seems SolrCloud can't use its earlier replication. It would be nice if I could somehow replicate between two SolrClouds. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-new-zookeper-node-on-different-ip-replicate-between-two-clasters-tp4039101p4039791.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problems using distributed More Like This
Eventually, I'll get around to trying some more real world testing. Up till now, no dev seems to have a real interest in this. I have 0 need for it currently, so it's fairly low on my itch scale, but it's on my list anyhow. - Mark On Feb 11, 2013, at 12:26 PM, Shawn Heisey s...@elyograg.org wrote: SOLR-788 added Distributed MLT to Solr 4.1, but I have not been able to get it to work. I don't know if it's user error, which of course is very possible. If it is user error, I'd like to know what I'm doing wrong so I can fix it. I am actually using a recent checkout of Solr 4.2, not the released 4.1. I put some extensive information on SOLR-4414, an issue filed by another user having a similar problem. If you look for the last comment from me on Feb 7 that has a code block, you'll see Solr's response when I use MoreLikeThisComponent. https://issues.apache.org/jira/browse/SOLR-4414 Only the last seven of the query parameters were included on the URL - the rest of them are in solrconfig.xml. Due to echoParams=all, the only part of the request handler definition that you can't see in the response is the fact that last-components contains spellcheck. I redacted the company domain name from the shards and the one document matching the query from the result tag, but there are no other changes to the response. If I send an identical query to the shard core that actually contains the document rather than the core with the shards parameter, I get MLT results. I have heard recently that Solr 4.x has hardcoded the unique field name for SolrCloud sharding as id ... but my uniqueKey field name is tag_id. Could this be my problem? It would be a monumental development effort to change that field name in our application. I am not using SolrCloud for this index. Thanks, Shawn
Re: Can Solr analyze content and find dates and places
Hi Sujit, Thanks for your help! I moved the RoomNumberAnnotator.xml to the top level of the jar and used the same solrconfig.xml (with the /). Now it works perfectly. Best regards, Bart On 11 Feb 2013, at 20:13, SUJIT PAL wrote: Hi Bart, Like I said, I didn't actually hook my UIMA stuff into Solr, content and queries are annotated before they reach Solr. What you describe sounds like a classpath problem (but of course you already knew that :-)). Since I haven't actually done what you are trying to do, here are some suggestions, they may or may not work... 1) package up the XML files into your custom JAR at the top level, that way you don't need to specify it as /RoomNumberAnnotator.xml. 2) if you are using solr4, then you should drop your custom JAR into $SOLR_HOME/collection1/lib, not $SOLR_HOME/lib. -sujit On Feb 11, 2013, at 9:40 AM, jazz wrote: Hi Sujit and others who answered my question, I have been working on the UIMA path, which seems great with the available Eclipse tooling and this: http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html Now I worked through the UIMA tutorial of the RoomNumberAnnotator: http://uima.apache.org/doc-uima-annotator.html And I am able to test it using the UIMA CAS Visual Debugger. So far so good. But now I want to use the new RoomNumberAnnotator with Solr, and it cannot find the xml file and the Java class (they are in the correct lib directories, because the WhitespaceTokenizer works fine). 
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters"/>
      <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
      <bool name="ignoreErrors">false</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.tutorial.RoomNumber</str>
          <lst name="mapping">
            <str name="feature">building</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

On the Wiki (http://wiki.apache.org/solr/SolrUIMA) this is mentioned, but it fails: "Deploy new jars inside one of the lib directories. Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path." Is it needed to deploy the new jar (RoomAnnotator.jar)? If yes, which branch can I check out? This is the stable release I am running: Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36 Regards, Bart On 8 Feb 2013, at 22:11, SUJIT PAL wrote: Hi Bart, I did some work with UIMA, but this was to annotate the data before it goes to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked through the SolrUIMA wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you will have to set up your own aggregate analysis chain in place of the one currently configured. Writing UIMA annotators is very simple (there is a tutorial here: [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]). You provide the XML description for the annotation and let UIMA generate the annotation bean. You write Java code for the annotator and also the annotator XML descriptor. UIMA uses the annotator XML descriptor to instantiate and run your annotator. Overall, it sounds really complicated, but it's actually quite simple. 
The tutorial has quite a few examples that you will find useful, but in case you need more, I have some on this github repository: [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima] The dictionary and pattern annotators may be similar to what you are looking for (date and city annotators). Best regards, Sujit On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote: Hi Alex, Indeed that is exactly what I am trying to achieve using wordcities. Date will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I integrate the Java library as UIMA? The documentation about changing schema.xml and solr.xml is not very detailed. Regards, Bart On 8 Feb 2013, at 16:57, Alexandre Rafalovitch arafa...@gmail.com wrote: Hi Bart, I haven't done any UIMA work (I used other stuff for my NLP phase), so not sure I can help much further. But in general, you are venturing into pure research territory here. Even for dates, what do you actually mean? Just fixed expression? Relative dates (e.g. last tuesday?). What about times (7pm?). Same with cities. If you want it offline, you need the gazetteer and disambiguation
Re: SolrCloud new zookeper node on different ip/ replicate between two clasters
Thx Mark. "The replication handler can be setup to replicate to another dc." Erm, I don't get it. Can I set up replication between two SolrClouds this way, or just SolrCloud to plain Solr? "You can also put nodes in both dcs." Indexing will slow down a lot if I understand SolrCloud replicas and leaders correctly (replication is real-time). Worse is when, by accident, ZooKeeper elects a leader in the other dc. ZooKeeper could use observers here, but that only makes things more complicated too. "I have not yet found the multi dc zk solution." Only something called observers helps a bit in my case. ZooKeeper observers don't vote; they are just read-only points. That would be good here, but after one dc goes down I need a fully working ZooKeeper ensemble, and observers don't support changing into followers. The ZooKeeper configuration is of course a big problem, but the configuration doesn't change much, so two ZooKeeper quorums, one in each dc, are OK imo. "I know other systems use something like having a tie breaker node in Europe" Yeah, I want to run my own cloud and have failover in Amazon. I'm from Europe :) -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-new-zookeper-node-on-different-ip-replicate-between-two-clasters-tp4039101p4039808.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.0 is stripping XML format from RSS content field
Hi, I'm running Solr 4.0 final with ManifoldCF 1.1, and I verified via Fiddler that ManifoldCF is indeed sending the content field from an RSS feed that contains XML data. However, when I query the index, the content field is there with just the text data; the XML structure is gone. Does anyone know how to stop Solr from doing this? I'm using Tika, but I don't see it in the update/extract handler. Can anyone point me in the right direction? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-is-stripping-XML-format-from-RSS-content-field-tp4039809.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr Cloud: Duplicate records while retrieving documents
We are running a six node Solr cloud with 3 shards and 3 replicas. The version of Solr is 4.0.0.2012.08.06.22.50.47. We use the Python pysolr client to interact with Solr. Documents that we add to Solr have a unique id and there can never be duplicates. Our use case is to query the index for a given search term and pull all documents that match the query. Usually our query hits over 40K documents. While we iterate through all 40K+ documents, after a few iterations we see the same document ids repeated over and over, and at the end some 20-33% of the records are duplicates. In the code snippet below, after some iterations we see a difference in the lengths of idslist and idsset. Any insight into how to troubleshoot this issue is greatly appreciated.

from pysolr import Solr

solr = Solr('http://solrhost/solr/#/collection1')

if __name__ == '__main__':
    idslist = list()
    idsset = set()
    query = 'snow'
    skip = 0
    limit = 500
    i = 0
    while True:
        response = solr.search(q=query, rows=limit, start=skip,
                               shards='host1:7575/solr,host2:7575/solr,host3:7575/solr',
                               fl='id,source')
        if skip == 0:
            hits = response.hits
            line = "Solr Hits Count: (%s)\n" % (hits)
            print line
        if len(response.docs) == 0:
            break
        for result in response:
            idslist.append(result['id'])
            idsset.add(result['id'])
            if i % 500 == 0:
                print len(idslist), len(idsset)
            i += 1
        skip += limit

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Duplicate-records-while-retrieving-documents-tp4039776.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can Solr analyze content and find dates and places
Cool! Thanks for the update, this will help if I ever go all the way with UIMA and Solr. -sujit On Feb 11, 2013, at 12:13 PM, jazz wrote: Hi Sujit, Thanks for your help! I moved the RoomNumberAnnotator.xml to the top level of the jar and used the same solrconfig.xml (with the /). Now it works perfectly. Best regards, Bart On 11 Feb 2013, at 20:13, SUJIT PAL wrote: Hi Bart, Like I said, I didn't actually hook my UIMA stuff into Solr, content and queries are annotated before they reach Solr. What you describe sounds like a classpath problem (but of course you already knew that :-)). Since I haven't actually done what you are trying to do, here are some suggestions, they may or may not work... 1) package up the XML files into your custom JAR at the top level, that way you don't need to specify it as /RoomNumberAnnotator.xml. 2) if you are using solr4, then you should drop your custom JAR into $SOLR_HOME/collection1/lib, not $SOLR_HOME/lib. -sujit On Feb 11, 2013, at 9:40 AM, jazz wrote: Hi Sujit and others who answered my question, I have been working on the UIMA path, which seems great with the available Eclipse tooling and this: http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html Now I worked through the UIMA tutorial of the RoomNumberAnnotator: http://uima.apache.org/doc-uima-annotator.html And I am able to test it using the UIMA CAS Visual Debugger. So far so good. But now I want to use the new RoomNumberAnnotator with Solr, and it cannot find the xml file and the Java class (they are in the correct lib directories, because the WhitespaceTokenizer works fine). 
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters"/>
      <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
      <bool name="ignoreErrors">false</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.tutorial.RoomNumber</str>
          <lst name="mapping">
            <str name="feature">building</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

On the Wiki (http://wiki.apache.org/solr/SolrUIMA) this is mentioned, but it fails: "Deploy new jars inside one of the lib directories. Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path." Is it needed to deploy the new jar (RoomAnnotator.jar)? If yes, which branch can I check out? This is the stable release I am running: Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36 Regards, Bart On 8 Feb 2013, at 22:11, SUJIT PAL wrote: Hi Bart, I did some work with UIMA, but this was to annotate the data before it goes to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked through the SolrUIMA wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe you will have to set up your own aggregate analysis chain in place of the one currently configured. Writing UIMA annotators is very simple (there is a tutorial here: [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]). You provide the XML description for the annotation and let UIMA generate the annotation bean. You write Java code for the annotator and also the annotator XML descriptor. UIMA uses the annotator XML descriptor to instantiate and run your annotator. Overall, it sounds really complicated, but it's actually quite simple. 
The tutorial has quite a few examples that you will find useful, but in case you need more, I have some on this github repository: [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima] The dictionary and pattern annotators may be similar to what you are looking for (date and city annotators). Best regards, Sujit On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote: Hi Alex, Indeed that is exactly what I am trying to achieve using wordcities. Date will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how do I integrate the Java library as UIMA? The documentation about changing schema.xml and solr.xml is not very detailed. Regards, Bart On 8 Feb 2013, at 16:57, Alexandre Rafalovitch arafa...@gmail.com wrote: Hi Bart, I haven't done any UIMA work (I used other stuff for my NLP phase), so not sure I can help much further. But in general, you are venturing into pure research territory here. Even for dates, what do you actually mean? Just fixed expression? Relative dates (e.g.
Re: Solr Cloud: Duplicate records while retrieving documents
On 2/11/2013 12:09 PM, devb wrote: We are running a six node Solr cloud with 3 shards and 3 replicas. The version of Solr is 4.0.0.2012.08.06.22.50.47. We use the Python pysolr client to interact with Solr. Documents that we add to Solr have a unique id and there can never be duplicates. Our use case is to query the index for a given search term and pull all documents that match the query. Usually our query hits over 40K documents. While we iterate through all 40K+ documents, after a few iterations we see the same document ids repeated over and over, and at the end some 20-33% of the records are duplicates. In the code snippet below, after some iterations we see a difference in the lengths of idslist and idsset. Any insight into how to troubleshoot this issue is greatly appreciated. For discussion purposes, let's first assume that there are no bugs in Solr. I don't think we can make that assumption, of course. General note 1: Your Solr URL in your code has a # in it. URLs with # in them are Admin UI URLs. If that's working, I'm amazed... I would take that part of the URL out so that you are pointing at: http://host:port/solr/collection1 General note 2: Paging through that many results with a distributed query (known as deep paging) is SLOW. http://solr.pl/en/2011/07/18/deep-paging-problem/ The first thing I'd do is ask Solr to sort your results. I can see from some Google searches that pysolr has sort capability. Once you pick the sort field, I'd probably do the sort ascending, not descending. The default sort is by relevance. The next thing to check is whether or not you are updating your index during the time that you are attempting to pull 40,000 documents. If you are, that could completely explain what you are seeing. If you are only adding documents when you update, then you may be able to set a sort parameter that will cause new documents to be at the end of the results, so pagination won't get messed up. 
If you are deleting documents, then you won't be able to make this work; you'll have to stop your index updates while you pull that many results. After all that, if the problem persists and you are absolutely sure that you don't have duplicate document X on two different shards, then you might be running into a bug. Thanks, Shawn
SolrCloud and hardcoded 'id' field
I have heard that SolrCloud may require the presence of a uniqueKey field specifically named 'id' for sharding. Is this true? Is it still true as of Solr 4.2-SNAPSHOT? If not, what svn commit fixed it? If so, should I file a jira? I am not actually using SolrCloud for one index, but my worry is that once a precedent for putting specific names in the code is set, it may bleed over into other features. Also, I have another set of servers for a different purpose that ARE using SolrCloud. Currently that system uses numShards=1, but one day we might want to do a distributed search there. Both my systems have a uniqueKey field other than 'id' and it would be quite a task to change it. The 'id' field doesn't exist at all in either system. Here's the relevant info for one of the systems:

<field name="tag_id" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>

<!-- lowercases the entire field value -->
<fieldType name="lowercase" class="solr.TextField" sortMissingLast="true" positionIncrementGap="0" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<uniqueKey>tag_id</uniqueKey>

Thanks, Shawn
Re: Fwd: advice about develop AbstractSolrEventListener.
: I found a solution. I am going to Configured Update Request Processors, : that I have seen in: http://wiki.apache.org/solr/UpdateRequestProcessor Sorry for the late reply, but yes -- an UpdateProcessor seems like the best place to hook in custom functionality if you need to know about individual document adds and commits. -Hoss
Re: Term Frequencies for Query Result
: I am looking for a way to get the top terms for a query result. You have to elaborate on exactly what you mean ... how are you defining "top terms for a query result"? Are you talking about the most common terms in the entire result set of documents that match your query? Or the terms from the query that most contributed to the query? Or something else? : Faceting does not work since counts are measured as documents containing : a term and not as the overall count of a term in all found documents: ... : Using http://wiki.apache.org/solr/TermVectorComponent and counting all : frequencies manually seems to be the only solution by now: I *think* you are saying that you want the sum of term frequencies for all terms in all matching documents -- but I'm not sure, because I don't see how TermVectorComponent is helping you unless you are iterating over every doc in the result set (ie: deep paging) to get the TermVectors for every doc ... it would help if you could explain what you mean by "counting all frequencies manually". -Hoss
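To make the "sum of term frequencies across all matching documents" interpretation concrete, here is a small sketch; the nested dict shape below is an assumed, simplified stand-in for parsed per-document TermVectorComponent output, not the actual wire format:

```python
# Sum per-document term frequencies across all matching documents.
# Each dict maps term -> tf for one document (an illustrative,
# simplified stand-in for parsed TermVectorComponent results).
def sum_term_freqs(per_doc_freqs):
    totals = {}
    for doc in per_doc_freqs:
        for term, tf in doc.items():
            totals[term] = totals.get(term, 0) + tf
    return totals

docs = [{"solr": 3, "lucene": 1}, {"solr": 2}]
print(sum_term_freqs(docs))  # {'solr': 5, 'lucene': 1}
```

This is the "manual counting" Hoss describes: it only works if you fetch term vectors for every document in the result set, which is exactly the deep-paging cost he warns about.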
Re: addSortField throws field not found
: Subject: addSortField throws field not found : : same field name is accepted for addFacetField but throws a field not found ex : for the addSortField method. As a general rule, if you are going to ask a question about an error that you got -- you need to cut/paste the exception (verbatim) into your email ... with the full stack trace. if the error was logged by solr in response to a query, cut/paste the query (verbatim) into your email as well. if the error was thrown in your client code, cut/paste your client code (verbatim) into your email as well. https://wiki.apache.org/solr/UsingMailingLists As things stand, you have provided almost no information that anyone can use to help you here ... my best guess is maybe you are having a jar mismatch .. but that assumes you mean you got a NoSuchMethodError about the addSortField in the SolrJ API ... maybe you mean you got an error from solr about a field not existing in your schema? ... i honestly have no idea. -Hoss
Re: memory leak - multiple cores
Hi Michael, Yes, we do intend to reload Solr when deploying new cores. So we deploy it, update solr.xml and then restart Solr only. So this will happen sometimes in production, but mostly testing. Which means it will be a real pain. Any way to fix this? Also, I'm running geronimo with -Xmx1024m -XX:MaxPermSize=256m. Regards, Marcos On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote: Marcos, The later 3 errors are common and won't pose a problem unless you intend to reload the Solr application without restarting Geronimo often. The first error, however, shouldn't happen. Have you changed the size of PermGen at all? I noticed this error while testing Solr 4.0 in Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0, you might want to try upgrading. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez mar...@jitisoft.com wrote: Hi, I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing the following issue and it eats up a lot of memory when shutting down. Has anyone seen this and have an idea how to solve it? Exception in thread DefaultThreadPool 196 java.lang.OutOfMemoryError: PermGen space 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! instance=2080324477 Regards, Marcos
Re: SolrCloud and hardcoded 'id' field
Doesn't sound right to me. I'd guess you heard wrong. - mark Sent from my iPhone On Feb 11, 2013, at 7:15 PM, Shawn Heisey s...@elyograg.org wrote: I have heard that SolrCloud may require the presence of a uniqueKey field specifically named 'id' for sharding. Is this true? Is it still true as of Solr 4.2-SNAPSHOT? If not, what svn commit fixed it? If so, should I file a jira? I am not actually using SolrCloud for one index, but my worry is that once a precedent for putting specific names in the code is set, it may bleed over into other features. Also, I have another set of servers for a different purpose that ARE using SolrCloud. Currently that system uses numShards=1, but one day we might want to do a distributed search there. Both my systems have a uniqueKey field other than 'id' and it would be quite a task to change it. The 'id' field doesn't exist at all in either system. Here's the relevant info for one of the systems:

<field name="tag_id" type="lowercase" indexed="true" stored="true" omitTermFreqAndPositions="true"/>

<!-- lowercases the entire field value -->
<fieldType name="lowercase" class="solr.TextField" sortMissingLast="true" positionIncrementGap="0" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<uniqueKey>tag_id</uniqueKey>

Thanks, Shawn
Re: Reverse range query
Hi, I have created a new attribute (Year) in the attribute dictionary and associated it with different catentries with different values, say 2000, 2001, 2002, 2003, ... 2012. Now I want to search on the Year attribute with a min and max range: when 2000 to 2005 is given as the search condition, it should fetch the catentries between those two values. This is the url I used to hit the solr server. ads_f11001 is the logical name of the attribute Year that I have created in Management Center. This value will be in the srchattrprop table. 2000 and 2005 are the min and max of the range. http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{2000 2005} When I try to hit this url I am getting 0 records found. http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{2000 TO *} and http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{* TO 2005} These two urls fetch some results, but not the expected result. Please help me to solve this issue. -- View this message in context: http://lucene.472066.n3.nabble.com/Reverse-range-query-tp1789135p4039860.html Sent from the Solr - User mailing list archive at Nabble.com.
Searching with min and max range in solr
Hi, I have created a new attribute (Year) in the attribute dictionary and associated it with different catentries with different values, say 2000, 2001, 2002, 2003, ... 2012. Now I want to search on the Year attribute with a min and max range: when 2000 to 2005 is given as the search condition, it should fetch the catentries between those two values. This is the url I used to hit the solr server. ads_f11001 is the logical name of the attribute Year that I have created in Management Center. This value will be in the srchattrprop table. 2000 and 2005 are the min and max of the range. http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{2000 2005} When I try to hit this url I am getting 0 records found. http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{2000 TO *} and http://localhost/solr/MC_10701_CatalogEntry_en_US/select?q=ads_f11001:{* TO 2005} These two urls fetch some results, but not the expected result. Please help me to solve this issue. -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-with-min-and-max-range-in-solr-tp4039861.html Sent from the Solr - User mailing list archive at Nabble.com.
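For reference, a sketch of standard Lucene/Solr range syntax for this field: the TO keyword is required between the bounds, square brackets are inclusive and curly braces exclusive. The field name is taken from the post above; whether its field type supports numeric range queries is an assumption:

```python
# Inclusive vs. exclusive range query strings on the Year attribute
# field (ads_f11001). Note the TO keyword between the bounds.
inclusive = "ads_f11001:[2000 TO 2005]"   # matches 2000 through 2005
exclusive = "ads_f11001:{2000 TO 2005}"   # excludes 2000 and 2005
params = {"q": inclusive}
print(params["q"])
```

The query in the post, {2000 2005}, omits TO entirely, which would explain the 0 records found.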
Custom search component executed several times when using Zookeeper
We have implemented a custom search component for Solr which handles security. It simply adds a filter query in the prepare method. This search component is added to our search handler as the last component. The custom component retrieves from a database a list of ACLs attached to the user. When we are running on one instance (a single master), our search component is executed once per request. This is what we expect, too. But when we are using ZooKeeper (two nodes), the same custom component is executed four times per request. This gives a huge overhead and poor performance. Is this normal behavior when using ZooKeeper, or is there any configuration we have overlooked? Best regards Jens Foshaug, e-vita as -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-search-component-executed-several-times-when-using-Zookeeper-tp4039872.html Sent from the Solr - User mailing list archive at Nabble.com.