Solr Sorting Caching
Our Solr index (Solr 3.4) has over 100 million documents. We frequently fire one type of query on this index to get documents, do some processing, and dump them into another index. The query is of the form *((keyword1 AND keyword2...) OR (keyword3 AND keyword4...) OR ...) AND date:[date1 TO *]*. The number of keywords can be in the range of 100 - 1000. We are adding the sort parameter *'date asc'*. The keyword part of the query changes very rarely, but the date part always changes. Now there are mainly 2 problems: 1) The query takes too much time. 2) Sometimes, when 'numFound' is very large for a query, it gives an OOM error (I guess this is because of the sort). We are not using any type of caching yet. Will caching be helpful to solve these problems? If yes, what type of cache or caching configuration is suitable to start with?
Re: SolrCloud and grouping
Apparently https://issues.apache.org/jira/browse/SOLR-2592 will help you out. Unfortunately, it seems that this issue will not be included in the Solr 4.0 release. I'm wondering myself if there are any plans to commit and release this issue or an equivalent, to give users control over partitioning in SolrCloud? On Tue, Sep 11, 2012 at 7:00 AM, Nikhil Chhaochharia nikhil...@yahoo.com wrote: Hi, I am trying out SolrCloud using a recent Solr 4 nightly. We use result grouping (FieldCollapsing) and found that the value of ngroups returned by Solr is not correct. My understanding is that all the documents belonging to the same group should be on the same shard to ensure that ngroups returns the correct value. However, the shard that a document is sent to is decided automatically based on the value of the uniqueKey field. Is it possible for Solr to hash fieldX instead of the uniqueKey while distributing the documents to the different shards? Is there some other way of getting accurate values of ngroups when using SolrCloud? Thanks, Nikhil
Re: SolrCloud and grouping
Yes, we will offer something for this - just a matter of priorities for the 4.0 release. My current priority is heavily on the bug side at the moment, personally. It's more likely in 4.1 or 4.2 or something. - Mark On Tue, Sep 11, 2012 at 2:37 AM, Pavel Goncharik pavel.goncha...@gmail.com wrote: Apparently https://issues.apache.org/jira/browse/SOLR-2592 will help you out. Unfortunately, it seems that this issue will not be included in the Solr 4.0 release. I'm wondering myself if there are any plans to commit and release this issue or an equivalent, to give users control over partitioning in SolrCloud? On Tue, Sep 11, 2012 at 7:00 AM, Nikhil Chhaochharia nikhil...@yahoo.com wrote: Hi, I am trying out SolrCloud using a recent Solr 4 nightly. We use result grouping (FieldCollapsing) and found that the value of ngroups returned by Solr is not correct. My understanding is that all the documents belonging to the same group should be on the same shard to ensure that ngroups returns the correct value. However, the shard that a document is sent to is decided automatically based on the value of the uniqueKey field. Is it possible for Solr to hash fieldX instead of the uniqueKey while distributing the documents to the different shards? Is there some other way of getting accurate values of ngroups when using SolrCloud? Thanks, Nikhil
Re: XInclude Multiple Elements
Way back when, I opened an issue about using XML entity includes in Solr as a way to break up the config. I have found problems with XInclude when including multiple elements, because the included file is not well formed. From what I have read, if you make it well formed, you end up with a document that's not what you expect. For example: my schema.xml has <fields> ... <xi:include href="more_fields.xml" .../> </fields> and more_fields.xml contains <field name="..."/>, which isn't well formed. You could make it well formed - <fields><field name="..."/></fields> - but then I think you end up with a nested fields element, which doesn't work (and by the way I still keep getting the blasted "failed to parse" error, which isn't very helpful). Looking at this made me wonder if entity includes work with Solr 4, and indeed they do! They aren't as flexible as XIncludes, but for the purpose of breaking up an XML file into smaller pieces it works beautifully and as you would expect. You can simply declare your entities at the top as shown in the earlier thread and then include them where you need. I've been using this for years and it works fairly well. Cheers! Amit On Thu, May 31, 2012 at 7:01 AM, Bogdan Nicolau bogdan@gmail.com wrote: I've also tried a lot of tricks to get xpointer working with multiple child elements, with no success. In the end, I've resorted to a less pretty, other-way-around solution. I do something like this: solrconfig_common.xml - no XML declaration, no root tag, nothing but the shared elements: <etc>...</etc> <etc2>...</etc2> ... For each file that I need the common stuff in (solrconfig_master.xml, solrconfig_slave.xml, etc.), I do something like this: <?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE config [ <!ENTITY solrconfigcommon SYSTEM "solrconfig_common.xml"> ]> <config> &solrconfigcommon; </config> Solr starts with 0 warnings, the configuration is properly loaded, etc. Property substitution also works, including inside solrconfig_common.xml. Hope it helps anyone.
-- View this message in context: http://lucene.472066.n3.nabble.com/XInclude-Multiple-Elements-tp3167658p3987029.html Sent from the Solr - User mailing list archive at Nabble.com.
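The entity-include approach described in this thread can be laid out as two files like this (a sketch reconstructed from the description above; the field name `extra_field` and the entity name `morefields` are made-up placeholders):

```xml
<!-- more_fields.xml: a fragment only - no XML declaration, no root element -->
<field name="extra_field" type="string" indexed="true" stored="true"/>

<!-- schema.xml: declare the entity in an internal DTD subset,
     then reference it inside the <fields> element -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE schema [
  <!ENTITY morefields SYSTEM "more_fields.xml">
]>
<schema name="example" version="1.5">
  <fields>
    &morefields;
  </fields>
</schema>
```

Because the fragment is spliced in by the XML parser before the document is interpreted, the result is a single well-formed schema, avoiding the nested-element problem XInclude runs into.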
Re: Replication policy
If I understand you right, replication of data has zero downtime - it just works, and the data flows through from master to slaves. If you want, you can configure the replication to replicate configuration files across the cluster (although for me, my deploy script does this). I'd recommend tweaking the warmers so that you don't get latency spikes due to cold caches during the replications. Not being well versed in the latest Solr features (I'm a bit behind here), I don't know if you can reload the cores on demand to pick up the latest configurations, but in my environment I have a rolling restart script that bounces a set of servers when the schema/solrconfig changes. HTH Amit On Mon, Sep 10, 2012 at 11:10 PM, Abhishek tiwari abhishek.tiwari@gmail.com wrote: Hi All, I have 1 master and 3 slave Solr servers (version 3.6). What kind of replication policy should I adopt for zero downtime and no data loss? 1) when we do some configuration and schema changes on the Solr server.
Re: Solr Sorting Caching
On Tue, 2012-09-11 at 08:00 +0200, Amey Patil wrote: Our Solr index (Solr 3.4) has over 100 million documents. [...] *((keyword1 AND keyword2...) OR (keyword3 AND keyword4...) OR ...) AND date:[date1 TO *]* No. of keywords can be in the range of 100 - 1000. We are adding sort parameter *'date asc'*. Are you using a TrieDateField for the dates? The keyword part of the query changes very rarely but date part always changes. Consider creating and re-using a filter for the keywords and letting the query consist of the date range only. [...] 2) Sometimes when 'numFound' is very large for a query, it gives an OOM error (I guess this is because of sort). Guessing here: you request all the results from the search, which is potentially 100M documents? Solr is not geared towards such massive responses. You might have better luck with paging, but even that does not behave very well when requesting pages very far into the result set.
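One way to act on the filter suggestion above is to move the rarely-changing keyword clauses into an fq parameter, e.g. q=date:[date1 TO *]&fq=(keyword1 AND keyword2) OR (keyword3 AND keyword4)&sort=date asc, and make sure the filterCache in solrconfig.xml is sized so the keyword filter's DocSet is computed once and then reused across queries (a sketch; the sizes below are illustrative, not recommendations):

```xml
<!-- solrconfig.xml, inside <query>: filterCache holds one cached DocSet
     per distinct fq, so a stable keyword filter is computed only once -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="64"/>
```

Since only the date part of the query changes between requests, the expensive keyword evaluation then happens once per commit rather than once per query.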
Re: RES: RES: Problem with accented words sorting
On Mon, 2012-09-10 at 16:04 +0200, Claudio Ranieri wrote: When I used the CollationKeyFilterFactory in my facet (example below), the value of facet went wrong. When I remove the CollationKeyFilterFactory of type of facet, the value went correct. As Ahmed wrote, CollationKeyFilter is meant for sorting of the document result. It works by creating a key for each value. The key is, as you discovered, not meant for human eyes. When you do a sort on the collation field, the key is used for ordering and the original human-friendly text is taken from a stored field. See https://wiki.apache.org/solr/UnicodeCollation For faceting, the dual value approach does not work, as there is no mapping from the key to the original value. There are several possible solutions to this (storing the original value together with the key seems sensible), but as far as I know, Solr does not currently support collator-sorted faceting. Is it a bug? No, it is a known (and significant, IMO) limitation.
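The dual-field setup described above can be sketched in schema.xml like this (field and type names here are made up for illustration): sort on the collated copy, and facet and display on the original field.

```xml
<!-- original field: used for faceting and for display -->
<field name="title" type="string" indexed="true" stored="true"/>

<!-- collated copy: indexed only, used exclusively for sorting -->
<field name="title_sort" type="text_collated" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

<!-- one token per value, turned into a locale-aware collation key -->
<fieldType name="text_collated" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory"
            language="pt" strength="primary"/>
  </analyzer>
</fieldType>
```

Requests would then use sort=title_sort asc but facet.field=title, which sidesteps the unreadable collation keys in facet output (at the cost of index-order rather than collator-order facet sorting, as discussed above).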
Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0
Feedback on the patch: I used build #85 (Revision: 1382192) to test the same use case (build up an initial index of 18 million documents and run updates with around 200,000 documents). Result: the use of fq to drill down facets is now consistent! (available under http://sb-tp1.swissbib.unibas.ch) Thanks for providing a quick patch!! -Günter On 09/07/2012 05:09 PM, Erick Erickson wrote: Thank the guys who actually fixed it! Thanks for bringing this up, and please let us know if Yonik's patch fixes your problem Best Erick On Thu, Sep 6, 2012 at 11:39 PM, guenter.hip...@unibas.ch guenter.hip...@unibas.ch wrote: Erick, thanks for the response! Our use case is very straightforward and basic. - no cloud infrastructure - XMLUpdateRequestHandler (transformed library bibliographic data which is pushed by the post.jar component). For deletions I used to use the SolrJ component until two months ago, but because of the difficulties I read about, I changed back to the basic procedure with XML documents - around 18 million documents, no distributed shards - once the basic use case is stable and maintainable, we are heading forward to the more fancy things ;-) Yonik provided a patch (https://issues.apache.org/jira/browse/SOLR-3793) yesterday morning. I'm going to run tests once it is part of the nightly builds. By now, if I'm not wrong (https://builds.apache.org/job/Solr-Artifacts-4.x/), the last build doesn't contain it. Best wishes from Basel, Günter On 09/07/2012 07:09 AM, Erick Erickson wrote: Guenter: Are you using SolrCloud or straight Solr? And were you updating in batches (i.e. updating multiple docs at once from SolrJ by using the server.add(doclist) form)? There was a bug in this process that caused various docs to show up in various shards differently. This has been fixed in 4x; any nightly build should have the fix. I'm absolutely grasping at straws here, but this was a weird case that I happen to know about...
Hossman: of course this all goes up in smoke if you can reproduce this with any recent compilation of the code. FWIW Erick On Wed, Sep 5, 2012 at 11:29 PM, guenter.hip...@unibas.ch guenter.hip...@unibas.ch wrote: Hoss, I'm so happy you realized the problem because I was quite worried about it!! Let me know if I can provide support with testing it. The last two days I was busy with migrating a bunch of hosts, which should - hopefully - be finished today. Then I will again have the infrastructure for running tests. Günter On 09/05/2012 11:19 PM, Chris Hostetter wrote: : Subject: Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0 Günter, This is definitely strange. The good news is, i can reproduce your problem. The bad news is, i can reproduce your problem - and i have no idea what's causing it. I've opened SOLR-3793 to try to get to the bottom of this, and included some basic steps to demonstrate the bug using the Solr 4.0-BETA example data, but i'm really not sure what the problem might be... https://issues.apache.org/jira/browse/SOLR-3793 -Hoss -- Universität Basel Universitätsbibliothek Günter Hipler Projekt SwissBib Schoenbeinstrasse 18-20 4056 Basel, Schweiz Tel.: +41 (0)61 267 31 12 Fax: +41 61 267 3103 e-mail: guenter.hip...@unibas.ch URL: www.swissbib.org / http://www.ub.unibas.ch/
Solr backup replication - restore from snapshot
Hello, I have some questions about restoring from a snapshot backup. I have a master and ran the following command: http://solr.test.uk:/solr/replication?command=backup It created a directory in my data directory: snapshot.20120911224532 When I want to use this backup on the master, I replace the index directory with the snapshot directory. I restart the master and it works! Now I want to replicate this to my (live) slaves, but the slaves don't recognize the changes. I think the problem is the index version. The index version of the master (created from the snapshot) is lower than the index versions on the slaves. How can I fix this? Can I force the slaves to replicate without looking at index versions? Can I upgrade the index version on the master? Any help will be appreciated! Roy -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-backup-replication-restore-from-snapshot-tp4006821.html Sent from the Solr - User mailing list archive at Nabble.com.
RES: RES: RES: Problem with accented words sorting
Ok Toke. Thanks for your explanation. This would be an interesting feature to implement, because we can sort the results correctly, but not the facets. The facets also do not bring the total count for pagination. I'm using the facets to get the distinct values of a field. I wish to sort and paginate them. -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Tuesday, September 11, 2012 04:11 To: solr-user@lucene.apache.org Subject: Re: RES: RES: Problem with accented words sorting On Mon, 2012-09-10 at 16:04 +0200, Claudio Ranieri wrote: When I used the CollationKeyFilterFactory in my facet (example below), the value of facet went wrong. When I remove the CollationKeyFilterFactory of type of facet, the value went correct. As Ahmed wrote, CollationKeyFilter is meant for sorting of the document result. It works by creating a key for each value. The key is, as you discovered, not meant for human eyes. When you do a sort on the collation field, the key is used for ordering and the original human-friendly text is taken from a stored field. See https://wiki.apache.org/solr/UnicodeCollation For faceting, the dual value approach does not work, as there is no mapping from the key to the original value. There are several possible solutions to this (storing the original value together with the key seems sensible), but as far as I know, Solr does not currently support collator-sorted faceting. Is it a bug? No, it is a known (and significant, IMO) limitation.
Re: Re: Get parent when the child is a search hit
Hi, Thanks for all the suggestions :-) Seems like denormalization is the way to go to do this without losing scalability and speed. BlockJoins seem to solve another requirement I have, and that is the parent-child relationship between, for instance, an email and email attachments. This relationship is more stable - it does not change so often - so block joins seem like a good approach. I see support for block joins is not committed to Solr yet; the functionality only exists in Lucene. Does anyone know if the block join functionality in SOLR-3076 will be committed to Solr before Solr 4 is released? Best, Stein Gran On Tue, Sep 11, 2012 at 6:29 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, One more approach is BlockJoin. see SOLR-3076 blog.griddynamics.com/2012/08/block-join-query-performs.html On 11.09.2012 5:40, 李�S liyun2...@corp.netease.com wrote: I think denormalizing the data is the best way. 2012-09-11 李�S From: jimtronic Sent: 2012-09-11 01:38 Subject: Re: Get parent when the child is a search hit To: solr-user solr-user@lucene.apache.org Cc: You could create a type field with folder or file as values and then have the parentid present in the folder docs. -- View this message in context: http://lucene.472066.n3.nabble.com/Get-parent-when-the-child-is-a-search-hit-tp4006623p4006687.html Sent from the Solr - User mailing list archive at Nabble.com.
Return only matched multiValued field
Assuming a multiValued, stored, and indexed field named comment: when performing a search, I would like to return only the values of comment which contain the match. For example, when searching for gold, instead of getting this result: <doc> <arr name="comment"> <str>There's a lady who's sure</str> <str>all that glitters is gold</str> <str>and she's buying a stairway to heaven</str> </arr> </doc> I would prefer to get this result: <doc> <arr name="comment"> <str>all that glitters is gold</str> </arr> </doc> (pseudo-XML from memory, may not be accurate but illustrates the point) Thanks. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: RES: RES: RES: Problem with accented words sorting
On Tue, 2012-09-11 at 12:14 +0200, Claudio Ranieri wrote: This is an interesting feature to be implemented, because we can sort the results correctly, but not in the facets. At work (State and University Library, Denmark) we use collator-ordered faceting for author and title, but our current implementation suffers from sorting at index-open time. Roughly speaking, this takes one minute per one million terms, and since we have 10M documents, we're talking 10-15 minutes before a search can be performed. The collator-key + original-term approach would take nearly the same time as standard index-order faceting when opening the index. The facets also does not bring the total count for pagination. I'm using the facets to get the distinct values of a field. I wish to sort and pagination them. This seems to be the relevant JIRA issue: https://issues.apache.org/jira/browse/SOLR-2242
RES: RES: RES: RES: Problem with accented words sorting
Ok Toke, Is it worth opening a ticket in JIRA to implement the collator-key + original in facet? -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Tuesday, September 11, 2012 08:46 To: solr-user@lucene.apache.org Subject: Re: RES: RES: RES: Problem with accented words sorting On Tue, 2012-09-11 at 12:14 +0200, Claudio Ranieri wrote: This is an interesting feature to be implemented, because we can sort the results correctly, but not in the facets. At work (State and University Library, Denmark) we use collator-ordered faceting for author and title, but our current implementation suffers from sorting at index-open time. Roughly speaking, this takes one minute per one million terms, and since we have 10M documents, we're talking 10-15 minutes before a search can be performed. The collator-key + original-term approach would take nearly the same time as standard index-order faceting when opening the index. The facets also does not bring the total count for pagination. I'm using the facets to get the distinct values of a field. I wish to sort and pagination them. This seems to be the relevant JIRA issue: https://issues.apache.org/jira/browse/SOLR-2242
Solr 3.6.1 Source Code
Hi, I would like to know the baselined version of the Solr 3.6.1 source code for SVN checkout. We tried to check out from the following link and found many baselined versions related to the Solr 3.6.x version. https://svn.apache.org/repos/asf/lucene/dev/branches/ Can anyone tell me the exact SVN checkout link for the Solr 3.6.1 version? Thanks a lot -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-1-Source-Code-tp4006903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.6.1 Source Code
On Tue, Sep 11, 2012 at 2:43 PM, mechravi25 mechrav...@yahoo.co.in wrote: Hi, I would like to know the base lined version of Solr 3.6.1 Source code for svn Check out. We tried to check out from the following link and found many base lined versions related to Solr 3.6.x version. https://svn.apache.org/repos/asf/lucene/dev/branches/ Can anyone tell me the exact svn check out link for Solr 3.6.1 version? Thanks a Lot https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_1/
Re: Solr 3.6.1 Source Code
The branch will be the source for the next release (3.6.2), if there is one. To get the exact source for a release, go to tags rather than branches. Use: http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_6_1/ -- Jack Krupansky -Original Message- From: mechravi25 Sent: Tuesday, September 11, 2012 8:43 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 Source Code Hi, I would like to know the base lined version of Solr 3.6.1 Source code for svn Check out. We tried to check out from the following link and found many base lined versions related to Solr 3.6.x version. https://svn.apache.org/repos/asf/lucene/dev/branches/ Can anyone tell me the exact svn check out link for Solr 3.6.1 version? Thanks a Lot -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-1-Source-Code-tp4006903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: RES: RES: RES: Problem with accented words sorting
This is an interesting feature to be implemented, because we can sort the results correctly, but not in the facets. The facets also does not bring the total count for pagination. I'm using the facets to get the distinct values of a field. I wish to sort and pagination them. Distinct values can be retrieved using http://wiki.apache.org/solr/LukeRequestHandler too. Regarding pagination: http://wiki.apache.org/solr/SimpleFacetParameters#facet.offset
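The facet pagination mentioned above can be spelled out as request parameters (a sketch; the field name a and the page size of 20 are placeholders):

```
facet=true&facet.field=a&facet.sort=index&facet.limit=20&facet.offset=40
```

With facet.limit=20, facet.offset=40 returns the third page of constraint values; stepping offset by the limit pages through the distinct values of the field in index order.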
Re: Running Solr Unit Test in Eclipse
Generally, source folders come in pairs - java for the actual code source, and test for the unit tests. So, make sure that /test appears in the source folder name. And unit test file names either begin or end with Test. If you right click on a unit test and select Run As, you should see JUnit Test. Or press Ctrl+F11 to run a unit test. -- Jack Krupansky -Original Message- From: BadalChhatbar Sent: Tuesday, September 11, 2012 1:37 AM To: solr-user@lucene.apache.org Subject: Running Solr Unit Test in Eclipse Hi All, I am new to Solr and Eclipse. I am trying to run the Solr unit tests in Eclipse, and I am getting confused in a couple of places. (Note: I am able to run the tests using the ant command and it all works fine.) But when I open a unit test and go to Right Click -> Run As Configuration, it offers to run the unit test as a Jetty WebApp - is this the right thing? If yes, it would be great if you could provide me some configuration steps. And if I try to set the Run As configuration to JUnit, then I'm not sure what classes to select or what arguments to specify. I tried to follow this document, but it didn't help much. http://wiki.apache.org/solr/TestingSolr -- View this message in context: http://lucene.472066.n3.nabble.com/Running-Solr-Unit-Test-in-Eclipse-tp4006795.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [Solr4 beta] error 503 on commit
Hoss, After investigating more, here is the Tomcat log below. It is indeed the same problem: exceeded limit of maxWarmingSearchers=2. It is an indexing box, and the comment says that we could raise this number to 4 or something. I can do that, but I have four questions though: - is it something that can happen anyway? - what are the actions in case it does (since re-committing the same docs with a try-again mechanism is not re-committing)? - I was under the impression that no new searchers are created if I don't do searches, and I was not searching at that time. Where is this searcher coming from? - is there a way to disable searchers during indexing, since I precisely don't want warming during indexing? Thanks a lot. 11 sept. 2012 15:25:08 org.apache.solr.core.QuerySenderListener newSearcher INFO: QuerySenderListener done. 11 sept. 2012 15:25:08 org.apache.solr.core.SolrCore registerSearcher INFO: [lg_fr_alpha6_full] Registered new searcher Searcher@61f5e795 main{StandardDirectoryReader(segments_qo:13989:nrt _13d(4.0.0.1):C4636752/2696195 _26s(4.0.0.1):C4422269 _329(4.0.0.1):C2409200 _3me(4.0.0.1):C4534687/1 _4df(4.0.0.1):C3745599/731 _4np(4.0.0.1):C3660400/5317 _4qr(4.0.0.1):C2422935/138 _4w4(4.0.0.1):C261418 _4qw(4.0.0.1):C293154 _56y(4.0.0.1):C833572 _4vc(4.0.0.1):C138593 _53b(4.0.0.1):C410764 _4yg(4.0.0.1):C168744 _51j(4.0.0.1):C134313 _56s(4.0.0.1):C151121 _55m(4.0.0.1):C40342 _59i(4.0.0.1):C167117 _58i(4.0.0.1):C82564 _57r(4.0.0.1):C79488 _57k(4.0.0.1):C2589 _5a7(4.0.0.1):C64667 _59x(4.0.0.1):C14142 _58v(4.0.0.1):C1618 _58x(4.0.0.1):C2219 _595(4.0.0.1):C15193 _590(4.0.0.1):C17855 _598(4.0.0.1):C3528 _59a(4.0.0.1):C5998 _59l(4.0.0.1):C10234 _59k(4.0.0.1):C6524 _59q(4.0.0.1):C3115 _5al(4.0.0.1):C3502 _5a4(4.0.0.1):C1602 _5a6(4.0.0.1):C1797 _5a5(4.0.0.1):C1530 _5ad(4.0.0.1):C7351 _5ac(4.0.0.1):C5797 _5ab(4.0.0.1):C8330 _5aa(4.0.0.1):C6436 _5a9(4.0.0.1):C5944 _5ag(4.0.0.1):C36424 _5aj(4.0.0.1):C7 _5ai(4.0.0.1):C26770 _5ak(4.0.0.1):C29729 _5af(4.0.0.1):C35554
_5ah(4.0.0.1):C23804)} 11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush 11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [lg_fr_alpha6_full] webapp=/solr path=/update params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2} {commit=} 0 107492 11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false} 11 sept. 2012 15:25:08 org.apache.solr.search.SolrIndexSearcher <init> INFO: Opening Searcher@62677672 main 11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush 11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [lg_fr_alpha6_full] webapp=/solr path=/update params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2} {commit=} 0 93563 11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false} 11 sept. 2012 15:25:08 org.apache.solr.core.SolrCore getSearcher ATTENTION: [lg_fr_alpha6_full] PERFORMANCE WARNING: Overlapping onDeckSearchers=2 11 sept. 2012 15:25:08 org.apache.solr.search.SolrIndexSearcher <init> INFO: Opening Searcher@4479f66a main 11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush 11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [lg_fr_alpha6_full] webapp=/solr path=/update params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2} {commit=} 0 93178 11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false} 11 sept. 
2012 15:25:08 org.apache.solr.core.SolrCore getSearcher ATTENTION: [lg_fr_alpha6_full] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later. 11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [lg_fr_alpha6_full] webapp=/solr path=/update params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2} {} 0 93137 11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false} 11 sept. 2012 15:25:08 org.apache.solr.core.SolrCore getSearcher ATTENTION: [lg_fr_alpha6_full] Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later. 11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [lg_fr_alpha6_full] webapp=/solr path=/update params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2} {} 0 90799 11 sept. 2012 15:25:08
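Two of the relevant knobs for the situation in this log live in solrconfig.xml (a sketch; the values below are illustrative, not recommendations). The log shows each explicit commit opening a searcher (openSearcher=true), which is what piles up the warming searchers; on a pure indexing box, Solr 4's autoCommit with openSearcher=false flushes the index without opening searchers at all.

```xml
<!-- solrconfig.xml, inside <query>: allow more overlapping warming
     searchers, as suggested in the thread above -->
<maxWarmingSearchers>4</maxWarmingSearchers>

<!-- ...or avoid opening searchers during bulk indexing: hard-commit
     for durability only, with no new searcher (Solr 4) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With openSearcher=false, a searcher is only opened when one is explicitly requested (e.g. a commit with openSearcher=true once indexing finishes), so warming no longer competes with the indexing run.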
Solr on https
Hi, We are trying to run Solr on https; these are a few of the issues or problems that are coming up. Just wanted to understand if anyone else is facing these problems. We have some shards running on https, but in the shards parameter in Solr we don't specify the protocol - how can we achieve this? Will replication work on https? Will commit and other functions work normally? Regards, atpug
Re: RES: RES: RES: RES: Problem with accented words sorting
On Tue, 2012-09-11 at 14:21 +0200, Claudio Ranieri wrote: Ok Toke, Is it worth opening a ticket in jira to implement the collator-key + original in facet? I think it would be best to discuss it on the developer mailing list first. I have sent a mail there: Collator-based facet sorting in Solr. Regards, Toke Eskildsen
Re: Facet Sort by Index, missing indexes
: I did the query twice, once with the sorting and once without the sort: : : 1. Without f.a.facet.sort=index : I have all l1, l2, l3 in count order : (all l1, l2, and l3 facets have counts on them) : : 2. With f.a.facet.sort=index : The facet is sorted accordingly : l1:..,l2:.. but l3 facets are completely missing first off: terminology clarification. It sounds like in your case you have a field named a and you are using that field as a field facet. within the field a you have terms like l1, l2, l3, etc... when you facet on a field, the terms are each treated as a constraint and you get a constraint count (or facet count) for each term. Having said that: it sounds like what you are describing is that some constraints are missing from the list when you use facet.sort=index (ie: when the constraints are in index order) Is your example real? ie: are the terms really l1, l2, and l3 or are those just hypothetical examples? how many terms do you see in the response? - because by default a max of 100 constraints are returned... https://wiki.apache.org/solr/SimpleFacetParameters#facet.limit It would help if you could provide a full, real example of the request you are attempting (with all params) and the actual response you get back - if things appear to be working with facet.sort=count, but not with facet.sort=index, then please show us both requests+responses so we can compare. -Hoss
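To rule out the default cap of 100 constraints mentioned above, the request can ask for all constraints explicitly (a sketch; a is the field name from the mail):

```
facet=true&facet.field=a&f.a.facet.sort=index&f.a.facet.limit=-1
```

Per the SimpleFacetParameters wiki, facet.limit=-1 (here scoped to field a via the f.a. prefix) removes the limit entirely; if l3 reappears with this setting, the missing constraints were simply past the default cutoff.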
Re: Semantic document format... standards?
I'm probably a little unclear about the breadth of what you want to do, but I would recommend DC at the extremely lightweight end, and TEI at the very heavyweight end. Perhaps you could come up with a mashup of DC and your own fields in RDF as well. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn’t a Game On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, If I'm extracting named entities, topics, key phrases/tags, etc. from documents and I want to have a representation of this document, what format should I use? Are there any standard or at least common formats or approaches people use in such situations? For example, the most straightforward format might be something like this: <document> <title>doc title</title> <keywords>meta keywords coming from the web page</keywords> <content>page meat</content> <entities>named entities recognized in the document</entities> <topics>topics extracted by the annotator</topics> <tags>tags extracted by the annotator</tags> <relations>relations extracted by the annotator</relations> </document> But this is a made-up format - the XML tags above are just what somebody happened to pick. Are there any standard or at least common formats for this? Thanks, Otis Performance Monitoring - Solr - ElasticSearch - HBase - http://sematext.com/spm Search Analytics - http://sematext.com/search-analytics/index.html
Re: Use field as bool flag for another, not indexed, field
You've outlined the possibilities pretty well. I don't think you want a custom analyzer though; consider a custom UpdateHandler and overriding the addDoc command. You can freely manipulate the document at that point, adding or removing fields etc. So see if the incoming doc has your original field or not, and add your new boolean field at that point... Or, even simpler, if you're indexing from SolrJ, do this on the client side. Best Erick On Sun, Sep 9, 2012 at 2:53 PM, simple350 aurel...@yahoo.com wrote: Hi, I want to be able to select from the index the documents that have a certain field not null. The problem is that the field is not indexed, just stored. I'm not interested in indexing that field as it is just an internal URL. The idea was to add another field to the document - a boolean field - based on the initial field: 'True' for an existing field, 'False' for null - I could copy the initial field and use some analyzer having as output a bool result. Before trying to build a custom analyzer I wanted to ask if anything like this makes sense, or if it is already available in Solr, or if I completely missed some point. Regards, Alex -- View this message in context: http://lucene.472066.n3.nabble.com/Use-field-as-bool-flag-for-another-not-indexed-field-tp4006491.html Sent from the Solr - User mailing list archive at Nabble.com.
PatternTokenizerFactory not working to split comma separated value
Hello, I am using the following field type for comma-separated values but it is not working:

<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" group="-1" pattern=",|\|" />
  </analyzer>
</fieldType>

<field indexed="true" multiValued="true" name="vc_cat_shape" omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true" stored="false" termVectors="false" type="commaDelimited"/>

Please suggest what I did wrong. - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/PatternTokenizerFactory-not-working-to-split-comma-separated-value-tp4006994.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr request/response lifecycle and logging full response time
: I'd still love to see a query lifecycle flowchart, but, in case it : helps any future users or in case this is still incorrect, here's how : I'm tackling this: part of the problem is deciding what you mean by lifecycle - as far as the SolrCore is concerned, the lifecycle of the request is its execute method -- after that there is still response writing, but SolrCore doesn't really care about that. From the perspective of SolrDispatchFilter, the lifecycle is longer, as the ResponseWriter is used to format the response. From the perspective of the servlet container, the lifecycle might be even longer, as the client may be slow to read bytes off the wire, so the SolrDispatchFilter ResponseWriter may be completely done with the response, but the ServletContainer may not yet have written all the bytes back to the client. That's why most people looking for the full response time usually get this info from the ServletContainer (logs), because it's the only place that knows for certain when the request is *DONE* done. : Please advise if: : - Flowcharts for any solr/lucene-related lifecycles exist Jan made a pretty decent stab at this a while back, which is good for an end user perspective but isn't super detailed... http://www.cominvent.com/nb/2011/04/04/solr-architecture-diagram/ : - There is a better way of doing this If I were attempting to solve this problem, I probably would have tried to implement it as a simple Servlet Filter that wrapped the ServletOutputStream in something that would have done the logging on close(). (so that it could be re-used by any ResponseWriter) -Hoss
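Hoss's wrap-the-stream idea can be sketched with the stdlib analogue: java.io.FilterOutputStream plays the role of the wrapped ServletOutputStream, and the log line fires on close(), i.e. only once the response is *DONE* done. A sketch under that assumption, not actual servlet code:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Miniature of the servlet-filter idea: wrap the output stream and log the
// elapsed time when close() is called, after the last byte has been written.
public class TimingOutputStream extends FilterOutputStream {
    private final long start = System.nanoTime();
    private final String label;

    public TimingOutputStream(OutputStream out, String label) {
        super(out);
        this.label = label;
    }

    @Override
    public void close() throws IOException {
        super.close();
        // Fires only once the stream is fully written and closed.
        System.err.printf("%s finished in %d ms%n", label,
                (System.nanoTime() - start) / 1_000_000);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (TimingOutputStream out = new TimingOutputStream(sink, "/select?q=*:*")) {
            out.write("response body".getBytes());
        }
        System.out.println(sink.toString());
    }
}
```

A real servlet Filter would wrap the HttpServletResponse's output stream the same way, which is why this measurement is independent of any particular ResponseWriter.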
RE: Term searches with colon(:)
: Thank you for the reply and help. The description field is part of the : defaultHandler's eDisMax search list (qf): ... : Similar queries for other escaped characters in description using term : search return correctly as shown from the logs correctly. Ok ... but you haven't really answered my main question -- what are you trying to match? are you trying to search for the literal term *:* in the description field? are you trying to do a wildcard search for terms that contain a colon in the middle (ie: foo:bar), or are you trying to match all documents in the description field? you've said it doesn't match anything, but you haven't explained what you expect it to match (Actually .. lemme back up and ask a silly question -- are the * characters in your email actually part of the query you are sending to solr, or is that just an artifact of your mail client translating bold or highlighted characters into * when converting to plain text?) : what are you expecting that query to match? because by backslash : escaping the colon, what you are asking for there is for Solr to search : for the literal string *:* in your default search field (after whatever : query time analysis is configured on your default search field) -Hoss
Re: SolrCloud vs SolrReplication
Thanks for the answer Erick. thaihai -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-vs-SolrReplication-tp4006327p4007019.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: PatternTokenizerFactory not working to split comma separated value
I tried your field type in Solr 4.0-BETA and it works fine for input such as: cat,dog|fox,bat|frog What do you see when you use the Solr Admin Analysis web page for that text for your field type? I would note that your pattern does not permit spaces as delimiters or after delimiters, so if your input had spaces, queries could fail unless they included the escaped spaces. -- Jack Krupansky -Original Message- From: Suneel Sent: Tuesday, September 11, 2012 2:20 PM To: solr-user@lucene.apache.org Subject: PatternTokenizerFactory not working to split comma separated value Hello, I am using the following field type for comma-separated values but it is not working:

<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" group="-1" pattern=",|\|" />
  </analyzer>
</fieldType>

<field indexed="true" multiValued="true" name="vc_cat_shape" omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true" stored="false" termVectors="false" type="commaDelimited"/>

Please suggest what I did wrong. - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/PatternTokenizerFactory-not-working-to-split-comma-separated-value-tp4006994.html Sent from the Solr - User mailing list archive at Nabble.com.
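With group="-1", PatternTokenizerFactory splits on the regex rather than matching groups, so plain java.util.regex splitting behaves the same way and makes Jack's point about spaces easy to see. A small standalone demo (not Solr code):

```java
import java.util.Arrays;
import java.util.List;

// Demonstrates what the pattern ",|\|" does when used for splitting:
// it splits on commas and pipes, but does NOT trim surrounding spaces.
public class CommaSplit {
    public static List<String> tokenize(String input) {
        return Arrays.asList(input.split(",|\\|"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("cat,dog|fox,bat|frog"));
        // Note the leading space kept on " dog": the pattern has no \s*
        // around the delimiters, which is what Jack is warning about.
        System.out.println(tokenize("cat, dog"));
    }
}
```

If spaces around delimiters are expected, a pattern like "\s*(,|\|)\s*" would absorb them.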
Re: PatternTokenizerFactory not working to split comma separated value
Hi Jack, This is happening only on the lucid cloud server; it is not splitting comma-separated values. But on the Solr server this fix is working perfectly. Are there any configuration changes in solrconfig.xml that can enable or disable PatternTokenizerFactory? - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/PatternTokenizerFactory-not-working-to-split-comma-separated-value-tp4006994p4007028.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Semantic document format... standards?
Hi, I guess the most common format today is using schema.org's ontologies. It provides a couple of definitions, and it is supported by big players, such as Google, Yahoo, Microsoft. See http://schema.org/. Hope it helps, Péter On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello, If I'm extracting named entities, topics, key phrases/tags, etc. from documents and I want to have a representation of this document, what format should I use? Are there any standard or at least common formats or approaches people use in such situations? For example, the most straightforward format might be something like this:

<document>
  <title>doc title</title>
  <keywords>meta keywords coming from the web page</keywords>
  <content>page meat</content>
  <entities>name entities recognized in the document</entities>
  <topics>topics extracted by the annotator</topics>
  <tags>tags extracted by the annotator</tags>
  <relations>relations extracted by the annotator</relations>
</document>

But this is a made-up format - the XML tags above are just what somebody happened to pick. Are there any standard or at least common formats for this? Thanks, Otis Performance Monitoring - Solr - ElasticSearch - HBase - http://sematext.com/spm Search Analytics - http://sematext.com/search-analytics/index.html -- Péter Király eXtensible Catalog http://eXtensibleCatalog.org http://drupal.org/project/xc
Re: PatternTokenizerFactory not working to split comma separated value
Is it possible that you may have indexed with an earlier pattern, changed the pattern, and then tried to query? If so, you need to fully re-index to see the change take effect. I don't know of anything in solrconfig that should affect PatternTokenizerFactory. -- Jack Krupansky -Original Message- From: Suneel Pandey Sent: Tuesday, September 11, 2012 4:09 PM To: solr-user@lucene.apache.org Subject: Re: PatternTokenizerFactory not working to split comma separated value Hi Jack, This is happening on only lucid cloud server not splitting comma separated value. but on solr server this fix is working perfectly. Is any configuration changes in solrconfig.xml which can enable and disable PatternTokenizerFactory? - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/PatternTokenizerFactory-not-working-to-split-comma-separated-value-tp4006994p4007028.html Sent from the Solr - User mailing list archive at Nabble.com.
Getting more proper results
Hi, I'm using Solr 3.5 with the following configuration:

<fieldType name="text_auto" class="solr.TextField">
  <analyzer type="index">
    <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" /> -->
  </analyzer>
  <analyzer type="query">
    <!-- <tokenizer class="solr.KeywordTokenizerFactory" /> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" />
  </analyzer>
</fieldType>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<field name="description" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="title_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<field name="pic_thumb" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="category" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="category_id" type="int" indexed="true" stored="true" multiValued="false"/>
<field name="top_id" type="int" indexed="true" stored="true" multiValued="false"/>

I'm using it that way because I want to have an autocompletion list. Now I'm wondering if I can influence some of the results I'm getting. I have a lot of categories in my database. If I for example search for iphone 3 I would expect to get all iphone 3 from the category electronic.
If I'm searching for iphone 3 I get the following results: iPhone - the book, Apple iphone 4, My iphone I, etc. If I instead write iphone 3g, then I get the proper results: a lot of iphones. Why didn't my first search give me the results that I'm getting with the second search term? I would expect the behavior to be the same. Is it possible to configure Solr in such a way that it returns with the first search term the results of the second operation? Thanks, Rk
Re: Semantic document format... standards?
As Michael hinted, I believe RDF would be the de-facto answer. Within it, things such as OWL or SKOS certainly represent classical formats. Processors such as OWLAPI can go pretty far there. As Péter hinted, schema.org might provide a way to complement an existing XML with semantic information. The big support everyone talks about (because apparently the big names speak out), I haven't yet seen much in evidence, in particular in terms of a shared toolset. There are many, many alternatives. We've recently been working with a format called DITA, which is an XML format for annotated documents and also claims to provide semantic support (e.g. with taxonomies). Is your goal to serve these as food for Solr to index? Paul On 11 Sept 2012 at 17:51, Otis Gospodnetic wrote: Hello, If I'm extracting named entities, topics, key phrases/tags, etc. from documents and I want to have a representation of this document, what format should I use? Are there any standard or at least common formats or approaches people use in such situations? For example, the most straightforward format might be something like this:

<document>
  <title>doc title</title>
  <keywords>meta keywords coming from the web page</keywords>
  <content>page meat</content>
  <entities>name entities recognized in the document</entities>
  <topics>topics extracted by the annotator</topics>
  <tags>tags extracted by the annotator</tags>
  <relations>relations extracted by the annotator</relations>
</document>

But this is a made-up format - the XML tags above are just what somebody happened to pick. Are there any standard or at least common formats for this?
multiple filter queries and boolean operators in SolrJ
Hi, I am accessing our Solr installation via SolrJ. Currently, we are supporting filter queries via the addFilterQuery() method of SolrQuery. However as far as I can see, the resultant documents that come out of the query are the intersection of all the filters. Ideally, what I'd like to happen is that if we have two FQ's on the same field, the result should be the OR, whereas if we have two FQ's on different fields it should be AND. The thread at http://lucene.472066.n3.nabble.com/complex-boolean-filtering-in-fq-queries-td2038365.html seems to suggest that I could do this by constructing a URL manually. But can this be done via SolrJ? -- Rajarshi Guha | http://blog.rguha.net NIH Center for Advancing Translational Science
What's the rules about contributing to Solr WIKI?
I just figured out how to run custom solr core with basic jetty under Windows service with Apache procrun. Not quite a production setup and most probably not perfect, but it might save somebody several hours next time. I want to contribute that back to Solr WIKI for next person. Do I just wade in and start editing or is there a process/coordinator? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: What's the rules about contributing to Solr WIKI?
: Subject: What's the rules about contributing to Solr WIKI? https://wiki.apache.org/solr/#How_to_edit_this_Wiki This Wiki is a collaborative site, anyone can contribute and share: Create an account by clicking the Login link at the top of any page, and picking a username and password. Edit any page by pressing Edit at the top or the bottom of the page If people feel that your contributions should be edited or re-organized, they will do so. if conflicts of vision arise, people bring them up on the mailing list. -Hoss
Re: Getting more proper results
I'm using it that way, because I want to have an autocompletion list. Now I'm wondering if I can influence some of the results I'm getting. I have a lot of categories in my database. If I for example search for iphone 3 I would expect to get all iphone 3 from the category electronic. If I'm searching for iphone 3 I get the following results: iPhone - the book, Apple iphone 4, My iphone I, etc. You can try to set your default operator to AND: q.op=AND If you are not firing a phrase query (with quotes), you can do that too: q="iphone 3"
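For reference, the two suggestions side by side; the parameter syntax is standard Solr, and the query text is the example from the thread:

```
q="iphone 3"             phrase query: both terms required, adjacent, in order
q=iphone 3&q.op=AND      both terms required, at any position (default operator changed from OR)
```

With the default q.op=OR, q=iphone 3 matches any document containing either term, which is why iPhone - the book ranks into the results.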
Re: multiple filter queries and boolean operators in SolrJ
--- On Wed, 9/12/12, Rajarshi Guha rajarshi.g...@gmail.com wrote: From: Rajarshi Guha rajarshi.g...@gmail.com Subject: multiple filter queries and boolean operators in SolrJ To: solr-user@lucene.apache.org Date: Wednesday, September 12, 2012, 12:58 AM Hi, I am accessing our Solr installation via SolrJ. Currently, we are supporting filter queries via the addFilterQuery() method of SolrQuery. However as far as I can see, the resultant documents that come out of the query are the intersection of all the filters. Ideally, what I'd like to happen is that if we have two FQ's on the same field, the result should be the OR, whereas if we have two FQ's on different fields it should be AND. The thread at http://lucene.472066.n3.nabble.com/complex-boolean-filtering-in-fq-queries-td2038365.html seems to suggest that I could do this by constructing a URL manually. But can this be done via SolrJ? Does this work for you? addFilterQuery("field1:(term1 OR term2)"); addFilterQuery("field2:term5");
Re: multiple filter queries and boolean operators in SolrJ
fq's are always intersections; if you want to union multiple queries you have to specify them in a single fq -- that's not a SolrJ/URL thing, that's just a low level detail of how solr caches and intersects filters. from SolrJ you just have to do a single addFilterQuery() call containing your union query (using whatever query parser you choose) There's been an open issue for a while talking about the logistics of making unioned fq's more feasible. I recently posted some thoughts there on what i *think* would be a fairly straightforward way to support this in a relatively non-invasive and robust way, which you may want to look at if you are comfortable working with java and would like to try your hand at implementing it in solr... https://issues.apache.org/jira/browse/SOLR-1223 https://issues.apache.org/jira/browse/SOLR-1223?focusedCommentId=13450929page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13450929 -Hoss
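The desired semantics (OR within a field, AND across fields) fall out naturally from Hoss's rule: build one fq string per field, OR-ing the values inside it, and let Solr's implicit intersection of separate fq's supply the AND. A hedged sketch; the helper name and field names are invented, and each produced string would be passed to SolrQuery.addFilterQuery(...):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds one fq per field: values within a field are OR'ed inside a single
// fq string; separate fq's are intersected by Solr, giving AND across fields.
public class FilterQueryBuilder {
    public static List<String> buildFilterQueries(Map<String, List<String>> filters) {
        List<String> fqs = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : filters.entrySet()) {
            fqs.add(e.getKey() + ":(" + String.join(" OR ", e.getValue()) + ")");
        }
        return fqs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> filters = new LinkedHashMap<>();
        filters.put("category", Arrays.asList("books", "music"));
        filters.put("inStock", Arrays.asList("true"));
        // Each element would go to one SolrQuery.addFilterQuery(...) call.
        System.out.println(buildFilterQueries(filters));
    }
}
```

Note this sketch does not escape special characters in the values; terms containing colons, spaces, etc. would need query escaping first.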
Re: multiple filter queries and boolean operators in SolrJ
Perfect! Many thanks Sent from my HTC One™ X, an AT&T 4G LTE smartphone - Reply message - From: Ahmet Arslan iori...@yahoo.com To: solr-user@lucene.apache.org, rajarshi.g...@gmail.com Subject: multiple filter queries and boolean operators in SolrJ Date: Tue, Sep 11, 2012 6:36 PM --- On Wed, 9/12/12, Rajarshi Guha rajarshi.g...@gmail.com wrote: From: Rajarshi Guha rajarshi.g...@gmail.com Subject: multiple filter queries and boolean operators in SolrJ To: solr-user@lucene.apache.org Date: Wednesday, September 12, 2012, 12:58 AM Hi, I am accessing our Solr installation via SolrJ. Currently, we are supporting filter queries via the addFilterQuery() method of SolrQuery. However as far as I can see, the resultant documents that come out of the query are the intersection of all the filters. Ideally, what I'd like to happen is that if we have two FQ's on the same field, the result should be the OR, whereas if we have two FQ's on different fields it should be AND. The thread at http://lucene.472066.n3.nabble.com/complex-boolean-filtering-in-fq-queries-td2038365.html seems to suggest that I could do this by constructing a URL manually. But can this be done via SolrJ? Does this work for you? addFilterQuery("field1:(term1 OR term2)"); addFilterQuery("field2:term5");
Re: Semantic document format... standards?
My standard question for such a situation: How are you expecting your users to query this data? Are they expecting simple English/natural language text, or are they expecting structured identifiers that can be keys into other data sources? For example, are your entities simple text literal names, or might they be Dublin Core (DC) Agent URI identifiers? Ditto for topics - free text vs. some SKOS vocabulary or other form of taxonomy. In other words, clue us in as to your client requirements. -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Tuesday, September 11, 2012 11:51 AM To: solr-user@lucene.apache.org Subject: Semantic document format... standards? Hello, If I'm extracting named entities, topics, key phrases/tags, etc. from documents and I want to have a representation of this document, what format should I use? Are there any standard or at least common formats or approaches people use in such situations? For example, the most straightforward format might be something like this:

<document>
  <title>doc title</title>
  <keywords>meta keywords coming from the web page</keywords>
  <content>page meat</content>
  <entities>name entities recognized in the document</entities>
  <topics>topics extracted by the annotator</topics>
  <tags>tags extracted by the annotator</tags>
  <relations>relations extracted by the annotator</relations>
</document>

But this is a made-up format - the XML tags above are just what somebody happened to pick. Are there any standard or at least common formats for this? Thanks, Otis Performance Monitoring - Solr - ElasticSearch - HBase - http://sematext.com/spm Search Analytics - http://sematext.com/search-analytics/index.html
solr.StrField with stored=true useless or bad?
Hi, I have a StrField to store a URL. The field definition looks like this:

<field name="link" type="string" indexed="true" stored="true" required="true" />

Type string is defined as usual:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

Then I realized that a StrField doesn't execute any analyzers and stores data verbatim. The data is just a single token. The purpose of stored=true is to store the raw string data besides the analyzed/transformed data for displaying purposes. This is fine for an analyzed solr.TextField, but for a StrField both values are the same. So is there any reason to apply stored=true on a StrField as well? I ask because I found a lot of sites and tutorials applying stored=true on StrFields as well. Do they all do it wrong or am I missing something here?
Re: solr.StrField with stored=true useless or bad?
On Tue, Sep 11, 2012 at 7:03 PM, sy...@web.de wrote: The purpose of stored=true is to store the raw string data besides the analyzed/transformed data for displaying purposes. This is fine for an analyzed solr.TextField, but for an StrField both values are the same. So is there any reason to apply stored=true on a StrField as well? You're over-thinking things a bit ;-) if you want to search on it: index it If you want to return it in search results: store it Those are two orthogonal things (even for StrField). Why? Indexed means full-text inverted index: words (terms) point to documents. It's not easy/fast for a given document to find out what terms point to it. Stored fields are all stored together and can be retrieved together given a document id. Hence search finds lists of document ids (via indexed fields), and can then return any of the stored fields for those document ids. -Yonik http://lucidworks.com
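Yonik's two orthogonal switches, written out as a hypothetical schema.xml fragment; the field names here are invented for illustration:

```
<!-- search on it AND return it in results -->
<field name="link"      type="string" indexed="true"  stored="true"  />
<!-- search only: matches queries, but cannot be returned via fl -->
<field name="link_hash" type="string" indexed="true"  stored="false" />
<!-- display only: returned in results, but cannot be searched -->
<field name="thumb_url" type="string" indexed="false" stored="true"  />
```

The same matrix applies to StrField and TextField alike; the field type only decides how the indexed terms are produced, not whether the raw value is retrievable.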
Running Luke On Solr Index (getting lock error)
Hi All, I am trying to run Luke on my Solr search index (I have contributed 3-4 XML files only). When I try to open the index in Luke, I am getting a write.lock error on the index and it's not showing me the index. I did check the Force Unlock option, but it didn't help either; I also tried opening the index in read-only mode. Didn't work :(. I have attached a screenshot of the error. (Note: I am using lukeall-0.9.9.jar). Do I need to select any specific option while opening the index in Luke? http://lucene.472066.n3.nabble.com/file/n4007084/solrIndex.jpg -- View this message in context: http://lucene.472066.n3.nabble.com/Running-Luke-On-Solr-Index-getting-lock-error-tp4007084.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr.StrField with stored=true useless or bad?
The purpose of stored=true is to store the raw string data besides the analyzed/transformed data for displaying purposes. This is fine for an analyzed solr.TextField, but for an StrField both values are the same. So is there any reason to apply stored=true on a StrField as well? If you don't store it, you cannot retrieve it (displaying purposes) via fl= parameter. You can access indexed values via faceting, terms component etc.
Re: Solr search not working after copying a new field to an existing Indexed Field
Erick, When you add a doc with the same unique key as an old doc, the data associated with the first version of the doc is entirely thrown away and it's as though you'd never indexed it at all - I did exactly that. Between the old doc and the new doc there is no change except that the Name has changed. When I query Solr for the document, I do see the Name field with the correct recent changes. However, if I search for the new name, I do not get the result. So I removed all the documents entirely and then added the same new document. It worked. Not sure if this is a bug. So whenever I add a new field to an existing search field, the documents need to be thrown away (not just re-added with the same key, as that is not working in my case) for the search to take effect. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-search-not-working-after-copying-a-new-field-to-an-existing-Indexed-Field-tp4005993p4007096.html Sent from the Solr - User mailing list archive at Nabble.com.
Partial search
I have three documents with the following search field (text_en type) values. When I search for Energy Field, I am getting the documents in the order presented below. However, if you look at the matches, I would expect that Doc3 should come first and Doc1 should be last. Doc1: Atomic Energy and Peace Doc2: Energy One Energy Two Energy Three Energy Four Doc3: Mathematic Field Energy Field What is the best way to configure my search to accommodate as many term matches as possible? -- View this message in context: http://lucene.472066.n3.nabble.com/Partial-search-tp4007097.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud fail over
I know failover is available in Solr 4.0 right now: if one server crashes, the other servers can still serve queries. I set up a SolrCloud like this http://lucene.472066.n3.nabble.com/file/n4007117/Selection_028.png I use http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml for queries at first; if the node on 8983 crashes, I have to access other nodes for queries, like http://localhost:8900/solr/collection1/select?q=*%3A*&wt=xml. But I use a node's URL in SolrJ; how do I change the request URL dynamically? Does SolrCloud support something like a virtual IP address? For example, I would use the URL http://collections1 in SolrJ, and the request would be forwarded to an available URL automatically. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-fail-over-tp4007117.html Sent from the Solr - User mailing list archive at Nabble.com.
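A hedged SolrJ sketch of two client-side options as I understand the Solr 4.0 API (verify against your version): LBHttpSolrServer round-robins over a fixed list of node URLs, and CloudSolrServer discovers live nodes from ZooKeeper, which removes the need for a virtual IP on the client side. The host/port values below are the ones from this thread; the zkHost value is a guess.

```java
// Sketch only; requires the SolrJ jars and a running cluster, so it is
// not runnable standalone.

// Option 1: simple client-side load balancing over known nodes.
SolrServer lb = new LBHttpSolrServer(
    "http://localhost:8983/solr/collection1",
    "http://localhost:8900/solr/collection1");

// Option 2: ZooKeeper-aware client that tracks live nodes automatically.
CloudSolrServer cloud = new CloudSolrServer("localhost:9983"); // zkHost is hypothetical
cloud.setDefaultCollection("collection1");
```

Either client retries against remaining nodes when one goes down, so the application code never hardcodes a single node URL.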
RE: Term searches with colon(:)
Sorry for not being clear. Yes, I am trying to do a wildcard search for terms that contain a colon in the text (ie: foo:bar) in the field list mentioned in the default requesthandler that I posted earlier. Description is one of those fields. Mpg is another field. I have not included the entire default field list for brevity's sake. The *s in my queries that I have included are part of the actual solr query (to denote wildcards as you said earlier). Hope I am clear this time. Thank you again for your help. Raj -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Tuesday, September 11, 2012 3:10 PM To: solr-user@lucene.apache.org Subject: RE: Term searches with colon(:) : Thank you for the reply and help. The description field is part of the : defaultHandler's eDisMax search list (qf): ... : Similar queries for other escaped characters in description using term : search return correctly as shown from the logs correctly. Ok ... but you haven't really answered my main question -- what are you trying to match? are you trying to search for the literal term *:* in the description field? are you trying to do a wildcard search for terms that contain a colon in the middle (ie: foo:bar), or are you trying to match all documents in the description field? you've said it doesn't match anything, but you haven't explained what you expect it to match (Actually .. lemme back up and ask a silly question -- are the * characters in your email actually part of the query you are sending to solr, or is that just an artifact of your mail client translating bold or highlighted characters into * when converting to plain text?) : what are you expecting that query to match? because by backslash : escaping the colon, what you are asking for there is for Solr to search : for the literal string *:* in your default search field (after whatever : query time analysis is configured on your default search field) -Hoss
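For matching a literal colon inside a term, the colon (and other Lucene query syntax characters) must be backslash-escaped while the surrounding * wildcards are left unescaped. A small standalone sketch modeled on SolrJ's ClientUtils.escapeQueryChars (the character list here is my own approximation of it, so double-check against your SolrJ version):

```java
// Backslash-escapes Lucene/Solr query syntax characters so that a literal
// colon survives query parsing; wildcards are added outside the escaping.
public class QueryEscape {
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            // Approximation of the special-character set Solr's query
            // parser treats as syntax (assumption, not the exact list).
            if ("\\+-!():^[]\"{}~*?|&;/ ".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escape only the term text, then wrap with raw * wildcards.
        System.out.println("*" + escape("foo:bar") + "*"); // *foo\:bar*
    }
}
```

With SolrJ proper, org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars(...) does this for you.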
RE: Facet Sort by Index, missing indexes
Hi Chris, Thanks for that tip. I checked and it is indeed because of the constraint limit. Thanks again Dewi -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, 12 September 2012 1:57 AM To: solr-user@lucene.apache.org Subject: Re: Facet Sort by Index, missing indexes : I did the query twice, once with the sorting and once without the sort: : : 1. Without f.a.facet.sort=index : I have all l1, l2, l3 in count order : (all l1, l2, and l3 facets have counts on them) : : 2. With f.a.facet.sort=index : The facet is sorted accordingly : l1:..,l2:.. but l3 facets are completely missing first off: terminology clarification. It sounds like in your case you have a field named a and you are using that field as a field facet. within the field a you have terms like l1, l2, l3, etc... when you facet on a field, the terms are each treated as a constraint and you get a constraint count (or facet count) for each term. Having said that: it sounds like what you are describing is that some constraints are missing from the list when you use facet.sort=index (ie: when the constraints are in index order) Is your example real? ie: are the terms really l1, l2, and l3 or are those just hypothetical examples? how many terms do you see in the response? - because by default a max of 100 constraints are returned... https://wiki.apache.org/solr/SimpleFacetParameters#facet.limit It would help if you could provide a full, real example of the request you are attempting (with all params) and the actual response you get back - if things appear to be working with facet.sort=count, but not with facet.sort=index, then please show us both requests+responses so we can compare. -Hoss
Solr 4.0-BETA facet pivot returns no result
I use the Solr 4.0-BETA version; my request URL is http://localhost:8983/solr/collection1/select?q=*%3A*&rows=0&wt=xml&facet.pivot=cat,popularity,inStock&facet.pivot=popularity,cat&facet=true&facet.field=cat&facet.pivot.mincount=0 but I do not get any facet pivot info in the result:

<lst name="params">
  <str name="facet">true</str>
  <str name="q">*:*</str>
  <str name="facet.field">cat</str>
  <str name="facet.pivot.mincount">0</str>
  <str name="wt">xml</str>
  <arr name="facet.pivot">
    <str>cat,popularity,inStock</str>
    <str>popularity,cat</str>
  </arr>
  <str name="rows">0</str>
</lst>
<result name="response" numFound="32" start="0" maxScore="1.0"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="cat">
      <int name="electronics">14</int>
      <int name="currency">4</int>
      <int name="memory">3</int>
      <int name="connector">2</int>
      <int name="graphics card">2</int>
      <int name="hard drive">2</int>
      <int name="monitor">2</int>
      <int name="search">2</int>
      <int name="software">2</int>
      <int name="camera">1</int>
      <int name="copier">1</int>
      <int name="multifunction printer">1</int>
      <int name="music">1</int>
      <int name="printer">1</int>
      <int name="scanner">1</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Does anybody know the reason? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-BETA-facet-pivot-returns-no-result-tp4007133.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr.StrField with stored=true useless or bad?
This is great, thanks for this post! I was curious about the same thing and was wondering why fl couldn't return the indexed representation of a field if that field were only indexed but not stored. My thought was to return something rather than nothing, but I didn't pay attention to the fact that getting even the indexed representation of a field given a document is not fast. Thanks Amit On Tue, Sep 11, 2012 at 4:03 PM, sy...@web.de wrote: Hi, I have a StrField to store a URL. The field definition looks like this:

<field name="link" type="string" indexed="true" stored="true" required="true" />

Type string is defined as usual:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

Then I realized that a StrField doesn't execute any analyzers and stores data verbatim. The data is just a single token. The purpose of stored=true is to store the raw string data besides the analyzed/transformed data for displaying purposes. This is fine for an analyzed solr.TextField, but for a StrField both values are the same. So is there any reason to apply stored=true on a StrField as well? I ask because I found a lot of sites and tutorials applying stored=true on StrFields as well. Do they all do it wrong or am I missing something here?
Re: Solr - Lucene Debugging help
The wiki should probably be updated... maybe I'll take a stab at it. I'll also try to update my article referenced there. When you check out the project from SVN, run "ant eclipse". Look at this bug (https://issues.apache.org/jira/browse/SOLR-3817) and either run the Ruby program or download the patch and apply it; either way it should fix the classpath issues. Then import the project and you can follow the remainder of the steps in the http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse article. Cheers Amit On Mon, Sep 10, 2012 at 1:29 PM, BadalChhatbar badal...@yahoo.com wrote: Hi Steve, Thanks, I was able to create a new project using that URL. :) One more thing... it's giving me about 32K errors (something like "this type cannot be resolved"). I tried rebuilding the project and running the ant command (build.xml), but it didn't help. Any suggestions on this? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Lucene-Debuging-help-tp4006715p4006721.html Sent from the Solr - User mailing list archive at Nabble.com.
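The steps above, roughly, as a command sequence (a sketch; the SVN URL is an assumption based on the repository layout of that period, and the patch file name is hypothetical — take the actual patch from the SOLR-3817 issue):

```shell
# Check out Lucene/Solr from SVN (URL assumed; adjust to the branch you need)
svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk lucene-solr
cd lucene-solr

# Generate the Eclipse project files
ant eclipse

# Apply the SOLR-3817 classpath fix (download the patch from the JIRA issue)
patch -p0 < SOLR-3817.patch
```

After this, import the project into Eclipse and continue with the article linked above.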
Re: In-memory indexing
I have wondered about this too, but instead why not just set your cache sizes large enough to house most/all of your documents and pre-warm the caches accordingly? My bet is that a large enough documentCache may suffice, but that's just a guess. - Amit On Mon, Sep 10, 2012 at 10:56 AM, Kiran Jayakumar kiranjuni...@gmail.com wrote: Hi, Does anyone have any experience in hosting the entire index on a RAM disk? (I'm not thinking about Lucene's RAMDirectory.) I have some small indexes (less than a GB). Also, please recommend a good RAM disk application for Windows (I have used Gizmo, wondering if there's a better one out there). Thanks Kiran
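If you go the cache route instead of a RAM disk, the relevant knobs are the documentCache in solrconfig.xml plus a warming query. A rough sketch (the sizes are made-up numbers to adjust to your index; the warming query is an illustration):

```xml
<!-- solrconfig.xml: size the documentCache to hold (most of) the index -->
<documentCache class="solr.LRUCache"
               size="1000000"
               initialSize="100000"/>

<!-- Pre-warm the caches when a new searcher is opened -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">1000</str></lst>
  </arr>
</listener>
```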
Re: Solr Sorting Caching
> Are you using a TrieDateField for the dates?

Yes.

> Consider creating and re-using a filter for the keywords and let the
> query consist of the date range only.

In this case, do I have to configure any cache, or are Solr's default configurations enough?

> Guessing here: You request all the results from the search, which is
> potentially 100M documents? Solr is not geared towards such massive
> responses. You might have better luck by paging, but even that does not
> behave very well when requesting pages very far into the result set.

We have implemented paging, but the problem is the sort. Solr tries to sort the dates of all the documents satisfying the query, so if numFound is very large, Solr loads all the date values into memory to sort them and goes OOM. Correct me if I am wrong.

On Tue, Sep 11, 2012 at 12:30 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Tue, 2012-09-11 at 08:00 +0200, Amey Patil wrote: Our Solr index (Solr 3.4) has over 100 million documents. [...] *((keyword1 AND keyword2...) OR (keyword3 AND keyword4...) OR ...) AND date:[date1 TO *]* The number of keywords can be in the range of 100 - 1000. We are adding the sort parameter *'date asc'*. Are you using a TrieDateField for the dates? The keyword part of the query changes very rarely but the date part always changes. Consider creating and re-using a filter for the keywords and let the query consist of the date range only. [...] 2) Sometimes when 'numFound' is very large for a query, it gives an OOM error (I guess this is because of the sort). Guessing here: You request all the results from the search, which is potentially 100M documents? Solr is not geared towards such massive responses. You might have better luck by paging, but even that does not behave very well when requesting pages very far into the result set.
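Toke's suggestion — move the rarely-changing keyword clause into a filter query so its cached DocSet is reused, and keep only the changing date range in q — would produce a request roughly like this (a sketch; the host, field names, keywords, and date are placeholders):

```python
from urllib.parse import urlencode

# Hypothetical keyword clause that rarely changes: sending it as fq lets
# the filterCache reuse the cached document set across requests.
keyword_filter = "(keyword1 AND keyword2) OR (keyword3 AND keyword4)"

params = urlencode({
    "q": "date:[2012-01-01T00:00:00Z TO *]",  # only the changing part
    "fq": keyword_filter,                     # cached in the filterCache
    "sort": "date asc",
    "rows": 100,   # page through results instead of fetching everything
    "start": 0,
})
url = "http://localhost:8983/solr/select?" + params
print(url)
```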
Re: Semantic document format... standards?
Otis, if you have a bit of time to research, I think your document may look a lot like the documents processed by http://langtech.jrc.it/, which is a flagship multilingual technology implementation and includes a fair amount of entity disambiguation, as far as I could hear in Ralph's talk. I do not have a more concrete pointer, sorry, and I would love to read something about them that is more concretely close to Solr. Paul On 12 Sep 2012, at 00:46, Jack Krupansky wrote: My standard question for such a situation: How are you expecting your users to query this data? Are they expecting simple English/natural-language text, or are they expecting structured identifiers that can be keys into other data sources? For example, are your entities simple text literal names, or might they be Dublin Core (DC) Agent URI identifiers? Ditto for topics - free text vs. some SKOS vocabulary or other form of taxonomy. In other words, clue us in as to your client requirements. -- Jack Krupansky -----Original Message----- From: Otis Gospodnetic Sent: Tuesday, September 11, 2012 11:51 AM To: solr-user@lucene.apache.org Subject: Semantic document format... standards? Hello, If I'm extracting named entities, topics, key phrases/tags, etc. from documents and I want to have a representation of this document, what format should I use? Are there any standard or at least common formats or approaches people use in such situations? For example, the most straightforward format might be something like this:

<document>
  <title>doc title</title>
  <keywords>meta keywords coming from the web page</keywords>
  <content>page meat</content>
  <entities>named entities recognized in the document</entities>
  <topics>topics extracted by the annotator</topics>
  <tags>tags extracted by the annotator</tags>
  <relations>relations extracted by the annotator</relations>
</document>

But this is a made-up format - the XML tags above are just what somebody happened to pick. Are there any standard or at least common formats for this?
Thanks, Otis Performance Monitoring - Solr - ElasticSearch - HBase - http://sematext.com/spm Search Analytics - http://sematext.com/search-analytics/index.html
Re: Getting more proper results
Hi, I've set it to AND and restarted Tomcat, but in my search I get the same results. So it seems this doesn't have an effect. Any ideas? Ramo -----Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Wednesday, September 12, 2012 00:34 To: solr-user@lucene.apache.org Subject: Re: Getting more proper results > I'm using it that way, because I want to have an autocompletion list. Now I'm wondering if I can influence some of the results I'm getting. I have a lot of categories in my database. If I, for example, search for iphone 3, I would expect to get all iphone 3 results from the category electronic. If I'm searching for iphone 3, I get the following results: iPhone - the book, Apple iphone 4, My iphone I, etc. You can try to set your default operator to AND: q.op=AND If you are not firing a phrase query (with quotes), you can try that too: q="iphone 3"
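Ahmet's two suggestions correspond to request parameters like these (a sketch; note that for the phrase query the double quotes are part of the query value and get URL-encoded):

```python
from urllib.parse import urlencode

# Option 1: require all terms to match by setting the default operator
and_params = urlencode({"q": "iphone 3", "q.op": "AND"})

# Option 2: a phrase query -- the quotes belong inside the q value
phrase_params = urlencode({"q": '"iphone 3"'})

print(and_params)     # q=iphone+3&q.op=AND
print(phrase_params)  # q=%22iphone+3%22
```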