Re: Separate ACL and document index
I've been reading a lot about Document Level Security: https://issues.apache.org/jira/browse/SOLR-1895 https://issues.apache.org/jira/browse/SOLR-1872 https://issues.apache.org/jira/browse/SOLR-1834 But I'm not fully sure these patches solve my problem. It seems that changing a document's ACL would require rebuilding the index with the document content. It makes no sense to rebuild when I only change the ACL. Any ideas? Or am I just misunderstanding these patches? Floyd 2011/11/23 Floyd Wu floyd...@gmail.com: Hi there, Is it possible to separate the ACL index from the document index and still search by user role in Solr? Currently my implementation indexes the ACL with the document, but the document itself changes frequently. I have to rebuild the index every time the ACL changes. That's heavy for the whole system because there are so many documents and their content is huge. Do you have any solution to this problem? I've been reading the mailing list for a while, and there doesn't seem to be a suitable solution for me. I want a user's search results restricted according to his role, but I don't want to re-index a document every time its ACL changes. To my knowledge, is it possible to perform a join, like in a database, to achieve this? If so, how? Thanks Floyd
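One way the join idea could work is to keep each ACL entry as its own small document and join at query time, so an ACL change only re-indexes the small ACL doc. Solr trunk/4.0 has a {!join} query parser; everything below is a sketch under that assumption, and the field names (doc_id, acl_role, id) are made up for illustration:

```java
// Sketch: build the filter query that restricts results to documents
// whose separate ACL document grants the given role, using the Solr 4
// {!join} syntax. Field names here are hypothetical.
public class AclJoinQuery {
    static String aclFilter(String role) {
        return "{!join from=doc_id to=id}acl_role:" + role;
    }

    public static void main(String[] args) {
        // Would be sent as fq=... alongside the user's q parameter.
        System.out.println("fq=" + aclFilter("editor"));
    }
}
```

With this layout, changing a document's ACL means re-indexing only the tiny ACL doc, not the large content doc.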
Re: date range in solr 3.1
It works great now, thanks to you :) -- View this message in context: http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3530038.html Sent from the Solr - User mailing list archive at Nabble.com.
Sort question
Hi I have a query where I sort by a price field. This field can contain the following values: 10 75000 15 1 225000 50 40 I want to sort these values so that values between 0 and 100 always come last. E.g. sorting by price asc should look like this: 75000 10 15 225000 1 40 50 Is this possible? -- View this message in context: http://lucene.472066.n3.nabble.com/Sort-question-tp3530070p3530070.html Sent from the Solr - User mailing list archive at Nabble.com.
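The desired ordering is a two-key sort: first by whether the price falls in [0, 100] (in-range last), then by the price itself. A minimal sketch of that comparator logic, outside Solr (in Solr 3.1+ a sort by function query, e.g. something like sort=map(price,0,100,1,0) asc, price asc, might express the same idea, but treat that as an untested suggestion):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PriceSort {
    // Two-key sort: key 1 puts values in [0, 100] last,
    // key 2 orders by the price itself ascending.
    static List<Integer> sortPrices(List<Integer> prices) {
        List<Integer> out = new ArrayList<>(prices);
        out.sort(Comparator
                .comparingInt((Integer p) -> (p >= 0 && p <= 100) ? 1 : 0)
                .thenComparingInt(p -> p));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortPrices(Arrays.asList(10, 75000, 15, 1, 225000, 50, 40)));
        // [75000, 225000, 1, 10, 15, 40, 50]
    }
}
```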
Re: Collection Distribution vs Replication in Solr
Indeed, I cannot see any of the 3 images here: http://wiki.apache.org/solr/SolrReplication#Admin_Page_for_Replication It just displays the name of the image file, as the img URL seems to point to a login-only link such as this one: http://wiki.apache.org/solr/SolrReplication?action=AttachFiledo=gettarget=replication.png Is that an oversight or by design, to force people to log into the wiki? André Bois-Crettez Alireza Salimi wrote: I can't see those benchmarks, can you? On Thu, Oct 27, 2011 at 5:20 PM, Marc Sturlese marc.sturl...@gmail.comwrote: Replication is easier to manage and a bit faster. See the performance numbers: http://wiki.apache.org/solr/SolrReplication -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-Distribution-vs-Replication-in-Solr-tp3458724p3459178.html Sent from the Solr - User mailing list archive at Nabble.com.
Search on multiple fields is not working
Hi, I have two indexed fields called profileId and tagName. When I issue a query like q=profileId:99964 OR profileId:10076 OR tagName:MUSIC AND DESIGNER, I get only the results for tagName:MUSIC AND DESIGNER. The results do not contain profileId 99964 or 10076. Can anybody tell me what I am doing wrong? Regards, Siva -- View this message in context: http://lucene.472066.n3.nabble.com/Search-on-multiple-fields-is-not-working-tp3530145p3530145.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Integrating Surround Query Parser
After this I tried with the solr-3.1 source. - This time I got the core folder that was missing from the previous installation; the test file is at /home/reach121/basf/*apache-solr-3.1.0/core/src/test/org/apache/solr/search/TestSurroundQueryParser.java* - and I have put *queryParser name=surround class=org.apache.solr.search.SurroundQParserPlugin /* in solrconfig.xml - but when I run Solr, it gives me an error: - SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.search.SurroundQParserPlugin' - at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:389) - at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:423) - at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:445) - at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1545) - at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1539) - at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1572) - at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1489) - at org.apache.solr.core.SolrCore.init(SolrCore.java:555) - at org.apache.solr.core.CoreContainer.create(CoreContainer.java:458) - at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316) - at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) - at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130) - at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94) - at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) - at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) - at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) - Please suggest what I should do. On Wed, Nov 23, 2011 at 11:19 AM, Rahul Mehta rahul23134...@gmail.comwrote: This is what I tried: - Went to the Solr 3.1 directory, downloaded from here.
http://www.trieuvan.com/apache//lucene/solr/3.1.0/apache-solr-3.1.0.tgz - wget https://issues.apache.org/jira/secure/attachment/12493167/SOLR-2703.patch - ran: patch -p0 -i SOLR-2703.patch --dry-run - got an error: - patching file core/src/test/org/apache/solr/search/TestSurroundQueryParser.java - patching file core/src/test-files/solr/conf/schemasurround.xml - patching file core/src/test-files/solr/conf/solrconfigsurround.xml - patching file core/src/java/org/apache/solr/search/SurroundQParserPlugin.java - patching file example/solr/conf/solrconfig.xml - Hunk #1 FAILED at 1538. - 1 out of 1 hunk FAILED -- saving rejects to file example/solr/conf/solrconfig.xml.rej - Our solrconfig.xml ends at line 1508. - Tried finding the test file with sudo find / -name TestSurroundQueryParser.java, but it is not in the directory. - And when I do svn up it gives me Skipped '.' *Please suggest what I should do now.* On Wed, Nov 23, 2011 at 10:39 AM, Rahul Mehta rahul23134...@gmail.comwrote: How do I apply this patch https://issues.apache.org/jira/browse/SOLR-2703 to Solr 3.1 to install surround as a plugin? On Tue, Nov 22, 2011 at 7:34 PM, Erik Hatcher erik.hatc...@gmail.comwrote: The surround query parser is fully wired into Solr trunk/4.0, if that helps. See http://wiki.apache.org/solr/SurroundQueryParser and the JIRA issue linked there in case you want to patch it into a different version.
Erik On Jan 21, 2011, at 02:24 , Ahson Iqbal wrote: Hi All I want to integrate the Surround Query Parser with Solr. To do this I downloaded a jar file from the internet, put it in web-inf/lib, and configured the query parser in solrconfig.xml as queryParser name=SurroundQParser class=org.apache.lucene.queryParser.surround.parser.QueryParser/ Now when I load the Solr admin page the following exception comes up: org.apache.solr.common.SolrException: Error Instantiating QParserPlugin, org.apache.lucene.queryParser.surround.parser.QueryParser is not a org.apache.solr.search.QParserPlugin I think I didn't get the right plugin. Can anybody guide me on where to get the right plugin for the surround query parser, or how to integrate it correctly with Solr? Thanks Ahsan -- Thanks Regards Rahul Mehta
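For what it's worth, both errors in this thread point at classpath problems rather than configuration syntax: the SOLR-2703 patch only adds source files, so the SurroundQParserPlugin class must actually be compiled into a jar and placed where Solr can load it (e.g. the core's lib directory) before the registration works. A hedged sketch of the registration, assuming a jar built from the patched sources is on the classpath:

```xml
<!-- Hypothetical solrconfig.xml registration; only valid once a jar
     containing org.apache.solr.search.SurroundQParserPlugin (built from
     the patched sources) is in the core's lib directory. Note the class
     must be the Solr QParserPlugin wrapper, not the raw Lucene
     org.apache.lucene.queryParser.surround.parser.QueryParser, which is
     what caused the "is not a QParserPlugin" exception above. -->
<queryParser name="surround"
             class="org.apache.solr.search.SurroundQParserPlugin"/>
```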
DIH Strange Problem
I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and everything was working perfectly fine. However, today when I started full indexing again, Solr halts/gets stuck at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy, and on the DIH console I can see A command is still running; I can also see total rows fetched = 0 and total requests made to datasource = 1, and the time keeps increasing, but it is not doing anything. This is the exact configuration that worked for me before. I am not really able to understand the problem here. Also, in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... data-config.xml: dataSource type=JdbcDataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders user=testUser password=password/ document . . Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL: 
jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders
Re: Search on multiple fields is not working
You probably wanted to query this: q=profileId:99964 OR profileId:10076 OR tagName:(MUSIC AND DESIGNER) Otherwise Solr matches DESIGNER against your default field (whatever it is) and ANDs it with tagName:MUSIC. On Wed, Nov 23, 2011 at 11:07 AM, sivaprasad sivaprasa...@echidnainc.comwrote: Hi, I have two indexed fields called profileId and tagName.When i issue a query like q=profileId:99964 OR profileId:10076 OR tagName:MUSIC AND DESIGNER, i am getting only the results for tagName:MUSIC AND DESIGNER.The results are not containing profileId 99964 and 10076. Can anybody tell what i am doing wrong? Regards, Siva -- View this message in context: http://lucene.472066.n3.nabble.com/Search-on-multiple-fields-is-not-working-tp3530145p3530145.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Dmitry Kan
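The fix above is just grouping: parentheses keep the AND inside the tagName clause. A small sketch of building and URL-encoding that corrected q parameter (the helper class and method names are made up for illustration):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class GroupedQuery {
    // Group the tagName clause so AND binds inside the parentheses,
    // then URL-encode the whole q parameter for the request URL.
    static String encodedQ() {
        String q = "profileId:99964 OR profileId:10076 OR tagName:(MUSIC AND DESIGNER)";
        try {
            return URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println("q=" + encodedQ());
    }
}
```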
Re: date range in solr 3.1
What I got is the count for this period, but I want to get only those results. What is the query to get that, something like fq=source:news? -- View this message in context: http://lucene.472066.n3.nabble.com/date-range-in-solr-3-1-tp3527498p3530424.html Sent from the Solr - User mailing list archive at Nabble.com.
How to configure /select handler ?
Another newbie question here. The browse handler works perfectly. Now I want to configure my /select handler so that I can use ajax-solr against it. How do I do that? The website https://github.com/evolvingweb/ajax-solr explains how to use it. Should I do the same by configuring my /select handler, or should I create a new handler? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-configure-select-handler-tp3530493p3530493.html Sent from the Solr - User mailing list archive at Nabble.com.
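A hedged sketch of what such a handler definition might look like in solrconfig.xml; the defaults shown (JSON response writer, rows) are illustrative guesses at what an ajax-solr frontend typically wants, not values taken from this thread:

```xml
<!-- Hypothetical /select configuration. ajax-solr consumes JSON, and for
     cross-domain setups it sends a json.wrf callback parameter, which the
     JSON response writer honors automatically. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>
```

Whether to reuse /select or add a dedicated handler is mostly a matter of isolation: a separate handler lets the UI's defaults change without affecting other /select clients.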
Re: Solr Search for misspelled search term
Do you mean stemming? For misspelled words you will have to edit your dictionary (stopwords.txt), I think, where you can set solutions for misspelled words. Hope so :) -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Search-for-misspelled-search-term-tp3529961p3530504.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH Strange Problem
Hi Yavar, my experience with similar problems was that there was something wrong with the database connection or the database. Chantal On Wed, 2011-11-23 at 11:57 +0100, Husain, Yavar wrote: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However today when I started full indexing again, Solr halts/stucks at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy and on the DIH console I can see A command is still running, I can also see total rows fetched = 0 and total request made to datasource = 1 and time is increasing however it is not doing anything. This is the exact configuration that worked for me. I am not really able to understand the problem here. Also in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... data-config.xml: dataSource type=JdbcDataSource driver=com.microsoft.sqlserver.jdbc.SQLServerDriver url=jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders user=testUser password=password/ document . . 
Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL: jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders
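If the database connection is indeed the culprit, one thing worth trying with the Microsoft driver is tuning its buffering options on the JDBC URL. This is a hedged variant of the dataSource from the original post, not a confirmed fix; responseBuffering and selectMethod are documented options of the SQL Server JDBC driver that control whether it buffers the whole result set:

```xml
<!-- Hypothetical data-config.xml variant: same connection as the original
     post, with responseBuffering=adaptive and selectMethod=cursor added so
     the driver streams rows instead of buffering the full result set. -->
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders;responseBuffering=adaptive;selectMethod=cursor"
            user="testUser" password="password"/>
```

Testing the same URL and credentials from a standalone JDBC client would also confirm whether the hang is in the database rather than in DIH.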
Re: Solr Search for misspelled search term
I have configured the spellchecker component in my solrconfig; below is the configuration: requestHandler name=/spellcheck class=solr.SearchHandler lazy=true lst name=defaults str name=spellcheck.onlyMorePopularfalse/str str name=spellcheck.extendedResultsfalse/str str name=spellcheck.count1/str /lst arr name=last-components strspellcheck/str /arr /requestHandler Using the above configuration it works with the URL below: http://192.168.1.59:8080/solr/core0/spellcheck?q=sc:directryspellcheck=truespellcheck.build=true But when I set the same config in my standard request handler it doesn't work; below is the config for that: requestHandler name=standard class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str str name=spellcheck.onlyMorePopularfalse/str str name=spellcheck.extendedResultsfalse/str str name=spellcheck.count1/str /lst arr name=last-components strspellcheck/str /arr /requestHandler It is not working with the URL below: http://192.168.1.59:8080/solr/core0/select?q=sc:directryspellcheck=truespellcheck.build=true Does anybody have any idea? neuron005 wrote Do you mean stemming? For misspelled words you will have to edit your dictionary (stopwords.txt) i think where you can set solution for misspelled words! Hope So :) -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Search-for-misspelled-search-term-tp3529961p3530526.html Sent from the Solr - User mailing list archive at Nabble.com.
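For reference, a working setup needs a searchComponent named exactly as referenced in last-components, plus the spellcheck switch enabled. A hedged sketch of what the full pair could look like; the field name sc is taken from the URLs above, everything else (dictionary name, index dir) is illustrative:

```xml
<!-- Hypothetical: the component that last-components refers to must exist
     and its name must match the string inside <arr name="last-components">. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">sc</str>
    <str name="spellcheckIndexDir">spellchecker</str>
  </lst>
</searchComponent>

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```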
Re: how to make effective search with fq and q params
Hi Erik, Right now almost everything is done via filtering, passing q as *:*, but we need to find out if there is a better way. So, per Pravesh, I am thinking of passing the user-entered text in the query and the date and other fields in the filter query? Or, per you, is q=*:* fast? I have the fields below to search: Search Term : User Entered Text Field (passing it in q) Title : User Entered Text Field (passing it in fq) Desc : User Entered Text Field (passing it in fq) Appearing : User Entered Text Field (passing it in fq) Date Range : (passing it in fq) Time Zone : (EST , CST ,MST , PST) (passing it in fq) Category : (multiple choice) (passing it in fq) Market : (multiple choice) (passing it in fq) Affiliate Network : (multiple choice) (passing it in fq) I really appreciate your view. Meghana Jeff Schmidt wrote Hi Erik: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. 
Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. -- Jeff Schmidt 535 Consulting jas@ http://www.535consulting.com (650) 423-1068 -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3529876.html Sent from the Solr - User mailing list archive at Nabble.com.
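Pravesh's split (free text in q, structural constraints in fq) could be sketched like this; the field names timezone and category are illustrative, not from a real schema:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParams {
    // Free text goes in q (scored, parsed); constraints go in fq
    // (cached as filters, no effect on scoring). When the user typed
    // nothing, fall back to the match-all query *:*.
    static Map<String, String> build(String userText, String timeZone, String category) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", (userText == null || userText.isEmpty()) ? "*:*" : userText);
        p.put("fq", "timezone:" + timeZone + " AND category:" + category);
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build("breaking news", "EST", "sports"));
    }
}
```

In a real request, sending each constraint as its own fq parameter (rather than one AND-ed fq) lets Solr cache each filter independently.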
Re: wild card search and lower-casing
Ah, I see what you're doing, go for it. I intend to commit it today, but things happen. About changing the setLowerCaseExpandedTerms(true), yes that'll take care of this issue, although it has some locale-specific assumptions (i.e. string.toLowerCase() uses the default locale). That may not matter in your situation though. Best Erick On Tue, Nov 22, 2011 at 10:46 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks, Erick. I was in fact reading the patch (the one attached as a file to the aforementioned jira) you updated sometime yesterday. I'll watch the issue, but as said the change of a hard-coded boolean to its opposite worked just fine for me. Best, Dmitry On 11/22/11, Erick Erickson erickerick...@gmail.com wrote: No, no, no That's something buried in Lucene, it has nothing to do with the patch! The patch has NOT yet been applied to any released code. You could pull the patch from the JIRA and apply it to trunk locally if you wanted. But there's no patch for 3.x, I'll probably put that up over the holiday. But things have changed a bit (one of the things I'll have to do is create some documentation). You *should* be able to specify just legacyMultiTerm=true in your fieldType if you want to apply the 3.x patch to pre 3.6 code. It would be a good field test if that worked for you. But you can't do any of this until the JIRA (SOLR-2438) is marked Resolution: Fixed. Don't be fooled by Fix Version. Fix Version simply says that those are the earliest versions it *could* go in. Best Erick Best Erick On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote: I guess, I have found your comment, thanks. For our current needs I have just set: setLowercaseExpandedTerms(true); // changed from default false in the SolrQueryParser's constructor and that seem to work so far. In order not to start a separate thread on wildcards. Is it so, that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen? 
Dmitry On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.comwrote: It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change. On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks Erick. Do you think the patch you are working on will be applicable as well to 3.4? Best, Dmitry On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote: As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things: The ability to define a new analysis chain in your schema.xml, currently called multiterm that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an expert thing to make yourself... In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char filters, and LowerCaseFilter (if present), and ASCIIFoldingFilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer. As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior. The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release. The patch is up for review now, I'd like another set of eyeballs or two on it before committing. The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut. Best Erick On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... 
setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowercase() (uses default Locale) which is a Locale sensitive operation. In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if it is ported to solr. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html -- Regards, Dmitry Kan
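The locale sensitivity Erick and Ahmet mention is easy to demonstrate: String.toLowerCase() with the default locale can change wildcard terms in surprising ways, the classic case being the Turkish dotted/dotless I. A small illustration (pure JDK, nothing Solr-specific):

```java
import java.util.Locale;

public class LowercaseLocale {
    public static void main(String[] args) {
        String term = "TITLE*";
        // lowercaseExpandedTerms lowercases with the JVM's default locale;
        // under Turkish rules, 'I' lowercases to dotless 'ı', so the
        // wildcard term no longer matches what the index analyzer produced.
        System.out.println(term.toLowerCase(new Locale("tr", "TR"))); // tıtle*
        System.out.println(term.toLowerCase(Locale.ROOT));            // title*
    }
}
```

This is why running a proper analysis chain over multiterm queries (as SOLR-2438 proposes) is safer than a blanket toLowerCase().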
need a way so that solr return result for misspelled terms
Hi, I have configured the spellchecker component in my Solr. It works with a custom request handler (however it's not working with the standard request handler, but that is not my concern right now). But it returns suggestions for the matching spellings; instead, we want to directly get results for the closest spelling of a misspelled search term. Can we do this? Any help much appreciated. Meghana -- View this message in context: http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html Sent from the Solr - User mailing list archive at Nabble.com.
Huge Performance: Solr distributed search
Hi! * Data: - Solr 3.4; - 30 shards ~ 13GB, 27-29M docs each shard. * Machine parameters (Ubuntu 10.04 LTS): user@Solr:~$ uname -a Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux user@Solr:~$ cat /proc/cpuinfo processor : 0 - 3 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz stepping: 2 cpu MHz : 3458.000 cache size : 12288 KB fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat bogomips: 6916.00 clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: user@Solr:~$ cat /proc/meminfo MemTotal: 16992680 kB MemFree: 110424 kB Buffers:9976 kB Cached: 11588380 kB SwapCached:41952 kB Active: 9860764 kB Inactive:6198668 kB Active(anon):4062144 kB Inactive(anon): 398972 kB Active(file):5798620 kB Inactive(file): 5799696 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 46873592 kB SwapFree: 46810712 kB Dirty:36 kB Writeback: 0 kB AnonPages: 4424756 kB Mapped: 940660 kB Shmem:40 kB Slab: 362344 kB SReclaimable: 350372 kB SUnreclaim:11972 kB KernelStack:2488 kB PageTables:68568 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit:55369932 kB Committed_AS:5740556 kB VmallocTotal: 34359738367 kB VmallocUsed: 350532 kB VmallocChunk: 34359384964 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free:0 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k: 10240 kB DirectMap2M:17299456 kB - Apache Tomcat 6.0.32: !-- java arguments -- -XX:+DisableExplicitGC -XX:PermSize=512M -XX:MaxPermSize=512M -Xmx12G -Xms3G -XX:NewSize=128M -XX:MaxNewSize=128M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:/opt/search/tomcat/logs/gc.log Our search setup is: - 5 servers with the configuration above; - one tomcat6 application on each server with 6 solr applications. - Full addresses are: 1) http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,...,http://192.168.1.85:8080/solr6 2) http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,...,http://192.168.1.86:8080/solr12 ... 5) http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,...,http://192.168.1.89:8080/solr30 - On another server there is an additional common application with a shards parameter: requestHandler name=search class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str str name=shards192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30/str int name=rows10/int /lst /requestHandler - schema and solrconfig are identical for all shards; for the first shard see the attachment; - these servers only do search; indexing is on another server (shards optimized to 2 segments are replicated with ssh/rsync scripts). So now the major problem is poor performance of distributed search. Take a look at these logs, for example. This is on 30 shards: INFO: [] webapp=/solr path=/select/params={fl=*,scoreident=truestart=0q=(barium)rows=2000} status=0 QTime=40712 INFO: [] webapp=/solr path=/select/params={fl=*,scoreident=truestart=0q=(pittances)rows=2000} status=0 QTime=36097 INFO: [] webapp=/solr path=/select/params={fl=*,scoreident=truestart=0q=(reliability)rows=2000} status=0 QTime=75756 INFO: [] webapp=/solr path=/select/params={fl=*,scoreident=truestart=0q=(blessing's)rows=2000} status=0 QTime=30342 INFO: [] webapp=/solr path=/select/params={fl=*,scoreident=truestart=0q=(reiterated)rows=2000} status=0 QTime=55690 Sometimes QTime is more than 15. 
But when we run identical queries on one shard separately, QTime is between 200 and 1500. Is distributed Solr search really this slow, or is our architecture non-optimal? Or maybe we need to use some third-party applications? Thanks for any replies. -- Best regards, Artem
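One likely factor in the logs above is rows=2000: in a distributed query the coordinator asks every shard for a full page of (id, score) entries and merges them before fetching the winners, so per-query merge work grows with shards times rows. A rough back-of-the-envelope count, under the assumption that every shard fills the requested page:

```java
public class DistribCost {
    // Approximate number of per-query result entries the coordinating
    // node must collect and merge in phase one of a distributed search
    // (assumes each shard returns a full page of start + rows entries).
    static int mergedEntries(int shards, int start, int rows) {
        return shards * (start + rows);
    }

    public static void main(String[] args) {
        System.out.println(mergedEntries(30, 0, 2000)); // 60000
        System.out.println(mergedEntries(30, 0, 10));   // 300
    }
}
```

So each rows=2000 request makes the coordinator merge on the order of 60,000 entries and then fetch 2,000 stored documents across the network, which is a plausible reason a distributed query is far slower than the same query against one shard.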
Re: how to make effective search with fq and q params
Thanks, Erik. I'm moving on to edismax, and will set q.alt=*:* and not specify q if no user provided terms. Take it easy, Jeff On Nov 22, 2011, at 11:53 AM, Erik Hatcher wrote: I think you're using dismax, not edismax. edismax will take q=*:* just fine as it handles all Lucene syntax queries also. dismax does not. So, if you're using dismax and there is no actual query (but you want to get facets), you set q.alt=*:* and omit q - that's entirely by design. If there's a non-empty q parameter, q.alt is not considered so there shouldn't be any issues with always have q.alt set if that's what you want. Erik On Nov 22, 2011, at 11:15 , Jeff Schmidt wrote: Hi Erik: It's not in the SolrJ library, but rather my use of it: In my application code: protected static final String SOLR_ALL_DOCS_QUERY = *:*; /* * If no search terms provided, then return all neighbors. * Results are to be returned in neighbor symbol alphabetical order. */ if (searchTerms == null) { searchTerms = SOLR_ALL_DOCS_QUERY; nodeQuery.addSortField(n_name, SolrQuery.ORDER.asc); } So, if no user search terms are provided, I search all documents (there are other fqs in effect) and return them in name order. That worked just fine. Then I read more about [e]dismax, and went and configured: str name=q.alt*:*/str Then I would get zero results. It's not a SolrJ issue though, as this request in my browser also resulted in zero results: http://localhost:8091/solr/ing-content/select/?qt=partner-tmofq=type%3Anodefq=n_neighborof_id%3AING\:afaq=*:*rows=5facet=truefacet.mincount=1facet.field=n_neighborof_processExactfacet.field=n_neighborof_edge_type That was due to the q=*:*. Once I set, say, q=cancer, I got results. So I guess this is a [e]dismax thing? (partner-tmo is the name of my request handler). I solved my problem by net setting *:* in my application, and left q.alt=*:* in place. Hope this helps. Again, this is stock Solr 3.4.0, running the Apache war under Tomcat 6. 
Jeff On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote: On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? No different than using q=*:* with the lucene query parser. MatchAllDocsQuery is possibly the fastest query out there! (it simply matches documents in index order, all scores are 1.0) As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Ouch. Really? I don't see in the code (looking at my trunk checkout) where there's any *:* used in the SolrJ library. Can you provide some details on how you used SolrJ? It'd be good to track this down as that seems like a bug to me. Erik Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. 
-- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068 -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068 -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068
Re: need a way so that solr return result for misspelled terms
Meghana - There's currently no facility in Solr to return results for suggestions automatically. You'll have to code this into your client to make another request to Solr for the suggestions returned from the first request. Erik On Nov 23, 2011, at 07:58 , meghana wrote: Hi, I have configured spellchecker component in my solr. it works with custom request handler (however its not working with standard request handler , but this is not concern at now) . but its returning suggestions for the matching spells, instead of it we want that we can directly get result for relative spells of misspelled search term. Can we do this. Any help much appreciated. Meghana -- View this message in context: http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html Sent from the Solr - User mailing list archive at Nabble.com.
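Erik's two-request approach can be wired up client-side. A minimal sketch (plain Java, no SolrJ dependency) of the in-between step: given the term-to-suggestion map already extracted from the first response's spellcheck section, rewrite the query before re-issuing it. The class and method names here are hypothetical:

```java
import java.util.Map;

class SpellcheckRewrite {
    // Replace each misspelled term with its top suggestion; terms without
    // a suggestion are kept as-is. The rewritten string becomes the q
    // parameter of the second request to Solr.
    static String rewrite(String rawQuery, Map<String, String> suggestions) {
        StringBuilder out = new StringBuilder();
        for (String term : rawQuery.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            String fixed = suggestions.get(term);
            out.append(fixed != null ? fixed : term);
        }
        return out.toString();
    }
}
```

How the suggestions map is populated depends on your client; with SolrJ you would pull it out of the first QueryResponse's spellcheck section.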
Re: how to make effective search with fq and q params
Jeff, Just to clarify - with edismax, q=*:* is fine and matches all documents. With dismax (and also edismax), q.alt with no q is needed to match all documents. Erik On Nov 23, 2011, at 08:20 , Jeff Schmidt wrote: Thanks, Erik. I'm moving on to edismax, and will set q.alt=*:* and not specify q if no user-provided terms. Take it easy, Jeff On Nov 22, 2011, at 11:53 AM, Erik Hatcher wrote: I think you're using dismax, not edismax. edismax will take q=*:* just fine as it handles all Lucene syntax queries also. dismax does not. So, if you're using dismax and there is no actual query (but you want to get facets), you set q.alt=*:* and omit q - that's entirely by design. If there's a non-empty q parameter, q.alt is not considered, so there shouldn't be any issues with always having q.alt set if that's what you want. Erik On Nov 22, 2011, at 11:15 , Jeff Schmidt wrote: Hi Erik: It's not in the SolrJ library, but rather my use of it. In my application code: protected static final String SOLR_ALL_DOCS_QUERY = "*:*"; /* * If no search terms provided, then return all neighbors. * Results are to be returned in neighbor symbol alphabetical order. */ if (searchTerms == null) { searchTerms = SOLR_ALL_DOCS_QUERY; nodeQuery.addSortField("n_name", SolrQuery.ORDER.asc); } So, if no user search terms are provided, I search all documents (there are other fqs in effect) and return them in name order. That worked just fine. Then I read more about [e]dismax, and went and configured: <str name="q.alt">*:*</str> Then I would get zero results. It's not a SolrJ issue though, as this request in my browser also resulted in zero results: http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type%3Anode&fq=n_neighborof_id%3AING\:afa&q=*:*&rows=5&facet=true&facet.mincount=1&facet.field=n_neighborof_processExact&facet.field=n_neighborof_edge_type That was due to the q=*:*. Once I set, say, q=cancer, I got results. So I guess this is a [e]dismax thing? (partner-tmo is the name of my request handler). 
I solved my problem by not setting *:* in my application, and left q.alt=*:* in place. Hope this helps. Again, this is stock Solr 3.4.0, running the Apache war under Tomcat 6. Jeff On Nov 22, 2011, at 8:05 AM, Erik Hatcher wrote: On Nov 22, 2011, at 09:55 , Jeff Schmidt wrote: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? No different than using q=*:* with the lucene query parser. MatchAllDocsQuery is possibly the fastest query out there! (it simply matches documents in index order, all scores are 1.0) As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Ouch. Really? I don't see in the code (looking at my trunk checkout) where there's any *:* used in the SolrJ library. Can you provide some details on how you used SolrJ? It'd be good to track this down as that seems like a bug to me. Erik Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting, phrase-slop, minimum match, tie etc.) Use the 'fq' to limit the searches to certain criteria like location, date-ranges etc. Also, avoid using q=*:* as it implicitly translates to MatchAllDocsQuery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. 
-- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068 -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068 -- Jeff Schmidt 535 Consulting j...@535consulting.com http://www.535consulting.com (650) 423-1068
Re: how to make effective search with fq and q params
Meghana - Some important points about q/fq - * q is used for scoring. fq is for filtering, no scoring. * fq and q are cached independently You may want to combine the user entered terms (search term, title, and desc) in the q parameter. It's complicated/advanced, but you can use nested queries to achieve a spread of different query contexts with different field configurations. Check out Yonik's blog entry for inspiration: http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/ Erik On Nov 23, 2011, at 00:59 , meghana wrote: Hi Erik , Actually right now we can say that almost is done in filtering and passing q as *:* , but we need to find out a better way if there is any. So according to pravesh , i m thinking of to pass user entered text in query and date and other fields in filter query? or as per you q=*:* is fast? I have below fields to search Search Term : User Entered Text Field (passing it in q) Title : User Entered Text Field (passing it in fq) Desc : User Entered Text Field (passing it in fq) Appearing : User Entered Text Field (passing it in fq) Date Range : (passing it in fq) Time Zone : (EST , CST ,MST , PST) (passing it in fq) Category : (multiple choice) (passing it in fq) Market : (multiple choice) (passing it in fq) Affiliate Network : (multiple choice) (passing it in fq) I really appreciate your view. Meghana Jeff Schmidt wrote Hi Erik: When using [e]dismax, does configuring q.alt=*:* and not specifying q affect the performance/caching in any way? As a side note, a while back I configured q.alt=*:*, and the application (via SolrJ) still set q=*:* if no user input was provided (faceting). With both of them set that way, I got zero results. (Solr 3.4.0) Interesting. Thanks, Jeff On Nov 22, 2011, at 7:06 AM, Erik Hatcher wrote: If all you're doing is filtering (browsing by facets perhaps), it's perfectly fine to have q=*:*. MatchAllDocsQuery is fast (and would be cached anyway), so use *:* as appropriate without worries. 
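As an illustration of the nested-query idea from Yonik's post, the `_query_` pseudo-field lets one request mix query parsers per clause, so the user's free text can be scored with dismax while other user-entered fields get their own configuration. The field names and boosts below are hypothetical:

```
q=_query_:"{!dismax qf='title^2 desc'}cancer" AND _query_:"{!field f=market}US"
```

Filter-style constraints (date range, time zone, category) would still go in fq parameters so they are cached independently.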
Erik On Nov 22, 2011, at 07:18 , pravesh wrote: Usually, Use the 'q' parameter to search for the free text values entered by the users (where you might want to parse the query and/or apply boosting/phrase-sloppy, minimum match,tie etc ) Use the 'fq' to limit the searches to certain criterias like location, date-ranges etc. Also, avoid using the q=*:* as it implicitly translates to matchalldocsquery Regds Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3527535.html Sent from the Solr - User mailing list archive at Nabble.com. -- Jeff Schmidt 535 Consulting jas@ http://www.535consulting.com (650) 423-1068 -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-effective-search-with-fq-and-q-params-tp3527217p3529876.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: wild card search and lower-casing
Yes, it should be ok, as currently we are on the English side. If that's beneficial for the effort, I could do a field test on 3.4 after you close the jira. Best, Dmitry On Wed, Nov 23, 2011 at 2:52 PM, Erick Erickson erickerick...@gmail.comwrote: Ah, I see what you're doing, go for it. I intend to commit it today, but things happen. About changing the setLowerCaseExpandedTerms(true), yes that'll take care of this issue, although it has some locale-specific assumptions (i.e. string.toLowerCase() uses the default locale). That may not matter in your situation though. Best Erick On Tue, Nov 22, 2011 at 10:46 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks, Erick. I was in fact reading the patch (the one attached as a file to the aforementioned jira) you updated sometime yesterday. I'll watch the issue, but as said the change of a hard-coded boolean to its opposite worked just fine for me. Best, Dmitry On 11/22/11, Erick Erickson erickerick...@gmail.com wrote: No, no, no That's something buried in Lucene, it has nothing to do with the patch! The patch has NOT yet been applied to any released code. You could pull the patch from the JIRA and apply it to trunk locally if you wanted. But there's no patch for 3.x, I'll probably put that up over the holiday. But things have changed a bit (one of the things I'll have to do is create some documentation). You *should* be able to specify just legacyMultiTerm=true in your fieldType if you want to apply the 3.x patch to pre 3.6 code. It would be a good field test if that worked for you. But you can't do any of this until the JIRA (SOLR-2438) is marked Resolution: Fixed. Don't be fooled by Fix Version. Fix Version simply says that those are the earliest versions it *could* go in. Best Erick Best Erick On Tue, Nov 22, 2011 at 6:32 AM, Dmitry Kan dmitry@gmail.com wrote: I guess, I have found your comment, thanks. 
For our current needs I have just set: setLowercaseExpandedTerms(true); // changed from default false in the SolrQueryParser's constructor, and that seems to work so far. In order not to start a separate thread on wildcards: is it so that for the trailing wildcard there is a minimum of 2 preceding characters for a search to happen? Dmitry On Mon, Nov 21, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.comwrote: It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial, I'll leave a note in the code about this, look for SOLR-2438 in the 3x code line for the place to change. On Mon, Nov 21, 2011 at 2:14 AM, Dmitry Kan dmitry@gmail.com wrote: Thanks Erick. Do you think the patch you are working on will be applicable as well to 3.4? Best, Dmitry On Mon, Nov 21, 2011 at 5:06 AM, Erick Erickson erickerick...@gmail.com wrote: As it happens I'm working on SOLR-2438 which should address this. This patch will provide two things: The ability to define a new analysis chain in your schema.xml, currently called multiterm, that will be applied to queries of various sorts, including wildcard, prefix, range. This will be somewhat of an expert thing to make yourself... In the absence of an explicit definition it'll synthesize a multiterm analyzer out of the query analyzer, taking any char filters, any LowerCaseFilter (if present), and ASCIIFoldingFilter (if present) and putting them in the multiterm analyzer along with a (hardcoded) WhitespaceTokenizer. As of 3.6 and 4.0, this will be the default behavior, although you can explicitly define a field type parameter to specify the current behavior. The reason it is on 3.6 is that I want it to bake for a while before getting into the wild, so I have no intention of trying to get it into the 3.5 release. 
The patch is up for review now, I'd like another set of eyeballs or two on it before committing. The patch that's up there now is against trunk but I hope to have a 3x patch that I'll apply to the 3x code line after 3.5 RC1 is cut. Best Erick On Fri, Nov 18, 2011 at 12:05 PM, Ahmet Arslan iori...@yahoo.com wrote: You're right: public SolrQueryParser(IndexSchema schema, String defaultField) { ... setLowercaseExpandedTerms(false); ... } Please note that lowercaseExpandedTerms uses String.toLowercase() (uses default Locale) which is a Locale sensitive operation. In Lucene AnalyzingQueryParser exists for this purposes, but I am not sure if it is ported to solr. http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html
Re: Integrating Surround Query Parser
After this i tried with solr3.1-src. Please suggest what should i do ? Please use solr-trunk. svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk
Re: Solr Performance/Architecture
On 11/22/2011 11:52 PM, Husain, Yavar wrote: Hi Shawn That was so great of you to explain the architecture in such detail. I enjoyed reading it multiple times. I have a question here: You mentioned that we can use crc32(DocumentId) % NumServers. Now actually I am using that in my data-config.xml in the SQL query itself, something like: For documents to be indexed on server 1: select DocumentId,PNum,... from Sample where crc32(DocumentId)%2=0; For documents to be indexed on server 2: select DocumentId,PNum,... from Sample where crc32(DocumentId)%2=1; Will that be a right way? Will it not be a slow query? Thanks once again. Those queries look good. Compared to an unqualified SELECT, I'm sure the crc32 will slow it down, but unless your database hardware is not up to the job, Solr will probably be more of a bottleneck than the DB. You can have a generic DIH config and pass the information in with the dataimport: url="jdbc:mysql://${dataimporter.request.dbHost}/${dataimporter.request.dbSchema}?zeroDateTimeBehavior=convertToNull" snip SELECT * FROM ${dataimporter.request.dataView} WHERE ( ( did > ${dataimporter.request.minDid} AND did <= ${dataimporter.request.maxDid} ) ${dataimporter.request.extraWhere} ) AND (crc32(did) % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal}) This is the URL template that will work with the above DIH config: http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dbHost=DBSERVER&dbSchema=DBSCHEMA&dataView=DATAVIEW&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID&extraWhere=EXTRAWHERE Under normal circumstances extraWhere is blank. It's there for special-purpose importing. Thanks, Shawn
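The same crc32(DocumentId) % NumServers routing can be reproduced in application code, which is handy for sending updates or deletes straight to the correct shard. A sketch assuming DocumentId is hashed as a UTF-8 string, in which case java.util.zip.CRC32 produces the same value as MySQL's crc32() over the same bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ShardRouter {
    // crc32(id) % numShards, the same formula used in the DIH SQL above.
    // CRC32.getValue() returns the unsigned 32-bit checksum as a long,
    // so the result is always in [0, numShards).
    static int shardFor(String documentId, int numShards) {
        CRC32 crc = new CRC32();
        crc.update(documentId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % numShards);
    }
}
```

Keeping this one formula in both the importer and the application guarantees a document is indexed on, and deleted from, the same shard.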
Re: Huge Performance: Solr distributed search
Hello, Is this log from the frontend SOLR (aggregator) or from a shard? Can you merge, e.g. 3 shards together or is it much effort for your team? In our setup we currently have 16 shards with ~30GB each, but we rarely search in all of them at once. Best, Dmitry On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh arco...@gmail.com wrote: Hi! * Data: - Solr 3.4; - 30 shards ~ 13GB, 27-29M docs each shard. * Machine parameters (Ubuntu 10.04 LTS): user@Solr:~$ uname -a Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux user@Solr:~$ cat /proc/cpuinfo processor : 0 - 3 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz stepping: 2 cpu MHz : 3458.000 cache size : 12288 KB fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat bogomips: 6916.00 clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: user@Solr:~$ cat /proc/meminfo MemTotal: 16992680 kB MemFree: 110424 kB Buffers:9976 kB Cached: 11588380 kB SwapCached:41952 kB Active: 9860764 kB Inactive:6198668 kB Active(anon):4062144 kB Inactive(anon): 398972 kB Active(file):5798620 kB Inactive(file): 5799696 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 46873592 kB SwapFree: 46810712 kB Dirty:36 kB Writeback: 0 kB AnonPages: 4424756 kB Mapped: 940660 kB Shmem:40 kB Slab: 362344 kB SReclaimable: 350372 kB SUnreclaim:11972 kB KernelStack:2488 kB PageTables:68568 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit:55369932 kB Committed_AS:5740556 kB VmallocTotal: 34359738367 kB VmallocUsed: 350532 kB VmallocChunk: 34359384964 kB HardwareCorrupted: 0 kB HugePages_Total: 0 
HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 10240 kB DirectMap2M: 17299456 kB - Apache Tomcat 6.0.32: <!-- java arguments --> -XX:+DisableExplicitGC -XX:PermSize=512M -XX:MaxPermSize=512M -Xmx12G -Xms3G -XX:NewSize=128M -XX:MaxNewSize=128M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:/opt/search/tomcat/logs/gc.log Our search setup is: - 5 servers with the configuration above; - one tomcat6 application on each server with 6 solr applications. - Full addresses are: 1) http://192.168.1.85:8080/solr1, http://192.168.1.85:8080/solr2, ..., http://192.168.1.85:8080/solr6 2) http://192.168.1.86:8080/solr7, http://192.168.1.86:8080/solr8, ..., http://192.168.1.86:8080/solr12 ... 5) http://192.168.1.89:8080/solr25, http://192.168.1.89:8080/solr26, ..., http://192.168.1.89:8080/solr30 - On another server there is an additional common application with the shards parameter: <requestHandler name="search" class="solr.SearchHandler" default="true"> <lst name="defaults"> <str name="echoParams">explicit</str> <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,...,192.168.1.89:8080/solr30</str> <int name="rows">10</int> </lst> </requestHandler> - schema and solrconfig are identical for all shards; for the first shard see attach; - these servers do only search; indexation is on another (optimized to 2 segments; shards replicate with ssh/rsync scripts). So now the major problem is poor distributed search performance. 
Take a look at, for example, these logs. This is on 30 shards: INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(barium)&rows=2000} status=0 QTime=40712 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(pittances)&rows=2000} status=0 QTime=36097 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reliability)&rows=2000} status=0 QTime=75756 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(blessing's)&rows=2000} status=0 QTime=30342 INFO: [] webapp=/solr path=/select/ params={fl=*,score&ident=true&start=0&q=(reiterated)&rows=2000} status=0 QTime=55690 Sometimes QTime is more than 15. But when we run identical queries on one shard separately, QTime is between 200 and 1500. Is distributed Solr search really this slow, or is it our architecture
Re: Collection Distribution vs Replication in Solr
On Oct 27, 2011, at 2:57 PM, Alireza Salimi wrote: Hi guys, If we ignore the features that Replication provides ( http://wiki.apache.org/solr/SolrReplication#Features), which approach is better? Is there any performance problems with Replication? Replications seems quite easier (no special configuration, ssh setting, cron setting), while rsync is a robust protocol. Which one do you recommend? Thanks -- Alireza Salimi Java EE Developer Replication with scripts is basically deprecated I'd say. Java replication is the path forward and what I would use. - Mark Miller lucidimagination.com
Re: need a way so that solr return result for misspelled terms
Hi Erik, Thanks for your reply. i come to know that Lucene provides the fuzzy search by applying tilde(~) symbol at the end of search with like delll~0.8 can we apply such fuzzy logic in solr in any way? Thanks Meghana Erik Hatcher-4 wrote Meghana - There's currently no facility in Solr to return results for suggestions automatically. You'll have to code this into your client to make another request to Solr for the suggestions returned from the first request. Erik On Nov 23, 2011, at 07:58 , meghana wrote: Hi, I have configured spellchecker component in my solr. it works with custom request handler (however its not working with standard request handler , but this is not concern at now) . but its returning suggestions for the matching spells, instead of it we want that we can directly get result for relative spells of misspelled search term. Can we do this. Any help much appreciated. Meghana -- View this message in context: http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530769.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: To push the terms.limit parameter from the master core to all the shard cores.
On Nov 22, 2011, at 1:31 PM, mechravi25 wrote: Can you please suggest the definition of the terms component for the underlying shard cores. If you look at my earlier email, you will see the limit is set in invariants rather than defaults. This makes it so the param cannot be dynamically overridden, so it's what you want to use on your underlying shards. - Mark Miller lucidimagination.com
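To make Mark's point concrete, a hedged sketch of a shard-side handler definition (the handler name, component name, and limit value are illustrative). Because terms.limit sits in invariants rather than defaults, no request parameter, including the one the aggregator forwards, can override it:

```xml
<searchComponent name="termsComponent" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <lst name="invariants">
    <str name="terms.limit">1000</str>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>
```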
Re: DIH Strange Problem
On 11/23/2011 5:21 AM, Chantal Ackermann wrote: Hi Yavar, my experience with similar problems was that there was something wrong with the database connection or the database. Chantal It's also possible that your JDBC driver might be trying to buffer the entire result set. There's a link on the wiki specifically for this problem on MS SQL server. Hopefully it's that, but Chantal could be right too. http://wiki.apache.org/solr/DataImportHandlerFaq Here's the URL to the specific paragraph, but it's likely that it won't survive the email trip in a clickable form: http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F Thanks, Shawn
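For reference, the fix the DIH FAQ describes for the MS SQL case is to make the driver stream results instead of buffering the whole result set, via parameters on the JDBC URL. A sketch of the dataSource element (the host, database name, and credentials are placeholders):

```xml
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost;databaseName=MyDb;responseBuffering=adaptive;selectMethod=cursor"
            user="solr" password="secret"/>
```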
UpdateRequestProcessor - processCommit
TWIMC: I'm creating a custom UpdateRequestProcessor chain, where I need to commit records to a database once the import process has completed. I'm assuming the processCommit method is called for each UpdateRequestProcessor in the chain when the records are being committed to the Lucene index. I'm debugging the processor chain using the debug functionality on the dataimport.jsp page, and I have selected verbose and commit as options. When I import 10 records, the processAdd methods are getting called, but the processCommit methods aren't. Is there something obvious that I'm missing here? I'm using SOLR 1.4 TIA, M. -- This e-mail and any files transmitted with it may be proprietary. Please note that any views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of Apogee Integration.
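For reference, a custom processor only sees processCommit if its chain is registered in solrconfig.xml and is the chain actually used by the update request, and a commit must actually be issued for the import. A hedged sketch of the chain registration (the factory class name is hypothetical):

```xml
<updateRequestProcessorChain name="dbCommitChain">
  <processor class="com.example.DbCommitProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The custom factory is placed before the stock Log/Run processors so it sees each command first and can delegate via super/next.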
Re: need a way so that solr return result for misspelled terms
I have configured spellchecker component in my solr. it works with custom request handler (however its not working with standard request handler , but this is not concern at now) . but its returning suggestions for the matching spells, instead of it we want that we can directly get result for relative spells of misspelled search term. You might be interested in this : http://sematext.com/products/dym-researcher/index.html
Re: Integrating Surround Query Parser
Is this the trunk of Solr 4.0? Can't I implement it in Solr 3.1? On Wed, Nov 23, 2011 at 7:23 PM, Ahmet Arslan iori...@yahoo.com wrote: After this i tried with solr3.1-src. Please suggest what should i do ? Please use solr-trunk. svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk -- Thanks Regards Rahul Mehta
Re: Collection Distribution vs Replication in Solr
Yeah, and actually later I've found someone mentioned that they had done some benchmarks and found that replication is faster than collection distribution. Thanks On Wed, Nov 23, 2011 at 9:02 AM, Mark Miller markrmil...@gmail.com wrote: On Oct 27, 2011, at 2:57 PM, Alireza Salimi wrote: Hi guys, If we ignore the features that Replication provides ( http://wiki.apache.org/solr/SolrReplication#Features), which approach is better? Is there any performance problems with Replication? Replications seems quite easier (no special configuration, ssh setting, cron setting), while rsync is a robust protocol. Which one do you recommend? Thanks -- Alireza Salimi Java EE Developer Replication with scripts is basically deprecated I'd say. Java replication is the path forward and what I would use. - Mark Miller lucidimagination.com -- Alireza Salimi Java EE Developer
Re: Huge Performance: Solr distributed search
Is this log from the frontend SOLR (aggregator) or from a shard? from aggregator Can you merge, e.g. 3 shards together or is it much effort for your team? Yes, we can merge. We'll try to do this and review how it will works Thanks, Dmitry Any another ideas? On Wed, Nov 23, 2011 at 4:01 PM, Dmitry Kan dmitry@gmail.com wrote: Hello, Is this log from the frontend SOLR (aggregator) or from a shard? Can you merge, e.g. 3 shards together or is it much effort for your team? In our setup we currently have 16 shards with ~30GB each, but we rarely search in all of them at once. Best, Dmitry On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh arco...@gmail.com wrote: Hi! * Data: - Solr 3.4; - 30 shards ~ 13GB, 27-29M docs each shard. * Machine parameters (Ubuntu 10.04 LTS): user@Solr:~$ uname -a Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux user@Solr:~$ cat /proc/cpuinfo processor : 0 - 3 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz stepping : 2 cpu MHz : 3458.000 cache size : 12288 KB fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat bogomips : 6916.00 clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: user@Solr:~$ cat /proc/meminfo MemTotal: 16992680 kB MemFree: 110424 kB Buffers: 9976 kB Cached: 11588380 kB SwapCached: 41952 kB Active: 9860764 kB Inactive: 6198668 kB Active(anon): 4062144 kB Inactive(anon): 398972 kB Active(file): 5798620 kB Inactive(file): 5799696 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 46873592 kB SwapFree: 46810712 kB Dirty: 36 kB Writeback: 0 kB AnonPages: 4424756 kB Mapped: 940660 kB Shmem: 
40 kB Slab: 362344 kB SReclaimable: 350372 kB SUnreclaim: 11972 kB KernelStack: 2488 kB PageTables: 68568 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 55369932 kB Committed_AS: 5740556 kB VmallocTotal: 34359738367 kB VmallocUsed: 350532 kB VmallocChunk: 34359384964 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 10240 kB DirectMap2M: 17299456 kB - Apache Tomcat 6.0.32: !-- java arguments -- -XX:+DisableExplicitGC -XX:PermSize=512M -XX:MaxPermSize=512M -Xmx12G -Xms3G -XX:NewSize=128M -XX:MaxNewSize=128M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:/opt/search/tomcat/logs/gc.log Out search schema is: - 5 servers with configuration above; - one tomcat6 application on each server with 6 solr applications. - Full addresses are: 1) http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,..., http://192.168.1.85:8080/solr6 2) http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,..., http://192.168.1.86:8080/solr12 ... 5) http://192.168.1.89:8080/solr25,http://192.168.1.89:8080/solr26,..., http://192.168.1.89:8080/solr30 - At another server there is a additional common application with shards paramerter: requestHandler name=search class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str str name=shards192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,..., 192.168.1.89:8080/solr30/str int name=rows10/int /lst /requestHandler - schema and solrconfig are identical for all shards, for first shard see attach; - on these servers are only search, indexation is on another (optimized to 2 segments shards replicate with ssh/rsync scripts). So now the major problem is huge performance on distributed search. 
Take look on, for example, these logs: This is on 30 shards: INFO: [] webapp=/solr path=/select/params={fl=*,scoreident=truestart=0q=(barium)rows=2000} status=0 QTime=40712 INFO: [] webapp=/solr path=/select/params={fl=*,scoreident=truestart=0q=(pittances)rows=2000} status=0 QTime=36097 INFO: [] webapp=/solr path=/select/params={fl=*,scoreident=truestart=0q=(reliability)rows=2000} status=0 QTime=75756 INFO: [] webapp=/solr
Re: need a way so that solr return result for misspelled terms
Sure... if you're using the lucene query parser and put a ~ after every term in the query :) But that would mean that either the users or your application do this. Erik On Nov 23, 2011, at 09:03 , meghana wrote: Hi Erik, Thanks for your reply. i come to know that Lucene provides the fuzzy search by applying tilde(~) symbol at the end of search with like delll~0.8 can we apply such fuzzy logic in solr in any way? Thanks Meghana Erik Hatcher-4 wrote Meghana - There's currently no facility in Solr to return results for suggestions automatically. You'll have to code this into your client to make another request to Solr for the suggestions returned from the first request. Erik On Nov 23, 2011, at 07:58 , meghana wrote: Hi, I have configured spellchecker component in my solr. it works with custom request handler (however its not working with standard request handler , but this is not concern at now) . but its returning suggestions for the matching spells, instead of it we want that we can directly get result for relative spells of misspelled search term. Can we do this. Any help much appreciated. Meghana -- View this message in context: http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530584.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3530769.html Sent from the Solr - User mailing list archive at Nabble.com.
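The client-side rewrite Erik describes can be a few lines of application code. A minimal sketch (plain Java; the class and method names are illustrative) that appends the fuzzy operator to every whitespace-separated term before the query is sent to the lucene query parser:

```java
class FuzzyRewrite {
    // "delll laptop" with minSimilarity 0.8 becomes "delll~0.8 laptop~0.8",
    // so each term is matched fuzzily by the lucene query parser.
    static String fuzzify(String rawQuery, double minSimilarity) {
        StringBuilder out = new StringBuilder();
        for (String term : rawQuery.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(term).append('~').append(minSimilarity);
        }
        return out.toString();
    }
}
```

Note this naive version assumes plain terms; user input containing phrases or operators would need escaping or more careful parsing first.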
Re: Autocomplete(terms) performance problem
Thanks for your answer Nagendra, The problem is i want to do some infix searches. When i search for sisco i want the autocomplete with san fran*sisco*. In the example you gave me it's also not possible. Roy -- View this message in context: http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3530891.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Huge Performance: Solr distributed search
If the response time from each shard shows decent figures, then aggregator seems to be a bottleneck. Do you btw have a lot of concurrent users? On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh arco...@gmail.com wrote: Is this log from the frontend SOLR (aggregator) or from a shard? from aggregator Can you merge, e.g. 3 shards together or is it much effort for your team? Yes, we can merge. We'll try to do this and review how it will works Thanks, Dmitry Any another ideas? On Wed, Nov 23, 2011 at 4:01 PM, Dmitry Kan dmitry@gmail.com wrote: Hello, Is this log from the frontend SOLR (aggregator) or from a shard? Can you merge, e.g. 3 shards together or is it much effort for your team? In our setup we currently have 16 shards with ~30GB each, but we rarely search in all of them at once. Best, Dmitry On Wed, Nov 23, 2011 at 3:12 PM, Artem Lokotosh arco...@gmail.com wrote: Hi! * Data: - Solr 3.4; - 30 shards ~ 13GB, 27-29M docs each shard. * Machine parameters (Ubuntu 10.04 LTS): user@Solr:~$ uname -a Linux Solr 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux user@Solr:~$ cat /proc/cpuinfo processor : 0 - 3 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Xeon(R) CPU X5690 @ 3.47GHz stepping: 2 cpu MHz : 3458.000 cache size : 12288 KB fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat bogomips: 6916.00 clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: user@Solr:~$ cat /proc/meminfo MemTotal: 16992680 kB MemFree: 110424 kB Buffers:9976 kB Cached: 11588380 kB SwapCached:41952 kB Active: 9860764 kB Inactive:6198668 kB Active(anon):4062144 kB Inactive(anon): 
398972 kB Active(file):5798620 kB Inactive(file): 5799696 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 46873592 kB SwapFree: 46810712 kB Dirty:36 kB Writeback: 0 kB AnonPages: 4424756 kB Mapped: 940660 kB Shmem:40 kB Slab: 362344 kB SReclaimable: 350372 kB SUnreclaim:11972 kB KernelStack:2488 kB PageTables:68568 kB NFS_Unstable: 0 kB Bounce:0 kB WritebackTmp: 0 kB CommitLimit:55369932 kB Committed_AS:5740556 kB VmallocTotal: 34359738367 kB VmallocUsed: 350532 kB VmallocChunk: 34359384964 kB HardwareCorrupted: 0 kB HugePages_Total: 0 HugePages_Free:0 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB DirectMap4k: 10240 kB DirectMap2M:17299456 kB - Apache Tomcat 6.0.32: !-- java arguments -- -XX:+DisableExplicitGC -XX:PermSize=512M -XX:MaxPermSize=512M -Xmx12G -Xms3G -XX:NewSize=128M -XX:MaxNewSize=128M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:/opt/search/tomcat/logs/gc.log Out search schema is: - 5 servers with configuration above; - one tomcat6 application on each server with 6 solr applications. - Full addresses are: 1) http://192.168.1.85:8080/solr1,http://192.168.1.85:8080/solr2,..., http://192.168.1.85:8080/solr6 2) http://192.168.1.86:8080/solr7,http://192.168.1.86:8080/solr8,..., http://192.168.1.86:8080/solr12 ... 
5) http://192.168.1.89:8080/solr25, http://192.168.1.89:8080/solr26, ..., http://192.168.1.89:8080/solr30
- On another server there is an additional common application with the shards parameter:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards">192.168.1.85:8080/solr1,192.168.1.85:8080/solr2,..., 192.168.1.89:8080/solr30</str>
    <int name="rows">10</int>
  </lst>
</requestHandler>

- schema and solrconfig are identical for all shards; for the first shard, see the attachment;
- these servers do search only; indexing is done on another server (shards are optimized to 2 segments and replicated with ssh/rsync scripts). So now the major problem is poor performance of distributed search. Take a look at, for example, these logs. This is on 30 shards: INFO: [] webapp=/solr
Re: how to : multicore setup with same config files
Hi, yes, see http://wiki.apache.org/solr/DistributedSearch Regards Vadim 2011/11/2 Val Minyaylo vminya...@centraldesktop.com Have you tried to query multiple cores at the same time? On 10/31/2011 8:30 AM, Vadim Kisselmann wrote: It works. It was one misplaced backslash in my config ;) Sharing the config/schema files is not a problem. Regards Vadim 2011/10/31 Vadim Kisselmann v.kisselm...@googlemail.com Hi folks, I have a small blockade in the configuration of a multicore setup. I use the latest Solr version (4.0) from trunk and the example (with Jetty). A single core runs without problems. Assume I have this structure: /solr-trunk/solr/example/multicore/ solr.xml core0/ core1/ /solr-data/ /conf/ schema.xml solrconfig.xml /data/ core0/ index core1/ index I want to share the config files (same instanceDir but different docDir). How can I configure this so that it works (solrconfig.xml, solr.xml)? Do I need the directories for core0/core1 in solr-trunk/...? I found issues in Jira with old patches which unfortunately don't work. Thanks and Regards Vadim
Re: Huge Performance: Solr distributed search
If the response time from each shard shows decent figures, then aggregator seems to be a bottleneck. Do you btw have a lot of concurrent users? For now it is not a problem, but we expect 1K to 10K concurrent users, and maybe more. On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan dmitry@gmail.com wrote: If the response time from each shard shows decent figures, then aggregator seems to be a bottleneck. Do you btw have a lot of concurrent users? On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh arco...@gmail.com wrote: Is this log from the frontend SOLR (aggregator) or from a shard? from aggregator Can you merge, e.g. 3 shards together or is it much effort for your team? Yes, we can merge. We'll try to do this and review how it works. Thanks, Dmitry Any other ideas? -- Best regards, Artem Lokotosh mailto:arco...@gmail.com
Re: Integrating Surround Query Parser
Is this in the trunk of Solr 4.0? Can't I implement it in Solr 3.1? The author of the patch would know the answer to that. But why not use trunk?
Re: Huge Performance: Solr distributed search
If you request 1000 docs from each shard, then the aggregator is really fetching 30,000 documents in total, which it must then merge (re-sort the results and take the top 1000 to return to the client). It's possible that Solr's merging implementation needs to be optimized, but it does not seem like it could be that slow. How big are the documents you return (how many fields, avg KB per doc, etc.)? I would take a look at the network to make sure that is not the bottleneck, and also make sure there is not some underlying issue with making 30 concurrent HTTP requests from the aggregator. I am not an expert in Java, but under .NET there is a setting that limits concurrent outgoing HTTP requests from a process; it must be overridden via configuration, otherwise it is very limiting by default. Does performance get much better if you only request the top 100, or top 10, documents instead of the top 1000? What if you only request a couple of fields instead of fl=*? What if you only search 10 shards instead of 30? I would collect those numbers and try to determine whether time increases linearly as you increase shards and/or # of docs. On Wed, Nov 23, 2011 at 9:55 AM, Artem Lokotosh arco...@gmail.com wrote: If the response time from each shard shows decent figures, then aggregator seems to be a bottleneck. Do you btw have a lot of concurrent users?For now is not a problem, but we expect from 1K to 10K of concurrent users and maybe more On Wed, Nov 23, 2011 at 4:43 PM, Dmitry Kan dmitry@gmail.com wrote: If the response time from each shard shows decent figures, then aggregator seems to be a bottleneck. Do you btw have a lot of concurrent users? On Wed, Nov 23, 2011 at 4:38 PM, Artem Lokotosh arco...@gmail.com wrote: Is this log from the frontend SOLR (aggregator) or from a shard? from aggregator Can you merge, e.g. 3 shards together or is it much effort for your team? Yes, we can merge. We'll try to do this and review how it will works Thanks, Dmitry Any another ideas? 
-- Best regards, Artem Lokotosh mailto:arco...@gmail.com
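The aggregation cost described in this thread (each of 30 shards returning 1000 docs, merged into a global top 1000) can be sketched roughly as follows. The data here is synthetic and the merge is a plain k-way heap merge, not Solr's actual implementation:

```python
import heapq
from itertools import islice

def aggregate(shard_results, rows):
    """Merge per-shard hit lists (each already sorted by descending
    score) and keep only the global top `rows` documents."""
    # Negate scores so the ascending k-way merge yields best-first order.
    merged = heapq.merge(*[[(-score, doc) for score, doc in shard]
                           for shard in shard_results])
    return [(-neg, doc) for neg, doc in islice(merged, rows)]

# 30 shards x 1000 rows each: the aggregator must handle 30,000
# candidate documents just to return the top 1000 to the client.
shards = [[(1.0 / (rank + 1), f"shard{n}_doc{rank}") for rank in range(1000)]
          for n in range(30)]
top = aggregate(shards, 1000)
```

Requesting only the top 10 rows per query would cut the merged candidate set from 30,000 to 300, which is why the reply's suggestion to test top-100 and top-10 requests isolates the merge cost.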
Problem with Solr logging under Jetty
I am having a problem with jdk logging with Solr, using the jetty included with Solr. In jetty.xml, I have the following defined:

<Call class="java.lang.System" name="setProperty">
  <Arg>java.util.logging.config.file</Arg>
  <Arg>etc/logging.properties</Arg>
</Call>

Contents of etc/logging.properties:
==
# Logging level
.level=WARNING
# Write to a file
handlers = java.util.logging.FileHandler
# Write log messages in human readable format:
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.ConsoleHander.formatter = java.util.logging.SimpleFormatter
# Log to the log subdirectory, with log files named solr_log-n.log
java.util.logging.FileHandler.pattern = ./log/solr_log-%g.log
java.util.logging.FileHandler.append = true
java.util.logging.FileHandler.count = 10
java.util.logging.FileHandler.limit = 10485760
==

This actually all seems to work perfectly at first. I changed the logging level to INFO in the Solr admin, and it still seemed to work. Then at some point it stopped logging to solr_log-0.log and started logging to stderr. My init script for Solr sends that to a file, but there's no log rotation on that file and it is overwritten whenever Solr is restarted. With the same config, OS version, Java version, and everything else I can think of, my test server is still working, but all of my production servers aren't. It does seem to be related to changing the log level to INFO in the GUI, but making that change doesn't make it fail right away. What information can I provide to help troubleshoot this? Thanks, Shawn
Highlighting too much, indexing not seeing commas?
Solr 3.3.0. I have a field/type indexed as below. For a particular document the content of this field is 'FreeBSD,Perl,Linux,Unix,SQL,MySQL,Exim,Postgresql,Apache,Exim'. Using eDismax, mm=1. When I query for... +perl +(apache sql) +(linux unix) ...strangely, the highlighting is being returned as... FreeBSD,<em>Perl,Linux,Unix,SQL,MySQL,Exim,Postgresql,Apache</em>,Exim The full call is... /select/?qt=core&q=%2Bperl%20%2B%28apache%20sql%29%20%2B%28linux%20unix%29&fl=skills&hl=true&hl.fl=skills&fq=id:2819615 I've checked the matching in the online analyser, which looks fine, so I can't understand why the highlighting isn't correct. I would have thought the highlighting would highlight the same way the analyser tool does. Is it an index-time/field-type issue, or am I missing something in the request? Thanks in advance...

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="skills" type="textgen" indexed="true" stored="true" multiValued="false"/>

-- IntelCompute Web Design Local Online Marketing http://www.intelcompute.com
Re: Architecture and Capacity planning for large Solr index
Whether three shards will give you adequate throughput is not an answerable question. Here's what I suggest. Get a single box of the size you expect your servers to be and index 1/3 of your documents on it. Run stress tests. That's really the only way to be fairly sure your hardware is adequate. As far as SANs are concerned, local storage is almost always better. I'd advise against trying to share the index amongst slaves, SAN or not. And using the SAN for each slave's copy seems unnecessary with storage as cheap as it is, what advantage do you see in this scenario? Best Erick On Mon, Nov 21, 2011 at 3:18 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Thanks Otis ! Please ignore my earlier email which does not have all the information. My business requirements have changed a bit. We now need one year rolling data in Production, with the following details - Number of records - 1.2 million - Solr index size for these records comes to approximately 200 - 220 GB. (includes large attachments) - Approx 250 users who will be searching the applicaiton with a peak of 1 search request every 40 seconds. I am planning to address this using Solr distributed search on a VMWare virtualized environment as follows. 1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves (load balanced) 2. Master configuration for each server is as follows - 4 CPUs - 16 GB RAM - 300 GB disk space 3. Slave configuration for each server is as follows - 4 CPUs - 16 GB RAM - 150 GB disk space 4. I am planning to use SAN instead of local storage to store Solr index. And my questions are as follows: Will 3 shards serve the purpose here ? Is SAN a a good option for storing solr index, given the high index volume ? On Mon, Nov 21, 2011 at 3:05 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote: Thanks ! My business requirements have changed a bit. We need one year rolling data in Production. The index size for the same comes to approximately 200 - 220 GB. 
I am planning to address this using Solr distributed search as follows. 1. Whole index to be split up between 3 shards, with 3 masters and 6 slaves (load balanced) 2. Master configuration will be 4 CPU On Tue, Oct 11, 2011 at 2:05 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Rahul, This is unfortunately not enough information for anyone to give you very precise answers, so I'll just give some rough ones: * best disk - SSD :) * CPU - multicore, depends on query complexity, concurrency, etc. * sharded search and failover - start with SolrCloud, there are a couple of pages about it on the Wiki and http://blog.sematext.com/2011/09/14/solr-digest-spring-summer-2011-part-2-solr-cloud-and-near-real-time-search/ Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Rahul Warawdekar rahul.warawde...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Tuesday, October 11, 2011 11:47 AM Subject: Architecture and Capacity planning for large Solr index Hi All, I am working on a Solr search based project, and would highly appreciate help/suggestions from you all regarding Solr architecture and capacity planning. Details of the project are as follows 1. There are 2 databases from which, data needs to be indexed and made searchable, - Production - Archive 2. Production database will retain 6 months old data and archive data every month. 3. Archive database will retain 3 years old data. 4. Database is SQL Server 2008 and Solr version is 3.1 Data to be indexed contains a huge volume of attachments (PDF, Word, excel etc..), approximately 200 GB per month. We are planning to do a full index every month (multithreaded) and incremental indexing on a daily basis. The Solr index size is coming to approximately 25 GB per month. If we were to use distributed search, what would be the best configuration for Production as well as Archive indexes ? What would be the best CPU/RAM/Disk configuration ? 
How can I implement failover mechanism for sharded searches ? Please let me know in case I need to share more information. -- Thanks and Regards Rahul A. Warawdekar -- Thanks and Regards Rahul A. Warawdekar -- Thanks and Regards Rahul A. Warawdekar
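A quick back-of-envelope check of the numbers proposed in this thread (1.2M records, ~220 GB of index, 3 shards, 16 GB RAM and 150 GB disk per slave); the arithmetic below is mine, not from the posts:

```python
# Figures quoted in the thread.
total_index_gb = 220
num_shards = 3
total_docs = 1_200_000
slave_disk_gb = 150
slave_ram_gb = 16

per_shard_gb = total_index_gb / num_shards       # ~73 GB of index per shard
per_shard_docs = total_docs // num_shards        # 400,000 docs per shard
disk_headroom_gb = slave_disk_gb - per_shard_gb  # ~77 GB spare on each slave

# At 1 search per 40 s across 250 users, query load is tiny; the open
# question is index size vs. RAM, i.e. how much of each shard the OS
# page cache can hold.
ram_fraction_cached = slave_ram_gb / per_shard_gb  # ~0.22 of a shard
```

This supports Erick's point: throughput is not answerable on paper, since the dominant variable (how much of the ~73 GB shard stays cached) only shows up in a stress test on real hardware.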
Re: Problem with pdf files indexing
The first thing I'd do is go over to the server and try using the admin interface to query on *:*. If that returns nothing, look at the admin/schema browser page and see what's in your fields, if anything. Then go back to SolrJ and work on the query part sans the indexing part once you're sure you have data to work with. Also, do your Solr logs show anything? Best Erick On Tue, Nov 22, 2011 at 4:13 AM, Dali medalibenmans...@gmail.com wrote: Hi! I'm using Solr version 3.3 and I have some pdf files which I want to index. I followed the instructions from the wiki page: http://wiki.apache.org/solr/ExtractingRequestHandler The problem is that I can add my documents to Solr but I cannot query them. Here is what I have:

solrconfig.xml:
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

schema.xml:
<field name="title" type="string" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

data-config.xml:
...
<dataSource type="BinFileDataSource" name="ds-file"/>
...
<entity processor="TikaEntityProcessor" dataSource="ds-file" url="../${document.filename}">
  <field column="Author" name="author" meta="true"/>
  <field column="title" name="title" meta="true"/>
  <field column="text" name="text"/>
</entity>
...
I use SolrJ to add documents as follows:

SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("d:\\test.pdf"));
up.setParam("literal.id", "test");
up.setParam("extractOnly", "true");
server.commit();
NamedList result = server.request(up);
System.out.println("Result: " + result); // can display information about test.pdf
QueryResponse rsp = server.query(new SolrQuery("*:*"));
System.out.println("rsp: " + rsp); // returns nothing

Any suggestion? -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-pdf-files-indexing-tp3527202p3527202.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re : AW: How to select all docs of 'today' ?
One subtlety to note is that caching is messed up by this form, since NOW evaluates to the second, and submitting two successive queries exactly like this won't re-use the cache. On a query like this it may not matter unless you're paging. But on filter queries, it's a good habit to cultivate to write something like [NOW/DAY TO NOW/DAY+1DAY], which will be reused until midnight tonight... Best Erick On Tue, Nov 22, 2011 at 12:02 PM, Danicela nutch danicela-nu...@mail.com wrote: Thanks, it works. All this is based on the fact that NOW/DAY means the beginning of the day. - Original message - From: sebastian.pet...@tib.uni-hannover.de Sent: 22.11.11 16:46 To: solr-user@lucene.apache.org Subject: AW: How to select all docs of 'today' ? Hi, fetch-time:[NOW/DAY TO NOW] should do it. Best Sebastian - Original message - From: Danicela nutch [mailto:danicela-nu...@mail.com] Sent: Tuesday, November 22, 2011 16:08 To: solr-user@lucene.apache.org Subject: How to select all docs of 'today' ? Hi, I have a fetch-time (date) field to know when the documents were fetched. I want to make a query to get all documents fetched today. I tried: fetch-time:NOW/DAY but it always returns 0. fetch-time:[NOW/DAY TO NOW/DAY] (it returns 0) fetch-time:[NOW/DAY-1DAY TO NOW/DAY] but it returns documents fetched yesterday. fetch-time:[NOW/DAY-1HOUR TO NOW/DAY] but it's incorrect too. Do you have any idea? Thanks in advance.
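Erick's [NOW/DAY TO NOW/DAY+1DAY] recommendation can be checked with plain date arithmetic; this sketch only mimics what Solr's date math rounds to, it is not Solr code:

```python
from datetime import datetime, timedelta

def day_bounds(now):
    """Instants that Solr date math evaluates NOW/DAY and NOW/DAY+1DAY
    to: midnight today and midnight tomorrow."""
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)  # NOW/DAY
    return start, start + timedelta(days=1)                         # NOW/DAY+1DAY

now = datetime(2011, 11, 22, 16, 8, 0)
start, end = day_bounds(now)
# Every fetch time from "today" satisfies start <= t < end, so
# fq=fetch-time:[NOW/DAY TO NOW/DAY+1DAY] covers the whole day, and
# the filter's text stays identical (hence cacheable) until midnight.
```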
Re: Problems with AutoSuggest feature(Terms Components)
I'll have to defer that to one of the sharding experts. Best Erick On Tue, Nov 22, 2011 at 1:28 PM, mechravi25 mechrav...@yahoo.co.in wrote: Hi Erick, Thanks for your reply. I would know all the options that can be given under the defaults section and how they can be overridden. is there any documentation available in solr forum. Cos we tried searching and wasn't able to succeed. My Exact scenario is that, I have one master core which has many underlying shards core(Disturbed architecture). I want the terms.limit should be defaulted to 10 in the underlying shards cores. When i hit the master core, it will in-turn hit the underlying shard cores. At this point of time, the terms.limit which has been passed to the master core has to passed to these underlying shard cores overriding the default value set. Can you please suggest the definition of the terms component for the underlying shard cores. Regards, Sivaganesh -- View this message in context: http://lucene.472066.n3.nabble.com/Problems-with-AutoSuggest-feature-Terms-Components-tp3512734p3528597.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FunctionQuery score=0
: Which answers my query needs. BUT, my boost function actually changes some : of the results to be of score 0, which I want to be excluded from the : result set. Ok .. so the crux of the issue is that your boost function results in a value of 0 for some documents, and you would like those documents excluded from your results... eqsim(alltokens,xyz) eqsim is not a function that ships with Solr (as far as I know) so I'm guessing it's something custom .. can you clarify what it does? : 2) This is why I used the frange query to solve the issue with the score 0: : q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08 : categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}) : : But this time, the remaining results lost their *boosted* scores, and : therefore the sort by score got all mixed up. Correct: frange produces a ConstantScoreQuery; it can only be used to filter documents based on whether the function it wraps falls in/out of the range. : 3) I assume I can use filter queries, but from my understanding FQs : actually perform another query before the main one and these queries are : expensive in time and I would like to avoid it if possible. Unless you actually see noticeable performance problems I wouldn't assume it will be an issue -- test first, get it working, then optimize if it's too slow. For most people the overhead of the fq won't be a factor. One option you might consider is the cache=false local param, which tells Solr not to cache the fq (handy if you know the query you are filtering on is not going to be reused much); since it's not being cached, Solr will execute it in parallel with the main query and ignore anything that it already knows isn't going to matter in the final query. 
In your case however, you can already optimize the fq solution a bit, because what you really need to filter out isn't documents matching your main query with a score less than zero; that set is the same as the set of documents for which your eqsim function returns 0, so you can just use *that* in your fq. Something like this should work... q={!edismax ... boost=$eqsim} fq={!frange l=0 incl=false v=$eqsim} eqsim=eqsim(alltokens,xyz) ...but there may still be ways to clean that up and make it faster depending on what exactly your eqsim function does (i.e. there may be a simpler query that is faster than that frange to identify the docs that get non-zero values from that function). -Hoss
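Hoss's parameter-dereferencing suggestion can be assembled client-side along these lines; eqsim is the poster's custom function, and the exact parameter layout here is only illustrative:

```python
from urllib.parse import urlencode

# Build the request Hoss sketches: the boost and the frange filter share
# one function definition via Solr's $param dereferencing, so the
# function is written out once instead of twice.
params = {
    "q": "{!edismax qf='abstract^0.02 title^0.08 categorysearch^0.05' "
         "boost=$eqsim v='+tokens5:xyz'}",
    "fq": "{!frange l=0 incl=false v=$eqsim}",  # drop docs where eqsim == 0
    "eqsim": "eqsim(alltokens,xyz)",            # custom function from the thread
}
query_string = urlencode(params)
```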
Re: Solr dismax scoring and weight
Thanks a lot, Erick, for this explanation. Do you mean the words are stored as bytes, is that it? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-dismax-scoring-and-weight-tp3490096p3531917.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: strange behavior of scores and term proximity use
I tested with the version 4.0-2011-11-04_09-29-42. Ariel 2011/11/17 Erick Erickson erickerick...@gmail.com Hmmm, I'm not seeing similar behavior on a trunk from today, when did you get your copy? Erick On Wed, Nov 16, 2011 at 2:06 PM, Ariel Zerbib ariel.zer...@gmail.com wrote: Hi, For this term proximity query: ab_main_title_l0:to be or not to be~1000 http://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22~1000sort=score+descstart=0rows=3fl=ab_main_title_l0%2Cscore%2CiddebugQuery=truehttp://localhost:/solr/select?q=ab_main_title_l0%3A%22og54ct8n+to+be+or+not+to+be+5w8ojsx2%22%7E1000sort=score+descstart=0rows=3fl=ab_main_title_l0%2Cscore%2CiddebugQuery=true The third first results are the following one: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeader int name=status0/int int name=QTime5/int /lst result name=response numFound=318 start=0 maxScore=3.0814114 doc long name=id2315190010001021/long arr name=ab_main_title_l0 strog54ct8n To be or not to be a Jew. 
5w8ojsx2/str /arr float name=score3.0814114/float/doc doc long name=id2313006480001021/long arr name=ab_main_title_l0 strog54ct8n To be or not to be 5w8ojsx2/str /arr float name=score3.0814114/float/doc doc long name=id2356410250001021/long arr name=ab_main_title_l0 strog54ct8n Rumspringa : to be or not to be Amish / 5w8ojsx2/str /arr float name=score3.0814114/float/doc /result lst name=debug str name=rawquerystringab_main_title_l0:og54ct8n to be or not to be 5w8ojsx2~1000/str str name=querystringab_main_title_l0:og54ct8n to be or not to be 5w8ojsx2~1000/str str name=parsedqueryPhraseQuery(ab_main_title_l0:og54ct8n to be or not to be 5w8ojsx2~1000)/str str name=parsedquery_toStringab_main_title_l0:og54ct8n to be or not to be 5w8ojsx2~1000/str lst name=explain str name=2315190010001021 5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be 5w8ojsx2~1000 in 378403) [DefaultSimilarity], result of: 5.337161 = fieldWeight in 378403, product of: 0.57735026 = tf(freq=0.3334), with freq of: 0.3334 = phraseFreq=0.3334 29.581549 = idf(), sum of: 1.0012436 = idf(docFreq=3297332, maxDocs=3301436) 3.0405464 = idf(docFreq=429046, maxDocs=3301436) 5.3583193 = idf(docFreq=42257, maxDocs=3301436) 4.3826413 = idf(docFreq=112108, maxDocs=3301436) 6.3982043 = idf(docFreq=14937, maxDocs=3301436) 3.0405464 = idf(docFreq=429046, maxDocs=3301436) 5.3583193 = idf(docFreq=42257, maxDocs=3301436) 1.0017256 = idf(docFreq=3295743, maxDocs=3301436) 0.3125 = fieldNorm(doc=378403) /str str name=2313006480001021 9.244234 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be 5w8ojsx2~1000 in 482807) [DefaultSimilarity], result of: 9.244234 = fieldWeight in 482807, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = phraseFreq=1.0 29.581549 = idf(), sum of: 1.0012436 = idf(docFreq=3297332, maxDocs=3301436) 3.0405464 = idf(docFreq=429046, maxDocs=3301436) 5.3583193 = idf(docFreq=42257, maxDocs=3301436) 4.3826413 = idf(docFreq=112108, maxDocs=3301436) 6.3982043 = 
idf(docFreq=14937, maxDocs=3301436) 3.0405464 = idf(docFreq=429046, maxDocs=3301436) 5.3583193 = idf(docFreq=42257, maxDocs=3301436) 1.0017256 = idf(docFreq=3295743, maxDocs=3301436) 0.3125 = fieldNorm(doc=482807) /str str name=2356410250001021 5.337161 = (MATCH) weight(ab_main_title_l0:og54ct8n to be or not to be 5w8ojsx2~1000 in 1317563) [DefaultSimilarity], result of: 5.337161 = fieldWeight in 1317563, product of: 0.57735026 = tf(freq=0.3334), with freq of: 0.3334 = phraseFreq=0.3334 29.581549 = idf(), sum of: 1.0012436 = idf(docFreq=3297332, maxDocs=3301436) 3.0405464 = idf(docFreq=429046, maxDocs=3301436) 5.3583193 = idf(docFreq=42257, maxDocs=3301436) 4.3826413 = idf(docFreq=112108, maxDocs=3301436) 6.3982043 = idf(docFreq=14937, maxDocs=3301436) 3.0405464 = idf(docFreq=429046, maxDocs=3301436) 5.3583193 = idf(docFreq=42257, maxDocs=3301436) 1.0017256 = idf(docFreq=3295743, maxDocs=3301436) 0.3125 = fieldNorm(doc=1317563) /str /response The used version is a 4.0 October snapshot. I have 2 questions about the result: - Why debug print and scores in result are different? - What is the expected behavior of this kind of term proximity query? - The debug scores seem to be well ordered but the result scores seem to be wrong. Thanks, Ariel
Re: Separate ACL and document index
I have used two different ways: 1) Store mapping from users to documents in some external database such as MySQL. At search time, lookup mapping for user to some unique doc ID or some group ID, and then build query or doc set which you can cache in SOLR process for some period. Then use that as a filter in your search. This is more involved approach but better if you have lots of ACLs per user, but it is non-trivial to implement it well. I used this in a system with over 100 million docs, and approx. 20,000 ACLs per user. The ACL mapped user to a set of group IDs, and each group could have 10,000+ documents. 2) Generate a query filter that you pass to SOLR as part of the search. Potentially it could be a pretty large query if user has granular ACL over may documents or groups. I've seen it work ok with up to 1000 or so ACLs per user query. So you build that filter query from the client using some external database to lookup user ACLs before sending request to SOLR. Bob On Tue, Nov 22, 2011 at 10:48 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Is it possible to separate ACL index and document index and achieve to search by user role in SOLR? Currently my implementation is to index ACL with document, but the document itself change frequently. I have to perform rebuild index every time when ACL change. It's heavy for whole system due to document are so many and content are huge. Do you guys have any solution to solve this problem. I've been read mailing list for a while. Seem there is not suitable solution for me. I want user searches result only for him according to his role but I don't want to re-index document every time when document's ACL change. To my knowledge, is this possible to perform a join like database to achieve this? How and possible? Thanks Floyd
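A minimal sketch of Bob's second approach: look the user's groups up in an external store and turn them into a filter query. The get_group_ids helper and the group_id field name are assumptions for illustration, not anything from the thread:

```python
def acl_filter_query(user, get_group_ids, max_clauses=1000):
    """Build a Solr fq limiting results to the user's ACL groups.
    `get_group_ids` stands in for a lookup against the external
    database (MySQL etc.) that holds the user -> group mapping."""
    groups = sorted(get_group_ids(user))
    if len(groups) > max_clauses:
        # Beyond ~1000 clauses the thread recommends the cached
        # doc-set approach (option 1) instead of a literal query.
        raise ValueError("too many ACL clauses for a filter query")
    return "group_id:(" + " OR ".join(str(g) for g in groups) + ")"

fq = acl_filter_query("alice", lambda user: {42, 3, 17})
```

The key property, as the thread notes, is that changing an ACL only touches the external mapping; the documents themselves never need re-indexing.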
trouble with CollationKeyFilter
I'm using CollationKeyFilter to sort my documents using the Unicode root collation, and my documents do appear to be getting sorted correctly, but I'm getting weird results when performing range filtering on the sort-key field. For example: ifp_sortkey_ls:["youth culture" TO "youth culture"] and ifp_sortkey_ls:{"youth culture" TO "youth culture"} both return 0 hits, but ifp_sortkey_ls:"youth culture" returns 1 hit. It seems as if any query using the ifp_sortkey_ls:[A TO B] syntax acts as if the terms A and B are greater than all documents whose sort keys start with an A-Z character, but less than the few documents whose sort keys start with Greek letters. The analysis chain for ifp_sortkey_ls is:

<fieldType name="sortkey" stored="false" indexed="true" class="solr.TextField" positionIncrementGap="100" omitNorms="true" omitTermFreqAndPositions="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
</fieldType>

Does anyone have any idea what might be going on here?
Re: FunctionQuery score=0
Thanks Hoss, I will give those a try and let you know. Cheers. On Wed, Nov 23, 2011 at 8:35 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : Which answers my query needs. BUT, my boost function actually changes some : of the results to be of score 0, which I want to be excluded from the : result set. Ok .. so the crux of the issue is that your boost function results in a value of 0 for some documents, and you would like those documents excluded from your results... eqsim(alltokens,xyz) eqsim is not a function thta ships with Solr (as far as i know) so i'm guessing it's something custom .. can you clarify what it does? : 2) This is why I used the frange query to solve the issue with the score 0: : q={!frange l=0 incl=false}query({!type=edismax qf=abstract^0.02 title^0.08 : categorysearch^0.05 boost='eqsim(alltokens,xyz)' v='+tokens5:xyz '}) : : But this time, the remaining results lost their *boosted* scores, and : therefore the sort by score got all mixed up. correct: frange produces a ConstantScoreQuery, it can only be used to filter documents based on wether the function it wraps falls in/out of the range. : 3) I assume I can use filter queries, but from my understanding FQs : actually perform another query before the main one and these queries are : expensive in time and I would like to avoid it if possible. Unless you actaully see notisable performance problems I wouldn't assume it will be an issue -- test first, get it working, then optimize if it's too slow. For most people the overhead of the fq won't a factor. One option you might consider is the cache=false local param which tells Solr not to cache the fq (handy if you know the query you are filtering on is not going to be reused much) and since it's not being cached, Solr will execute it in parallel with the main query and ignore anything that it already knows isn't going to matter in the final query. 
In your case however, you can already optimize the fq solution a bit because what you really need to filter out isn't documents matching your main query with a score less then zero; that set is the same as the set of documents for whom your eqsim function returns 0, so you can just use *that* in your fq. Something like this should work... q={!edismax ... boost=$eqsim} fq={!frange l=0 incl=false v=$eqsim} eqsim=eqsim(alltokens,xyz) ...but there may still be ways to clean that up and make it faster depending on what exactly your eqsim function does (ie: there may be a simple query that can be faster then that frange to identify the docs that get non-zero values from that function. -Hoss
WordDelimiterFilter MultiPhraseQuery case insensitive Issue
Hi, case insensitive search is not working if I use WordDelimiterFilter splitOnCaseChange=1. I am searching for the word norton and here is the result: norton: returns results; Norton: returns results; but nOrton: no results. I want nOrton to return results as well. Please help. Below is my field type.

<fieldType autoGeneratePhraseQueries="true" class="solr.TextField" name="text" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
    <filter catenateAll="0" catenateNumbers="1" catenateWords="1" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
    <filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1" splitOnCaseChange="1"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

-- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiterFilter-MultiPhraseQuery-case-insesitive-Issue-tp3532209p3532209.html Sent from the Solr - User mailing list archive at Nabble.com.
Synonyms 1 fetching 2001, how to avoid
Hi, I am searching on movie titles, with a synonyms text file that maps 1,one. With this, when I search for '1' I am expecting '1 in kind', but I am getting results which have titles like 2001: My year. I am using a query time analyser with <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>. I am going to try with expand=false. But is there anything else I need to look at? -- View this message in context: http://lucene.472066.n3.nabble.com/Synonyms-1-fetching-2001-how-to-avoid-tp3532398p3532398.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: WordDelimiterFilter MultiPhraseQuery case insensitive Issue
On 11/23/2011 2:54 PM, Uomesh wrote: Hi, case insensitive search is not working if I use WordDelimiterFilter splitOnCaseChange=1. I am searching for the word norton and here is the result: norton: returns results; Norton: returns results; but nOrton: no results. I want nOrton to return results as well. Please help. Below is my field type. Try adding preserveOriginal=1 to your WDF options. You may not need to actually reindex before you see results, but it would be a good idea to reindex. This will result in an increase in your index size. Thanks, Shawn
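Applied to the field type from the question, Shawn's suggestion would change the WDF line to something like the following (shown here for the index analyzer; the query analyzer's WDF line would get the same preserveOriginal="1" attribute, keeping its own catenate settings):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1" preserveOriginal="1"/>
```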
Re: Dismax, pf and qf
: Now there are some scenarios when I want just the pf active (without : qf). Other than surrounding my query with double quotes, is there : another way to do that? I mean, i would like to do the following : : _query:{!dismax pf=author^100}vincent kwner ...nope ... the pf is just a boosting factor to improve scores, there's no way to force a match in the pf fields. wrapping the input in quotes and using qf is the only way I know of to get what you are describing. -Hoss
Re: trouble with CollationKeyFilter
hi, locale sensitive range queries don't work with these filters, only sort, although erick erickson has a patch that will enable this (the lowercasing wildcards patch, then you could add this filter to your multiterm chain). separately locale range queries and sort both work easily on trunk (with binary terms)... just use collationfield or icucollationfield if you are able to use trunk... otherwise for 3.x I think that patch is pretty close any day now, so we can add an example for localized range queries that makes use of it. On Nov 23, 2011 4:39 PM, Michael Sokolov soko...@ifactory.com wrote: I'm using CollationKeyFilter to sort my documents using the Unicode root collation, and my documents do appear to be getting sorted correctly, but I'm getting weird results when performing range filtering using the sort key field. For example: ifp_sortkey_ls:["youth culture" TO "youth culture"] and ifp_sortkey_ls:{"youth culture" TO "youth culture"} both return 0 hits, but ifp_sortkey_ls:"youth culture" returns 1 hit. It seems as if any query using the ifp_sortkey_ls:[A TO B] syntax is acting as if the terms A, B are greater than all documents whose sortkeys start with an A-Z character, but less than a few documents that have Greek letters as the first characters of their sortkeys. The analysis chain for ifp_sortkey_ls is:

<fieldType name="sortkey" stored="false" indexed="true" class="solr.TextField" positionIncrementGap="100" omitNorms="true" omitTermFreqAndPositions="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
</fieldType>

Does anyone have any idea what might be going on here?
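For anyone able to run trunk, the binary-term approach Robert mentions replaces the whole TextField analysis chain above with a dedicated collation field type. A rough sketch — class and attribute names as on trunk at the time, so check your version's example schema before relying on them:

```xml
<!-- Root-locale, primary-strength collation; supports both sort and range queries on trunk -->
<fieldType name="sortkey" class="solr.ICUCollationField"
           locale="" strength="primary"/>
```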
Re: Autocomplete(terms) performance problem
I have now enabled the infix search, so you will be able to do both edge as well as infix search. Type francisco peak in the edge field, and in the infix input field below, try cisco peak; both will get you to the same selections. Please give it a try now: http://solr-ra.tgels.org/solr-ra-autocomplete.jsp -- View this message in context: http://lucene.472066.n3.nabble.com/Autocomplete-terms-performance-problem-tp3351352p3532656.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Separate ACL and document index
Thank you for your sharing. My current solution is similar to 2). But my problem is that the ACL is early-binding (meaning I build the index with the ACL embedded alongside the document). I don't want to rebuild the full index (a lucene/solr Document with PDF content and ACL) when the front end changes only permission settings. Solution 2) seems to have the same problem. Floyd 2011/11/24 Robert Stewart bstewart...@gmail.com: I have used two different ways: 1) Store a mapping from users to documents in some external database such as MySQL. At search time, look up the mapping from the user to some unique doc ID or some group ID, and then build a query or doc set which you can cache in the SOLR process for some period. Then use that as a filter in your search. This is a more involved approach but better if you have lots of ACLs per user, though it is non-trivial to implement well. I used this in a system with over 100 million docs, and approx. 20,000 ACLs per user. The ACL mapped a user to a set of group IDs, and each group could have 10,000+ documents. 2) Generate a query filter that you pass to SOLR as part of the search. Potentially it could be a pretty large query if the user has granular ACLs over many documents or groups. I've seen it work ok with up to 1000 or so ACLs per user query. So you build that filter query from the client using some external database to look up user ACLs before sending the request to SOLR. Bob On Tue, Nov 22, 2011 at 10:48 PM, Floyd Wu floyd...@gmail.com wrote: Hi there, Is it possible to separate the ACL index and document index and achieve search by user role in SOLR? Currently my implementation is to index the ACL with the document, but the document itself changes frequently. I have to rebuild the index every time the ACL changes. It's heavy for the whole system because the documents are so many and the content is huge. Do you guys have any solution to this problem? I've been reading the mailing list for a while. There seems to be no suitable solution for me.
I want each user's search to return results only for him according to his role, but I don't want to re-index a document every time the document's ACL changes. To my knowledge, is it possible to perform a join, like in a database, to achieve this? How would that work? Thanks Floyd
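For what it's worth, solution 2) from Robert's earlier reply boils down to building an fq clause on the client from the user's group memberships. A minimal sketch follows — the field name acl_group is an assumption (use whatever field the ACLs are indexed into), and real group names would additionally need Solr query-syntax escaping:

```java
import java.util.List;

public class AclFilterBuilder {
    // Build a Solr fq clause restricting results to documents whose
    // ACL field (hypothetical name passed in) matches one of the user's groups.
    static String buildAclFq(String field, List<String> groups) {
        return field + ":(" + String.join(" OR ", groups) + ")";
    }

    public static void main(String[] args) {
        // Group names here are made up for illustration.
        System.out.println(buildAclFq("acl_group", List.of("sales", "managers")));
        // prints: acl_group:(sales OR managers)
    }
}
```

The client would look up the groups in the external database and pass the resulting string as the fq parameter of the search request.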
Re: Solr real time update
Thanks for the information. I will play with it. Spark 2011/11/23 Nagendra Nagarajayya nnagaraja...@transaxtions.com Spark: Solr with RankingAlgorithm is not a plugin but a change of search library from Lucene to RankingAlgorithm. Here is more info on the changes you will need to make to your solrconfig.xml: http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.org/ On 11/22/2011 5:40 PM, yu shen wrote: Hi Nagarajayya, Thanks for your information. Do I need to change any configuration of my current solr server to integrate your plugin? Spark 2011/11/22 Nagendra Nagarajayya nnagaraja...@transaxtions.com Yu: To get Near Real Time update in Solr 1.4.1 you will need to use Solr 1.4.1 with RankingAlgorithm. This allows you to update documents in near real time. You can download and give this a try from here: http://solr-ra.tgels.org/ Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org/ http://rankingalgorithm.tgels.org/ On 11/21/2011 9:47 PM, yu shen wrote: Hi All, After some study, I used the snippet below. The documents seem to be updated, but it still takes a long time. It feels like the parameter does not take effect. Any comments?

UpdateRequest req = new UpdateRequest();
req.add(solrDocs);
req.setCommitWithin(5000);
req.setParam("commitWithin", "5000");
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
req.process(SOLR_SERVER);

2011/11/22 yu shen shenyu...@gmail.com Hi All, I try to do a 'nearly real time update' to solr. My solr version is 1.4.1.
I read the Solr CommitWithin wiki (http://wiki.apache.org/solr/CommitWithin) and a related thread (http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.html), mostly on the difficulty of doing this. My issue is I tried the code snippet in the wiki:

UpdateRequest req = new UpdateRequest();
req.add(mySolrInputDocument);
req.setCommitWithin(1);
req.process(server);

But my index did not get updated, unless I call SOLR_SERVER.commit(); explicitly. The latter call takes more than 1 minute on average to return. Can I do a real time update on solr 1.4.1? Would someone help to show a workable code snippet? Spark
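If the SolrJ setters don't take effect on 1.4.1, the CommitWithin wiki page linked above also describes setting commitWithin as an attribute on the raw XML update message, which sidesteps the client library. A sketch — the document field here is made up for illustration:

```xml
<add commitWithin="5000">
  <doc>
    <field name="id">example-1</field>
  </doc>
</add>
```

POSTing this to the update handler asks Solr to make the document visible within 5 seconds without an explicit commit call.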
Re: trouble with CollationKeyFilter
Thanks for confirming that, and laying out the options, Robert. -Mike On 11/23/2011 9:03 PM, Robert Muir wrote: hi, locale sensitive range queries don't work with these filters, only sort, although erick erickson has a patch that will enable this (the lowercasing wildcards patch, then you could add this filter to your multiterm chain). separately locale range queries and sort both work easily on trunk (with binary terms)... just use collationfield or icucollationfield if you are able to use trunk... otherwise for 3.x I think that patch is pretty close any day now, so we can add an example for localized range queries that makes use of it. On Nov 23, 2011 4:39 PM, Michael Sokolov soko...@ifactory.com wrote: I'm using CollationKeyFilter to sort my documents using the Unicode root collation, and my documents do appear to be getting sorted correctly, but I'm getting weird results when performing range filtering using the sort key field. For example: ifp_sortkey_ls:["youth culture" TO "youth culture"] and ifp_sortkey_ls:{"youth culture" TO "youth culture"} both return 0 hits, but ifp_sortkey_ls:"youth culture" returns 1 hit. It seems as if any query using the ifp_sortkey_ls:[A TO B] syntax is acting as if the terms A, B are greater than all documents whose sortkeys start with an A-Z character, but less than a few documents that have Greek letters as the first characters of their sortkeys. The analysis chain for ifp_sortkey_ls is:

<fieldType name="sortkey" stored="false" indexed="true" class="solr.TextField" positionIncrementGap="100" omitNorms="true" omitTermFreqAndPositions="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="" strength="primary"/>
  </analyzer>
</fieldType>

Does anyone have any idea what might be going on here?
Re: need a way so that solr return result for misspelled terms
We are using the solr query parser... just need some schema and/or solrconfig configuration to do the misspell search and find results. -- View this message in context: http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3532979.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need a way so that solr return result for misspelled terms
This seems to be good. If possible, I want to achieve it through Solr features / configuration changes, but I can go for this if that is not possible or not compatible enough. Thanks. iorixxx wrote: I have configured the spellchecker component in my solr. It works with a custom request handler (however it's not working with the standard request handler, but this is not a concern for now). But it's returning suggestions for the matching spellings; instead of that, we want to directly get results for relative spellings of the misspelled search term. You might be interested in this: http://sematext.com/products/dym-researcher/index.html -- View this message in context: http://lucene.472066.n3.nabble.com/need-a-way-so-that-solr-return-result-for-misspelled-terms-tp3530584p3532983.html Sent from the Solr - User mailing list archive at Nabble.com.
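One configuration-only route worth trying is the standard SpellCheckComponent with collation enabled, so each response carries a corrected query string the client can re-issue automatically. A sketch of the request-handler wiring — this assumes a searchComponent named spellcheck is already configured against your schema, as the original poster described:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

This does not make Solr itself return results for the misspelled term, but the client can transparently re-query with the collation when the original query yields zero hits.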
RE: DIH Strange Problem
Hi, Thanks for your replies. I carried out these 2 steps (they did not solve my problem): 1. I tried setting responseBuffering to adaptive. It did not work. 2. To check the database connection I wrote a simple Java program to connect to the database and fetch some results with the same driver that I use for Solr. It worked, so it does not seem to be a problem with the connection. Now I am stuck where the Tomcat log says: Creating a connection for entity . and does nothing. I mean, after this log we usually get the getConnection() took x milliseconds line; however, I don't get that, I can just see the time moving with no records getting fetched. Original problem listed again: I am using Solr 1.4.1 on Windows/MS SQL Server and am using DIH for importing data. Indexing and all was working perfectly fine. However, today when I started full indexing again, Solr halts/gets stuck at the line Creating a connection for entity. There are no further messages after that. I can see that DIH is busy, and on the DIH console I can see A command is still running; I can also see total rows fetched = 0 and total requests made to datasource = 1, and the time is increasing, however it is not doing anything. This is the exact configuration that worked for me before. I am not really able to understand the problem here. Also, in the index directory where I am storing the index there are just 3 files: 2 segment files + 1 lucene*-write.lock file. ... data-config.xml:

<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders"
            user="testUser" password="password"/>

document . .
Logs: INFO: Server startup in 2016 ms Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=11 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Nov 23, 2011 4:11:27 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\solrindexes\index,segFN=segments_6,version=1322041133719,generation=6,filenames=[segments_6] Nov 23, 2011 4:11:27 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1322041133719 Nov 23, 2011 4:11:27 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity SampleText with URL: jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Wednesday, November 23, 2011 7:36 PM To: solr-user@lucene.apache.org Subject: Re: DIH Strange Problem On 11/23/2011 5:21 AM, Chantal Ackermann wrote: Hi Yavar, my experience with similar problems was that there was something wrong with the database connection or the database. Chantal It's also possible that your JDBC driver might be trying to buffer the entire result set. There's a link on the wiki specifically for this problem on MS SQL server. Hopefully it's that, but Chantal could be right too. 
http://wiki.apache.org/solr/DataImportHandlerFaq Here's the URL to the specific paragraph, but it's likely that it won't survive the email trip in a clickable form: http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F Thanks, Shawn
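The FAQ entry Shawn links boils down to making the sqljdbc driver stream rows instead of buffering the whole result set. Applied to the dataSource from this thread, that advice would look roughly like the following; responseBuffering is the parameter the FAQ names (and the one the poster reports having tried), shown here only to make the FAQ advice concrete:

```xml
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://127.0.0.1:1433;databaseName=SampleOrders;responseBuffering=adaptive"
            user="testUser" password="password"/>
```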