Replication failed without an error =(
hello.. anyone an idea how I can figure out why my replication failed? I get no errors =( My configuration: 2 servers, both master and slave at the same time. Only one server receives updates and so acts as the master; on the slave, replication is triggered via cron. If one server crashes I can easily switch master and slave, because both are master AND slave at the same time. This worked well, but no replication has happened since I deleted the pollInterval !?!? Could that be the reason? thx

--- System: One server, 12 GB RAM, 2 Solr instances, 8 cores, 1 core with 45 million documents, other cores ~200,000. Solr1 for search requests - commit every minute - 5 GB Xmx. Solr2 for update requests - delta import every minute - 4 GB Xmx.
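For reference, a minimal replication handler sketch for a box that is master and slave at once (host, core name, and interval below are illustrative, not taken from the poster's setup). If pollInterval is absent, the slave does not poll the master on its own, so removing it does stop automatic replication unless something else, such as a cron job hitting the fetchindex command, triggers a pull:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

A cron-driven pull can also be issued explicitly with http://slave-host:8983/solr/core1/replication?command=fetchindex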
Re: Multi-words synonyms matching
Hello, I'd like to resume this post. The only way I found to keep synonyms in synonyms.txt from being split into words is to use the line <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/> in schema.xml, where tokenizerFactory="solr.KeywordTokenizerFactory" instructs SynonymFilterFactory not to break synonyms into words on white space when parsing the synonyms file. So now it works fine: mairie is mapped to "hotel de ville", and when I send the request q="hotel de ville" (quotes are mandatory to prevent the analyzer from splitting "hotel de ville" on white space), I get answers containing the word mairie. But when I use the fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it doesn't work!!! CATEGORY_ANALYZED is the same field type as the default search field. This means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de ville", Solr uses the same analyzer, the one with the SynonymFilterFactory line above. Anyone has a clue what is different between q analysis behaviour and fq analysis behaviour? Thanks a lot, Elisabeth

2012/4/12 elisabeth benoit elisaelisael...@gmail.com: oh, that's right. thanks a lot, Elisabeth

2012/4/11 Jeevanandam Madanagopal je...@myjeeva.com: Elisabeth - As you described, the below mapping might suit your need: mairie => hotel de ville, mairie. mairie gets expanded to "hotel de ville" and mairie at index time, so both mairie and "hotel de ville" are searchable on the document. However, the white-space tokenizer splitting at query time will still be a problem, as described by Markus. --Jeevanandam

On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote: Have you tried the '=>' mapping instead? Something like hotel de ville => mairie might work for you. Yes, thanks, I've tried it, but from what I understand it doesn't solve my problem, since this means "hotel de ville" will be replaced by mairie at index time (I use synonyms only at index time). So when a user asks for "hôtel de ville", it won't match. In fact, at index time I have mairie in my data, but I want the user to be able to request mairie or "hôtel de ville" and get mairie as an answer, and not get mairie as an answer when requesting hôtel. To map `mairie` to `hotel de ville` as a single token you must escape your white space: mairie, hotel\ de\ ville. This results in a problem if your tokenizer splits on white space at query time. Ok, I guess this means I have a problem. No simple solution, since at query time my tokenizer does split on white space. I guess my problem is more or less one of the problems discussed in http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215 Thanks a lot for your answers, Elisabeth

2012/4/10 Erick Erickson erickerick...@gmail.com: Have you tried the '=>' mapping instead? Something like hotel de ville => mairie might work for you. Best, Erick

On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hello, I've read several posts on this issue, but can't find a real solution to my multi-word synonyms matching problem. I have in my synonyms.txt an entry like mairie, hotel de ville, and my index-time analyzer is configured as follows for synonyms: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>. The problem is that now mairie matches hotel, and I would only want mairie to match "hotel de ville" and mairie. When I look into the analyzer, I see that mairie is mapped to hotel, and the words de and ville are added in second and third position. To change that, I tried <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post), and I can now see in the analyzer that mairie is mapped to "hotel de ville", but now when I query hotel de ville, it doesn't match mairie at all. Anyone has a clue what I'm doing wrong? I'm using Solr 3.4. Thanks, Elisabeth
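An aside on the fq syntax (not from the thread; the field name is the poster's, the rest is illustrative): the phrase has to be quoted inside the fq as well, or the whole value can be handed to the field query parser so it reaches the field's analyzer as one string:

  fq=CATEGORY_ANALYZED:"hotel de ville"
  fq={!field f=CATEGORY_ANALYZED}hotel de ville

Without the quotes, the default query parser splits the fq on white space before the field's analyzer ever sees it, which would explain a difference between the q and fq behaviour.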
Re: Replication failed without an error =(
Before this problem I had this one: https://issues.apache.org/jira/browse/SOLR-1781

--- System: One server, 12 GB RAM, 2 Solr instances, 8 cores, 1 core with 45 million documents, other cores ~200,000. Solr1 for search requests - commit every minute - 5 GB Xmx. Solr2 for update requests - delta import every minute - 4 GB Xmx.
Re: Group by distance
Use group=true and group.field in your query. Your Solr version should be 3.4 or above. Thanks, Ravi
Re: Group by distance
I think this can only work when I have many records at the same position. My problem is to group within a short distance... like I said in my last mail, about 10 km. I need to put markers on a map of Poland and display them. Right now I have 100k records, but in the future I will have about 2 million, so I must send grouped records. Best, Piotr

On 24 April 2012 12:08, ravicv ravichandra...@gmail.com wrote: Use group=true and group.field in your query. Your Solr version should be 3.4 or above. Thanks, Ravi

-- Piotr (ViruS) Sikora E-mail/JID: vi...@hostv.pl http://piotrsikora.pl
Auto suggest on indexed file content filtered based on user
I am trying to implement an auto-suggest feature. The search feature already exists and searches file content in the user's allotted workspace. The following is from the schema used for search indexing:

  <field name="Text" type="text" indexed="true" stored="false" multiValued="false"/>
  <field name="UserName" type="string" indexed="true" stored="true" multiValued="true"/>

The search result is filtered by the user name. The suggester is implemented as a searchComponent, and the field 'Text' is used by the suggester; it would have to be filtered the same way the search is. The problem with this approach is that suggest works on a single field and there is no way to include the UserName field as a filter. What's the best way out from here? Thanks in advance! Jay
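One workaround sometimes used when the Suggester component cannot be filtered (a sketch only; the field names come from the schema above, the host and user value are illustrative): drive the suggestions from terms faceting instead, where an fq on UserName can be applied:

  http://localhost:8983/solr/select?q=*:*&rows=0&fq=UserName:jay
    &facet=true&facet.field=Text&facet.prefix=text&facet.limit=10

facet.prefix returns only the indexed terms of Text that start with what the user has typed, and the fq restricts the counts to that user's documents.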
Re: Deciding whether to stem at query time
Ah, this is a really good point. Still seems like it has the downsides of #2, though: much bigger space requirements and possibly some time lost on queries.

On Mon, Apr 23, 2012 at 3:35 PM, Walter Underwood wun...@wunderwood.org wrote: There is a third approach. Create two fields and always query both of them, with the exact field given a higher weight. This works great and performs well. It is what we did at Netflix and what I'm doing at Chegg. wunder

On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote: So I just realized the other day that stemming basically happens at index time. If I'm understanding correctly, there's no way to allow a user to specify, at run time, whether to stem particular words or not based on a single index. I think there are two options, but I'd love to hear that I'm wrong: 1.) Incrementally build up a white list of words that don't stem very well. To pick a random example out of the blue, "light" isn't super closely related to "lighter", so I might choose not to stem that. If I wanted to do this, I think (if I understand correctly) StemmerOverrideFilter would help me out with this. I'm not a big fan of this approach. 2.) Index all the text in two fields, once with stemming and once without. Then build some kind of option into the UI for specifying whether to stem the words or not, and search the appropriate field. Unfortunately, this would roughly double the size of my index, and probably affect query times too. Plus, the UI would probably suck. Am I missing an option? Has anyone tried one of these approaches? Thanks! Andrew
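A sketch of the third approach Walter describes (field and type names are illustrative, not from the thread): copy the source text into a stemmed and an unstemmed field and query both, boosting the exact one:

  <field name="body_exact" type="text_unstemmed" indexed="true" stored="false"/>
  <field name="body_stem"  type="text_stemmed"   indexed="true" stored="false"/>
  <copyField source="body" dest="body_exact"/>
  <copyField source="body" dest="body_stem"/>

  q=lighter&defType=edismax&qf=body_exact^2 body_stem

With edismax (or dismax), an exact match scores higher while the stemmed field still catches the broader matches.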
Searching on fields with White Spaces
I have a custom fieldtype with the config below:

  <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10"/>
      <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10"/>
    </analyzer>
  </fieldType>

I have an autocomplete configured on the same field which gives me results as expected. A new use case is to search "kualalumpur" or, say, "newyork" without spaces and return Kuala Lumpur and New York, which happen to be the original values. What would be the recommended solution? Regards, Shubham
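One approach sometimes used for this (a sketch; the field and type names below are made up, not from the poster's schema): add a copyField whose analyzer keeps the whole value as one token and strips the spaces, so "kualalumpur" matches the concatenated form of "Kuala Lumpur":

  <fieldType name="text_concat" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all"/>
    </analyzer>
  </fieldType>

  <field name="city_concat" type="text_concat" indexed="true" stored="false"/>
  <copyField source="city" dest="city_concat"/>

At query time "kualalumpur" is a single lowercased token, so it matches the space-stripped index value while the original field keeps serving the normal tokenized searches.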
Re: Multi-words synonyms matching
Usage of q and fq: q is typically the main query for the search request; fq is the Filter Query, generally used to restrict the super set of documents without influencing the score (more info: http://wiki.apache.org/solr/CommonQueryParameters#q). For example: q="hotel de ville" ==> returns 100 documents; q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ==> returns 40 documents out of that super set of 100. hope this helps! - Jeevanandam

On 24-04-2012 3:08 pm, elisabeth benoit wrote: Hello, I'd like to resume this post. The only way I found to keep synonyms in synonyms.txt from being split into words is to use the line <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/> in schema.xml, where tokenizerFactory="solr.KeywordTokenizerFactory" instructs SynonymFilterFactory not to break synonyms into words on white space when parsing the synonyms file. So now it works fine: mairie is mapped to "hotel de ville", and when I send the request q="hotel de ville" (quotes are mandatory to prevent the analyzer from splitting "hotel de ville" on white space), I get answers containing the word mairie. But when I use the fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it doesn't work!!! CATEGORY_ANALYZED is the same field type as the default search field. This means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de ville", Solr uses the same analyzer, the one with the SynonymFilterFactory line above. Anyone has a clue what is different between q analysis behaviour and fq analysis behaviour? Thanks a lot, Elisabeth

2012/4/12 elisabeth benoit elisaelisael...@gmail.com: oh, that's right. thanks a lot, Elisabeth

2012/4/11 Jeevanandam Madanagopal je...@myjeeva.com: Elisabeth - As you described, the below mapping might suit your need: mairie => hotel de ville, mairie. mairie gets expanded to "hotel de ville" and mairie at index time, so both mairie and "hotel de ville" are searchable on the document. However, the white-space tokenizer splitting at query time will still be a problem, as described by Markus. --Jeevanandam

On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote: Have you tried the '=>' mapping instead? Something like hotel de ville => mairie might work for you. Yes, thanks, I've tried it, but from what I understand it doesn't solve my problem, since this means "hotel de ville" will be replaced by mairie at index time (I use synonyms only at index time). So when a user asks for "hôtel de ville", it won't match. In fact, at index time I have mairie in my data, but I want the user to be able to request mairie or "hôtel de ville" and get mairie as an answer, and not get mairie as an answer when requesting hôtel. To map `mairie` to `hotel de ville` as a single token you must escape your white space: mairie, hotel\ de\ ville. This results in a problem if your tokenizer splits on white space at query time. Ok, I guess this means I have a problem. No simple solution, since at query time my tokenizer does split on white space. I guess my problem is more or less one of the problems discussed in http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215 Thanks a lot for your answers, Elisabeth

2012/4/10 Erick Erickson erickerick...@gmail.com: Have you tried the '=>' mapping instead? Something like hotel de ville => mairie might work for you. Best, Erick

On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hello, I've read several posts on this issue, but can't find a real solution to my multi-word synonyms matching problem. I have in my synonyms.txt an entry like mairie, hotel de ville, and my index-time analyzer is configured as follows for synonyms: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>. The problem is that now mairie matches hotel, and I would only want mairie to match "hotel de ville" and mairie. When I look into the analyzer, I see that mairie is mapped to hotel, and the words de and ville are added in second and third position. To change that, I tried <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one post), and I can now see in the analyzer that mairie is mapped to "hotel de ville", but now when I query hotel de ville, it doesn't match mairie at all. Anyone has a clue what I'm doing wrong? I'm using Solr 3.4. Thanks, Elisabeth
Re: Auto suggest on indexed file content filtered based on user
can you please share a sample query? -Jeevanandam

On 24-04-2012 1:49 pm, prakash_ajp wrote: I am trying to implement an auto-suggest feature. The search feature already exists and searches file content in the user's allotted workspace. The following is from the schema used for search indexing: <field name="Text" type="text" indexed="true" stored="false" multiValued="false"/> <field name="UserName" type="string" indexed="true" stored="true" multiValued="true"/> The search result is filtered by the user name. The suggester is implemented as a searchComponent, and the field 'Text' is used by the suggester; it would have to be filtered the same way the search is. The problem with this approach is that suggest works on a single field and there is no way to include the UserName field as a filter. What's the best way out from here? Thanks in advance! Jay
Recovery - too many updates received since start
Hi, I am seeing a Solr node lose its connection to ZooKeeper and re-establish it. After Solr reconnects to ZooKeeper it begins to recover. The connection was gone for approximately 10 seconds, and meanwhile the leader of the slice received some documents (maybe about 1000). Solr then fails to peer sync, with the log message:

Apr 21, 2012 10:13:40 AM org.apache.solr.update.PeerSync sync
WARNING: PeerSync: core=mycollection_slice21_shard1 url=zk-1:2181,zk-2:2181,zk-3:2181 too many updates received since start - startingUpdates no longer overlaps with our currentUpdates

Looking into PeerSync and UpdateLog I can see that 100 updates is the maximum a shard is allowed to be behind. Is it correct that this is not configurable, and what was the reason for choosing 100? I suspect one has to weigh the work needed to replicate the full index against the performance loss/resource usage of enlarging the UpdateLog? Any comments regarding this are greatly appreciated. Best regards, Trym
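For what it's worth, later Solr releases expose this limit through the update log configuration in solrconfig.xml. The sketch below is illustrative only; the numRecordsToKeep option may not exist in the trunk build discussed here:

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
      <int name="numRecordsToKeep">1000</int>
    </updateLog>
  </updateHandler>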
JDBC import yields no data
I'm trying to migrate from an RDBMS to the Lucene ecosystem. To do this, I'm trying to use the JDBC importer[1]. My configuration is given below:

  <dataConfig>
    <dataSource driver="net.sf.log4jdbc.DriverSpy" user="sa" url="jdbc:log4jdbc:h2:tcp://192.168.1.6/finance"/>
    <!-- <dataSource driver="org.h2.Driver" url="jdbc:h2:tcp://192.168.1.6/finance" user="sa"/> -->
    <document>
      <entity name="receipt"
              query="SELECT 'transaction' as type, currency, name, amount, done_on from receipts join app_users on user_id = app_users.id"
              deltaQuery="SELECT 'transaction' as type, name, currency, amount, done_on from receipts join app_users on user_id = app_users.id where done_on &gt; '${dataimporter.last_index_time}'">
        <field column="NAME" name="name"/>
        <field column="NAME" name="nameSort"/>
        <field column="NAME" name="alphaNameSort"/>
        <field column="AMOUNT" name="amount"/> <!-- currencyField not available till 3.6 -->
        <field column="transaction_time" name="done_on"/> <!-- resolve epoch time -->
        <field column="location" name="location"/> <!-- geospatial?? -->
      </entity>
    </document>
  </dataConfig>

And the result of querying *:*:

  % curl "http://192.168.1.6:8995/solr/db/select/?q=*%3A*"
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="q">*:*</str></lst></lst>
  <result name="response" numFound="0" start="0"/>
  </response>

The SQL query does work properly, and the relevant jars are in the lib subdirectory. Help? -- H -- Sent from my mobile device

1. http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource
Recover - Read timed out
Hi, I am seeing a Solr node lose its connection to ZooKeeper and re-establish it. After Solr reconnects to ZooKeeper it begins to recover its replicas. The connection was gone for approximately 10 seconds, and meanwhile the leader of the slice received some documents (maybe about 1000). Solr fails to peer sync and afterwards fails to do a full replication, with the log message below. The Solr instance from which the documents are replicated doesn't log anything while the replication is in progress. The full replication keeps failing with the read timeout for about 10 hours and then Solr gives up.

1. How can I get more information about why the read timeout happens?
2. It seems the Solr instance it replicates from leaks an HTTP connection (and a thread) each time, reaching about 18,000 threads in 8 hours.

Any comments are welcome. Best regards, Trym

Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: http://solr-ip:8983/solr/mycollection_slice21_shard2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
    at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)
    ... 8 more
Re: Multi-words synonyms matching
yes, thanks, but this is NOT my question. I was wondering why I have multiple matches with q="hotel de ville" and no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm searching in the same Solr fieldType. Why is the q parameter behaving differently in that case? Why do the quotes work in one case and not in the other? Does anyone know? Thanks, Elisabeth

2012/4/24 Jeevanandam je...@myjeeva.com: Usage of q and fq: q is typically the main query for the search request; fq is the Filter Query, generally used to restrict the super set of documents without influencing the score (more info: http://wiki.apache.org/solr/CommonQueryParameters#q). For example: q="hotel de ville" ==> returns 100 documents; q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ==> returns 40 documents out of that super set of 100. hope this helps! - Jeevanandam

On 24-04-2012 3:08 pm, elisabeth benoit wrote: Hello, I'd like to resume this post. The only way I found to keep synonyms in synonyms.txt from being split into words is to use the line <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/> in schema.xml, where tokenizerFactory="solr.KeywordTokenizerFactory" instructs SynonymFilterFactory not to break synonyms into words on white space when parsing the synonyms file. So now it works fine: mairie is mapped to "hotel de ville", and when I send the request q="hotel de ville" (quotes are mandatory to prevent the analyzer from splitting "hotel de ville" on white space), I get answers containing the word mairie. But when I use the fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it doesn't work!!! CATEGORY_ANALYZED is the same field type as the default search field. This means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de ville", Solr uses the same analyzer, the one with the SynonymFilterFactory line above. Anyone has a clue what is different between q analysis behaviour and fq analysis behaviour? Thanks a lot, Elisabeth

2012/4/12 elisabeth benoit elisaelisael...@gmail.com: oh, that's right. thanks a lot, Elisabeth

2012/4/11 Jeevanandam Madanagopal je...@myjeeva.com: Elisabeth - As you described, the below mapping might suit your need: mairie => hotel de ville, mairie. mairie gets expanded to "hotel de ville" and mairie at index time, so both mairie and "hotel de ville" are searchable on the document. However, the white-space tokenizer splitting at query time will still be a problem, as described by Markus. --Jeevanandam

On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote: Have you tried the '=>' mapping instead? Something like hotel de ville => mairie might work for you. Yes, thanks, I've tried it, but from what I understand it doesn't solve my problem, since this means "hotel de ville" will be replaced by mairie at index time (I use synonyms only at index time). So when a user asks for "hôtel de ville", it won't match. In fact, at index time I have mairie in my data, but I want the user to be able to request mairie or "hôtel de ville" and get mairie as an answer, and not get mairie as an answer when requesting hôtel. To map `mairie` to `hotel de ville` as a single token you must escape your white space: mairie, hotel\ de\ ville. This results in a problem if your tokenizer splits on white space at query time. Ok, I guess this means I have a problem. No simple solution, since at query time my tokenizer does split on white space. I guess my problem is more or less one of the problems discussed in http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215 Thanks a lot for your answers, Elisabeth

2012/4/10 Erick Erickson erickerick...@gmail.com: Have you tried the '=>' mapping instead? Something like hotel de ville => mairie might work for you. Best, Erick

On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hello, I've read several posts on this issue, but can't find a real solution to my multi-word synonyms matching problem. I have in my synonyms.txt an entry like mairie, hotel de ville, and my index-time analyzer is configured as follows for synonyms: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>. The problem is that now mairie matches hotel, and I would only want mairie to match "hotel de ville" and mairie. When I look into the analyzer, I see that mairie is mapped to hotel, and the words de and ville are added in second and third position. To change that, I tried to do filter
debugging junit test with eclipse
I have tried all the hints from the internet for debugging a JUnit test of Solr 3.6 under Eclipse, but didn't succeed. Eclipse and everything else is running, compiling, and debugging with RunJettyRun. The tests have no errors. Ant from the command line is also running with ivy, e.g. ant -Dtestmethod=testUserFields -Dtestcase=TestExtendedDismaxParser test-solr-core. But I can't get a single test running with JUnit from Eclipse so that I can then step into it for debugging. Any idea what's going wrong? Regards, Bernd
Re: Deciding whether to stem at query time
Hi Andrew, This would not necessarily increase the size of your index that much - you don't need to store both fields, just one of them, and only if you really need it for highlighting or displaying. If not, just index. Otis

Performance Monitoring for Solr - http://sematext.com/spm/solr-performance-monitoring

From: Andrew Wagner wagner.and...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, April 24, 2012 7:21 AM Subject: Re: Deciding whether to stem at query time

Ah, this is a really good point. Still seems like it has the downsides of #2, though: much bigger space requirements and possibly some time lost on queries.

On Mon, Apr 23, 2012 at 3:35 PM, Walter Underwood wun...@wunderwood.org wrote: There is a third approach. Create two fields and always query both of them, with the exact field given a higher weight. This works great and performs well. It is what we did at Netflix and what I'm doing at Chegg. wunder

On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote: So I just realized the other day that stemming basically happens at index time. If I'm understanding correctly, there's no way to allow a user to specify, at run time, whether to stem particular words or not based on a single index. I think there are two options, but I'd love to hear that I'm wrong: 1.) Incrementally build up a white list of words that don't stem very well. To pick a random example out of the blue, "light" isn't super closely related to "lighter", so I might choose not to stem that. If I wanted to do this, I think (if I understand correctly) StemmerOverrideFilter would help me out with this. I'm not a big fan of this approach. 2.) Index all the text in two fields, once with stemming and once without. Then build some kind of option into the UI for specifying whether to stem the words or not, and search the appropriate field. Unfortunately, this would roughly double the size of my index, and probably affect query times too. Plus, the UI would probably suck. Am I missing an option? Has anyone tried one of these approaches? Thanks! Andrew
Query parsing VS marshalling/unmarshalling
Hi, I maintain a distributed system which Solr is part of. The data kept in Solr is permissioned, and permissions are currently implemented by taking the original user query and adding certain bits to it which make it return less data in the search results. Now I am at the point where I need to go over this functionality and try to improve it. Changing this to send two separate queries (q=...&fq=...) would be the first logical thing to do, but I was thinking of an extra improvement. Instead of generating the filter query, converting it into a String and sending it over HTTP just for Solr to parse it again - would it not be better to take the generated Lucene fq query, serialize it using Java serialization, convert it to, say, Base64, and then send and deserialize it on the Solr end? Has anyone tried doing any performance comparisons on this topic? I am particularly concerned about this because in extreme cases my filter queries can be very large (1000s of characters long) and we already had to do tweaks as the size of GET requests would exceed default limits. And yes, we could move to POST, but I would like to minimize both the amount of data that is sent over and the time taken to parse large queries. Thanks in advance. m.
Re: Query parsing VS marshalling/unmarshalling
2012/4/24 Mindaugas Žakšauskas min...@gmail.com: Hi, I maintain a distributed system which Solr is part of. The data kept in Solr is permissioned, and permissions are currently implemented by taking the original user query and adding certain bits to it which make it return less data in the search results. Now I am at the point where I need to go over this functionality and try to improve it. Changing this to send two separate queries (q=...&fq=...) would be the first logical thing to do, but I was thinking of an extra improvement. Instead of generating the filter query, converting it into a String and sending it over HTTP just for Solr to parse it again - would it not be better to take the generated Lucene fq query, serialize it using Java serialization, convert it to, say, Base64, and then send and deserialize it on the Solr end? Has anyone tried doing any performance comparisons on this topic?

I'm about to try out a contribution for serializing queries in Javascript using Jackson. I've previously done this by serializing my own data structure and putting the JSON into a custom query parameter.

I am particularly concerned about this because in extreme cases my filter queries can be very large (1000s of characters long) and we already had to do tweaks as the size of GET requests would exceed default limits. And yes, we could move to POST, but I would like to minimize both the amount of data that is sent over and the time taken to parse large queries. Thanks in advance. m.
Re: Deciding whether to stem at query time
On 24 Apr 2012, at 17:16, Otis Gospodnetic wrote: This would not necessarily increase the size of your index that much - you don't need to store both fields, just one of them if you really need it for highlighting or displaying. If not, just index.

I second this. The query expansion process is far from being a slow thing... you can easily expand to tens of fields with a fairly small penalty. Where you do pay a penalty is with stored fields... these really need to be carefully avoided as much as possible. As long as you keep them small, the legendary performance of SOLR will still hold. paul
Re: Deciding whether to stem at query time
I'm sorry, I'm missing something. What's the difference between storing and indexing a field?

On Tue, Apr 24, 2012 at 10:28 AM, Paul Libbrecht p...@hoplahup.net wrote: On 24 Apr 2012, at 17:16, Otis Gospodnetic wrote: This would not necessarily increase the size of your index that much - you don't need to store both fields, just one of them if you really need it for highlighting or displaying. If not, just index. I second this. The query expansion process is far from being a slow thing... you can easily expand to tens of fields with a fairly small penalty. Where you do pay a penalty is with stored fields... these really need to be carefully avoided as much as possible. As long as you keep them small, the legendary performance of SOLR will still hold. paul
Re: Query parsing VS marshalling/unmarshalling
On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies bimargul...@gmail.com wrote: I'm about to try out a contribution for serializing queries in Javascript using Jackson. I've previously done this by serializing my own data structure and putting the JSON into a custom query parameter.

Thanks for your reply. I appreciate your effort, but I'm not sure I fully understand the gain. Having the data in JSON would still require it to be converted into a Lucene Query at the end, which takes space and CPU effort, right? Or are you saying that having the query serialized into a structured data blob (JSON in this case) makes it somehow easier to convert into a Lucene Query? I only thought about Java serialization because: - it's rather close to the in-object format - the mechanism is rather stable and an established standard in Java/the JVM - Lucene Queries seem to implement java.io.Serializable (I haven't done a thorough check, but it looks good on the surface) - other conversions (e.g. using XStream) are either slow or require custom annotations, and I personally don't see how Lucene/Solr would include those in their core classes. Anyway, it would still be interesting to hear if anyone could elaborate on query parsing complexity. m.
RE: JDBC import yields no data
You might also want to show us your data import handler configuration from solrconfig.xml and also the URL you're using to start the data import. When it's complete, browsing to http://192.168.1.6:8995/solr/db/dataimport (or whatever the DIH handler name is in your config) should say "indexing complete" and also give the number of documents it imported. Also, if you have commit=false in your config it won't issue a commit, so you won't see the documents. If it fails, your servlet container's logs should have a stack trace or something indicating what the failure was.

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message----- From: Hasan Diwan [mailto:hasan.di...@gmail.com] Sent: Tuesday, April 24, 2012 8:51 AM To: solr-user@lucene.apache.org Subject: JDBC import yields no data

I'm trying to migrate from an RDBMS to the Lucene ecosystem. To do this, I'm trying to use the JDBC importer[1]. My configuration is given below:

  <dataConfig>
    <dataSource driver="net.sf.log4jdbc.DriverSpy" user="sa" url="jdbc:log4jdbc:h2:tcp://192.168.1.6/finance"/>
    <!-- <dataSource driver="org.h2.Driver" url="jdbc:h2:tcp://192.168.1.6/finance" user="sa"/> -->
    <document>
      <entity name="receipt"
              query="SELECT 'transaction' as type, currency, name, amount, done_on from receipts join app_users on user_id = app_users.id"
              deltaQuery="SELECT 'transaction' as type, name, currency, amount, done_on from receipts join app_users on user_id = app_users.id where done_on &gt; '${dataimporter.last_index_time}'">
        <field column="NAME" name="name"/>
        <field column="NAME" name="nameSort"/>
        <field column="NAME" name="alphaNameSort"/>
        <field column="AMOUNT" name="amount"/> <!-- currencyField not available till 3.6 -->
        <field column="transaction_time" name="done_on"/> <!-- resolve epoch time -->
        <field column="location" name="location"/> <!-- geospatial?? -->
      </entity>
    </document>
  </dataConfig>

And the result of querying *:*:

  % curl "http://192.168.1.6:8995/solr/db/select/?q=*%3A*"
  <?xml version="1.0" encoding="UTF-8"?>
  <response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"><str name="q">*:*</str></lst></lst>
  <result name="response" numFound="0" start="0"/>
  </response>

The SQL query does work properly, and the relevant jars are in the lib subdirectory. Help? -- H -- Sent from my mobile device

1. http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource
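For reference, a typical DIH registration and the URLs involved (the handler name and config file name below are illustrative; the host and core come from the thread):

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
  </requestHandler>

  http://192.168.1.6:8995/solr/db/dataimport?command=full-import&commit=true
  http://192.168.1.6:8995/solr/db/dataimport?command=status

The status response reports how many rows were fetched and how many documents were processed, which usually narrows down why numFound stays at 0.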
Re: Group by distance
What do you mean by grouped? It's relatively easy to return only documents within a certain radius, and it's also easy to return the results ordered by distance. Here's a good place to start: http://wiki.apache.org/solr/SpatialSearch#geofilt_-_The_distance_filter Best, Erick

On Tue, Apr 24, 2012 at 6:33 AM, ViruS svi...@gmail.com wrote: I think this can only work when I have many records at the same position. My problem is to group within a short distance... like I said in my last mail, about 10 km. I need to put markers on a map of Poland and display them. Right now I have 100k records, but in the future I will have about 2 million, so I must send grouped records. Best, Piotr

On 24 April 2012 12:08, ravicv ravichandra...@gmail.com wrote: Use group=true and group.field in your query. Your Solr version should be 3.4 or above. Thanks, Ravi

-- Piotr (ViruS) Sikora E-mail/JID: vi...@hostv.pl http://piotrsikora.pl
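A sketch of the radius filter and distance sort from that wiki page (the field name store_geo and the Warsaw coordinates are illustrative):

  q=*:*&sfield=store_geo&pt=52.23,21.01&fq={!geofilt d=10}&sort=geodist() asc

d is the radius in kilometers, and geodist() reuses the same sfield and pt parameters for sorting.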
Stats Component and solrj
Hey all, I'd like to know how many terms I have in a particular field for a given search. In other words, I want to know how many facet values I have in that field. I use string fields; there are no numbers. I wanted to use the Stats Component and its count value. When trying this out in the browser, everything works as expected. However, when I try to do the same thing in my Java web app, I get an error, because FieldStatsInfo.class says min = (Double)entry.getValue(); where entry.getValue() is a String, since I have a string field here. Thus I get an error that String cannot be cast to Double. In the browser I just got a String returned here, probably relative to a lexicographical order. I switched the Stats Component on with query.setGetFieldStatistics("authors"); where 'authors' is a field with author names. Is it possible that solrj does not yet work with the Stats Component on string fields? I tried Solr 3.5 and 3.6 without success. Is there another easy way to get the count I want? Will solrj be fixed? Or am I just making an error? Best regards, Erik
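One alternative sometimes used for counting distinct values (a sketch, not from the thread; 'server' is assumed to be an existing SolrServer instance and 'authors' is the poster's field): field faceting with an unlimited facet list, counting the returned values with SolrJ:

  SolrQuery query = new SolrQuery("*:*");   // or the actual search query
  query.setRows(0);                         // only the facet counts are needed
  query.setFacet(true);
  query.addFacetField("authors");
  query.setFacetLimit(-1);                  // return all distinct terms
  query.setFacetMinCount(1);                // skip terms with zero hits in this result set

  QueryResponse rsp = server.query(query);
  int distinctAuthors = rsp.getFacetField("authors").getValueCount();

The obvious caveat is that an unlimited facet response gets large on a very high-cardinality field.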
correct location in chain for EdgeNGramFilterFactory ?
hello all, i want to experiment with the EdgeNGramFilterFactory at index time. i believe this needs to go in post tokenization - but i am doing a pattern replace as well as other things. should the EdgeNGramFilterFactory go in right after the pattern replace?

  <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
      *put EdgeNGramFilterFactory here <=== ?*
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

thanks for any help,
Re: Multi-words synonyms matching
Elisabeth: What shows up in the debug section of the response when you add &debugQuery=on? There should be some bit of that section like parsed_filter_queries. My other question is: are you absolutely sure that your CATEGORY_ANALYZED field has the correct content? How does it get populated? Nothing jumps out at me here. Best, Erick

On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit elisaelisael...@gmail.com wrote: yes, thanks, but this is NOT my question. I was wondering why I have multiple matches with q="hotel de ville" and no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm searching in the same Solr fieldType. Why is the q parameter behaving differently in that case? Why do the quotes work in one case and not in the other? Does anyone know? Thanks, Elisabeth

2012/4/24 Jeevanandam je...@myjeeva.com: Usage of q and fq: q is typically the main query for the search request; fq is the Filter Query, generally used to restrict the super set of documents without influencing the score (more info: http://wiki.apache.org/solr/CommonQueryParameters#q). For example: q="hotel de ville" ==> returns 100 documents; q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ==> returns 40 documents out of that super set of 100. hope this helps! - Jeevanandam

On 24-04-2012 3:08 pm, elisabeth benoit wrote: Hello, I'd like to resume this post. The only way I found to keep synonyms in synonyms.txt from being split into words is to use the line <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/> in schema.xml, where tokenizerFactory="solr.KeywordTokenizerFactory" instructs SynonymFilterFactory not to break synonyms into words on white space when parsing the synonyms file. So now it works fine: mairie is mapped to "hotel de ville", and when I send the request q="hotel de ville" (quotes are mandatory to prevent the analyzer from splitting "hotel de ville" on white space), I get answers containing the word mairie. But when I use the fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it doesn't work!!! CATEGORY_ANALYZED is the same field type as the default search field. This means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de ville", Solr uses the same analyzer, the one with the SynonymFilterFactory line above. Anyone has a clue what is different between q analysis behaviour and fq analysis behaviour? Thanks a lot, Elisabeth

2012/4/12 elisabeth benoit elisaelisael...@gmail.com: oh, that's right. thanks a lot, Elisabeth

2012/4/11 Jeevanandam Madanagopal je...@myjeeva.com: Elisabeth - As you described, the below mapping might suit your need: mairie => hotel de ville, mairie. mairie gets expanded to "hotel de ville" and mairie at index time, so both mairie and "hotel de ville" are searchable on the document. However, the white-space tokenizer splitting at query time will still be a problem, as described by Markus. --Jeevanandam

On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote: Have you tried the '=>' mapping instead? Something like hotel de ville => mairie might work for you. Yes, thanks, I've tried it, but from what I understand it doesn't solve my problem, since this means "hotel de ville" will be replaced by mairie at index time (I use synonyms only at index time). So when a user asks for "hôtel de ville", it won't match. In fact, at index time I have mairie in my data, but I want the user to be able to request mairie or "hôtel de ville" and get mairie as an answer, and not get mairie as an answer when requesting hôtel. To map `mairie` to `hotel de ville` as a single token you must escape your white space: mairie, hotel\ de\ ville. This results in a problem if your tokenizer splits on white space at query time. Ok, I guess this means I have a problem. No simple solution, since at query time my tokenizer does split on white space. I guess my problem is more or less one of the problems discussed in http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215 Thanks a lot for your answers, Elisabeth

2012/4/10 Erick Erickson erickerick...@gmail.com: Have you tried the '=>' mapping instead? Something like hotel de ville => mairie might work for you. Best, Erick

On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit elisaelisael...@gmail.com wrote: Hello, I've read several posts on this issue, but can't find a real solution to my multi-word synonyms matching problem. I have in my synonyms.txt an entry like mairie, hotel de ville, and my index-time analyzer is configured as follows for synonyms.
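For reference, the kind of request Erick is describing would look roughly like this (host and core are illustrative; the q, fq, and field come from the thread):

  http://localhost:8983/solr/select?q="hotel de ville"&fq=CATEGORY_ANALYZED:"hotel de ville"&debugQuery=on

Comparing the <str name="parsedquery"> entry with the <arr name="parsed_filter_queries"> entry in the debug block should show whether the fq went through the same analysis as q.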
Re: Deciding whether to stem at query time
When you set stored=true in your schema, a verbatim copy of the raw input is placed in the *.fdt file. That is the information returned when you specify the fl parameter, for instance. When you set indexed=true, the input is analyzed and the resulting terms are placed in the inverted index and are searchable. The two are essentially orthogonal, even though you specify them at the same time. So a field that's stored but not indexed would be displayable to the user, but no searches could be performed on it. A field that's indexed but not stored can be searched, but the information is not retrievable.

Why are there two options? Well, you may use copyField to index the data two different ways for two different purposes, as in this thread. Putting the verbatim data in twice is wasteful; you only ever need it once. Why store in the first place? Because all that gets into the inverted index is the result of the analysis. So if you indexed "story" with stemming turned on, it might result in "stori" being in the index. And if you use phonetic filters, it's much worse: your terms will be something like "UNT4" or "KMPT", which are totally unsuitable to show the user. So if you want to _search_ phonetically but display the field to the user, you would both index and store. And even if you could recover the terms from the inverted index as they were fed in, it would be a very expensive process. Luke does this; you might try reconstructing a document with Luke to see what a reconstructed doc looks like and how long it takes.

Hope that helps, Erick

On Tue, Apr 24, 2012 at 10:40 AM, Andrew Wagner wagner.and...@gmail.com wrote: I'm sorry, I'm missing something. What's the difference between storing and indexing a field?

On Tue, Apr 24, 2012 at 10:28 AM, Paul Libbrecht p...@hoplahup.net wrote: On 24 Apr 2012, at 17:16, Otis Gospodnetic wrote: This would not necessarily increase the size of your index that much - you don't need to store both fields, just one of them if you really need it for highlighting or displaying. If not, just index. I second this. The query expansion process is far from being a slow thing... you can easily expand to tens of fields with a fairly small penalty. Where you do pay a penalty is with stored fields... these really need to be carefully avoided as much as possible. As long as you keep them small, the legendary performance of SOLR will still hold. paul
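A small schema sketch of that split (field and type names are illustrative): store the raw text once on the field the user sees, and keep the analysis variants index-only:

  <!-- shown to the user and searchable in its exact form -->
  <field name="title" type="text_exact" indexed="true" stored="true"/>
  <!-- search-only variants: indexed but not stored, so the raw text is not duplicated -->
  <field name="title_stem" type="text_stemmed" indexed="true" stored="false"/>
  <field name="title_phon" type="text_phonetic" indexed="true" stored="false"/>
  <copyField source="title" dest="title_stem"/>
  <copyField source="title" dest="title_phon"/>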
Re: Query parsing VS marshalling/unmarshalling
In general, query parsing is such a small fraction of the total time that, almost no matter how complex the query, it's not worth worrying about. To see this, attach &debugQuery=on to your query and look at the timings in the prepare and process portions of the response. I'd be very sure that it was a problem before spending any time trying to make the transmission of the data across the wire more efficient; my first reaction is that this is premature optimization.

Second, you could do this on the server side with a custom query component if you chose. You can freely modify the query over there, and it may make sense in your situation.

Third, consider no-cache filters, which were developed for expensive filter queries, ACLs being one of them. See: https://issues.apache.org/jira/browse/SOLR-2429

Fourth, I'd ask if there's a way to reduce the size of the fq clause. Is this on a per-user basis or per-group basis? If you can get this down to a few groups, that would help. Although there's often some outlier who is a member of thousands of groups :(.

Best, Erick

2012/4/24 Mindaugas Žakšauskas min...@gmail.com: On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies bimargul...@gmail.com wrote: I'm about to try out a contribution for serializing queries in Javascript using Jackson. I've previously done this by serializing my own data structure and putting the JSON into a custom query parameter. Thanks for your reply. I appreciate your effort, but I'm not sure I fully understand the gain. Having the data in JSON would still require it to be converted into a Lucene Query at the end, which takes space and CPU effort, right? Or are you saying that having the query serialized into a structured data blob (JSON in this case) makes it somehow easier to convert into a Lucene Query? I only thought about Java serialization because: - it's rather close to the in-object format - the mechanism is rather stable and an established standard in Java/the JVM - Lucene Queries seem to implement java.io.Serializable (I haven't done a thorough check, but it looks good on the surface) - other conversions (e.g. using XStream) are either slow or require custom annotations, and I personally don't see how Lucene/Solr would include those in their core classes. Anyway, it would still be interesting to hear if anyone could elaborate on query parsing complexity. m.
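A sketch of the non-cached filter syntax from SOLR-2429 (the field and group values are illustrative):

  fq={!cache=false cost=150}acl_groups:(groupA groupB groupC)

cache=false keeps the ACL filter out of the filter cache, and a cost of 100 or more lets filters that support it run as post-filters, only over documents that already matched everything else.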
Re: solr replication failing with error: Master at: is not available. Index fetch failed
hello, thank you for the reply. yes - the master has been indexed. ok - makes sense - the polling interval needs to change. i did check the solr war file on both boxes (master and slave); they are identical. actually - if they were not identical - that would point to a different issue altogether, since our deployment infrastructure rolls the war file to the slaves when you do a deployment on the master. this has me stumped - not sure what to check next.
Re: Query parsing VS marshalling/unmarshalling
Hi Erick, Thanks for looking into this and for the tips you've sent. I am leaning towards custom query component at the moment, the primary reason for it would be to be able to squeeze the amount of data that is sent over to Solr. A single round trip within the same datacenter is worth around 0.5 ms [1] and if query doesn't fit into a single ethernet packet, this number effectively has to double/triple/etc. Regarding cache filters - I was actually thinking the opposite: caching ACL queries (filter queries) would be beneficial as those tend to be the same across multiple search requests. [1] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//people/jeff/stanford-295-talk.pdf , slide 13 m. On Tue, Apr 24, 2012 at 4:43 PM, Erick Erickson erickerick...@gmail.com wrote: In general, query parsing is such a small fraction of the total time that, almost no matter how complex, it's not worth worrying about. To see this, attach debugQuery=on to your query and look at the timings in the pepare and process portions of the response. I'd be very sure that it was a problem before spending any time trying to make the transmission of the data across the wire more efficient, my first reaction is that this is premature optimization. Second, you could do this on the server side with a custom query component if you chose. You can freely modify the query over there and it may make sense in your situation. Third, consider no cache filters, which were developed for expensive filter queries, ACL being one of them. See: https://issues.apache.org/jira/browse/SOLR-2429 Fourth, I'd ask if there's a way to reduce the size of the FQ clause. Is this on a particular user basis or groups basis? If you can get this down to a few groups that would help. Although there's often some outlier who is member of thousands of groups :(. Best Erick 2012/4/24 Mindaugas Žakšauskas min...@gmail.com: On Tue, Apr 24, 2012 at 3:27 PM, Benson Margulies bimargul...@gmail.com wrote: I'm about to try out a contribution for serializing queries in Javascript using Jackson. I've previously done this by serializing my own data structure and putting the JSON into a custom query parameter. Thanks for your reply. Appreciate your effort, but I'm not sure if I fully understand the gain. Having data in JSON would still require it to be converted into Lucene Query at the end which takes space CPU effort, right? Or are you saying that having query serialized into a structured data blob (JSON in this case) makes it somehow easier to convert it into Lucene Query? I only thought about Java serialization because: - it's rather close to the in-object format - the mechanism is rather stable and is an established standard in Java/JVM - Lucene Queries seem to implement java.io.Serializable (haven't done a thorough check but looks good on the surface) - other conversions (e.g. using Xtream) are either slow or require custom annotations. I personally don't see how would Lucene/Solr include them in their core classes. Anyway, it would still be interesting to hear if anyone could elaborate on query parsing complexity. m.
Re: Auto suggest on indexed file content filtered based on user
Right now, the query is a very simple one, something like q=text. Basically, it would return ['textview', 'textviewer', ...]. But the issue is that 'textviewer' could come from a file that is out of bounds for this user. So ultimately I would like to include the userName in the query. As mentioned earlier, userName is another field in the main index.
Re: solr replication failing with error: Master at: is not available. Index fetch failed
Hi, In the Solr wiki, for replication, the master URL is defined as follows:

  <str name="masterUrl">http://master_host:port/solr/corename/replication</str>

This URL does not contain "admin" in its path, whereas the master URL you provided has an additional "admin" in it. I'm not sure whether this is the issue, but you could try removing "admin" and check whether replication works.

On Tue, Apr 24, 2012 at 11:49 AM, geeky2 gee...@hotmail.com wrote: hello, thank you for the reply. yes - the master has been indexed. ok - makes sense - the polling interval needs to change. i did check the solr war file on both boxes (master and slave); they are identical. actually - if they were not identical - that would point to a different issue altogether, since our deployment infrastructure rolls the war file to the slaves when you do a deployment on the master. this has me stumped - not sure what to check next.

-- Thanks and Regards Rahul A. Warawdekar
Re: Auto suggest on indexed file content filtered based on user
On Apr 24, 2012, at 9:37 PM, prakash_ajp wrote: Right now, the query is a very simple one, something like q=text. Basically, it would return ['textview', 'textviewer', ...].

hmm, so you're using the default query field.

But the issue is that 'textviewer' could come from a file that is out of bounds for this user. So ultimately I would like to include the userName in the query. As mentioned earlier, userName is another field in the main index.

and you'd like to filter the result set by the userName field value.

In this scenario the 'fq' parameter will help you achieve the desired result. Please refer to http://wiki.apache.org/solr/CommonQueryParameters#fq

Try this: q=text&fq=userName:prakash

Let us know! -Jeevanandam
Re: JDBC import yields no data
On 24 April 2012 07:49, Dyer, James james.d...@ingrambook.com wrote: You might also want to show us your dataimport handler configuration from solrconfig.xml and also the url you're using to start the data import. When it's complete, browsing to http://192.168.1.6:8995/solr/db/dataimport (or whatever the DIH handler name is in your config) should say indexing complete and also the number of documents it imported. Also, if you have commit=false in your config, it won't issue a commit so you won't see the documents.

solrconfig.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->
<config>
  <luceneMatchVersion>LUCENE_35</luceneMatchVersion>
  <jmx />

  <!-- Set this to 'false' if you want solr to continue working after it has encountered a severe configuration error. In a production environment, you may want solr to keep working even if one handler is mis-configured. You may also set this to false by setting the system property: -Dsolr.abortOnConfigurationError=false -->
  <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>

  <lib dir="../../../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />

  <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>10</mergeFactor>
    <!-- If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first. -->
    <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
    <!-- Tell Lucene when to flush documents to disk. Giving Lucene more memory for indexing means faster indexing at the cost of more RAM. If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first. -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <!-- Expert: The Merge Policy in Lucene controls how merging is handled by Lucene. The default in 2.3 is the LogByteSizeMergePolicy, previous versions used LogDocMergePolicy. LogByteSizeMergePolicy chooses segments to merge based on their size. The Lucene 2.2 default, LogDocMergePolicy, chose when to merge based on number of documents. Other implementations of MergePolicy must have a no-argument constructor. -->
    <!-- <mergePolicy>org.apache.lucene.index.LogByteSizeMergePolicy</mergePolicy> -->
    <!-- Expert: The Merge Scheduler in Lucene controls how merges are performed. The ConcurrentMergeScheduler (Lucene 2.3 default) can perform merges in the background using separate threads. The SerialMergeScheduler (Lucene 2.2 default) does not. -->
    <!-- <mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler> -->
    <!-- As long as Solr is the only process modifying your index, it is safe to use Lucene's in process locking mechanism. But you may specify one of the other Lucene LockFactory implementations in the event that you have a custom situation. none = NoLockFactory (typically only used with read only indexes), single = SingleInstanceLockFactory (suggested), native = NativeFSLockFactory, simple = SimpleFSLockFactory ('simple' is the default for backwards compatibility with Solr 1.2) -->
    <lockType>single</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
    <!-- Deprecated -->
    <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <!-- If true, unlock any held write or commit locks on startup. This defeats the locking mechanism that allows multiple processes to safely access a lucene index, and should be used with care. This is not needed if lock type is 'none' or 'single' -->
    <unlockOnStartup>false</unlockOnStartup>
  </mainIndex>

  <!-- the default high-performance update handler -->
  <updateHandler
Re: JDBC import yields no data
On 24 April 2012 22:22, Hasan Diwan hasan.di...@gmail.com wrote: [...] The dataimport url I'm using is http://192.168.1.6:8995/solr/db/dataimport?command=full-import And, does it show you any output? As James mentions, it should say busy while the data import is running, and indexing completed when done. Also, is the above URL correct? /solr/db/ looks a little odd, but that could have to do with how you have Solr set up. My other guess would be that your JDBC set up is not correct. For testing, you could try to simplify it by not using net.sf.log4jdbc.DriverSpy , and trying directly with the H2 database JDBC driver. Regards, Gora
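(A DIH dataSource entry that talks to H2 directly, rather than through log4jdbc, might look roughly like this in the data-config; the JDBC URL and credentials below are placeholders.)

<dataSource type="JdbcDataSource" driver="org.h2.Driver"
            url="jdbc:h2:tcp://localhost/~/test" user="sa" password=""/>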
RE: JDBC import yields no data
After you issue the full-import command with the url you gave: http://192.168.1.6:8995/solr/db/dataimport?command=full-import paste the url in a web browser without the command: http://192.168.1.6:8995/solr/db/dataimport It should be giving you status as to how many database calls it has made, how many rows were read and how many documents were indexed. Keep refreshing the page until it is done. When it finishes, you should get either a Success or a Failure message. Is it saying success or failure? Also, how many documents does it say it indexed? James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-Original Message- From: Hasan Diwan [mailto:hasan.di...@gmail.com] Sent: Tuesday, April 24, 2012 11:52 AM To: solr-user@lucene.apache.org Subject: Re: JDBC import yields no data [...]
Re: Query parsing VS marshalling/unmarshalling
If you're assembling an fq clause, this is all done for you, although you need to take some care to form the fq clause _exactly_ the same way each time. Think of the filterCache as a key/value map where the key is the raw fq text and the value is the docs satisfying that query. So fq=acl:(a OR b) will not, for instance, match fq=acl:(b OR a) FWIW Erick

2012/4/24 Mindaugas Žakšauskas min...@gmail.com: [...]
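(To Erick's point about forming the fq clause exactly the same way each time, one way to canonicalise the ACL filter string is sketched below; the acl field name is invented, and the idea is simply that the same set of groups always produces the same raw fq text and therefore the same filterCache key.)

import java.util.Collection;
import java.util.TreeSet;

public class AclFilterBuilder {
    // Sort and de-duplicate the groups so ("b", "a") and ("a", "b") both
    // yield "acl:(a OR b)" and hit the same filterCache entry.
    public static String aclFilter(Collection<String> groups) {
        StringBuilder sb = new StringBuilder("acl:(");
        String sep = "";
        for (String g : new TreeSet<String>(groups)) {
            sb.append(sep).append(g);
            sep = " OR ";
        }
        return sb.append(')').toString();
    }
}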
Re: solr replication failing with error: Master at: is not available. Index fetch failed
that was it! thank you. i did notice something else in the logs now ... what is the meaning or implication of the message, Connection reset.? 2012-04-24 12:59:19,996 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 12:59:39,998 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. *2012-04-24 12:59:59,997 SEVERE [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Master at: http://bogus:bogusport/somepath/somecore/replication/ is not available. Index fetch failed. Exception: Connection reset* 2012-04-24 13:00:19,998 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 13:00:40,004 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 13:00:59,992 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 13:01:19,993 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 13:01:39,992 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 13:01:59,989 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 13:02:19,990 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 13:02:39,989 INFO [org.apache.solr.handler.SnapPuller] (pool-12-thread-1) Slave in sync with master. 2012-04-24 13:02:59,991 INFO [org.a -- View this message in context: http://lucene.472066.n3.nabble.com/solr-replication-failing-with-error-Master-at-is-not-available-Index-fetch-failed-tp3932921p3936107.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Auto suggest on indexed file content filtered based on user
I'm new to Solr, but I would think the fq=[username] would work here. http://wiki.apache.org/solr/CommonQueryParameters#fq Mike -Original Message- From: prakash_ajp [mailto:prakash_...@yahoo.com] Sent: Tuesday, April 24, 2012 11:07 AM To: solr-user@lucene.apache.org Subject: Re: Auto suggest on indexed file content filtered based on user Right now, the query is a very simple one, something like q=text. Basically, it would return ['textview', 'textviewer', ..] But the issue is, the 'textviewer' could be from a file that is out of bounds for this user. So, ultimately I would like to include the userName in the query. As mentioned earlier, userName is another field in the main index. -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3935765.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto suggest on indexed file content filtered based on user
I read on a couple of other web pages that fq is not supported for suggester. I even tried the query and it doesn't help. My understanding was, when the suggest (spellcheck) index is built, only the field chosen is considered for queries and the other fields from the main index are not available for filtering purposes once the index is created. -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3936144.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto suggest on indexed file content filtered based on user
Yes, I believe only the indexed field that the spell dictionary is built from is used for the suggest query. Filtering documents on the search handler using the fq parameter and spell suggestions are two separate parts we are discussing here. Let's say you have a field for spellcheck, used to build the spell dictionary: <field name="spell" type="textSpell" … /> Use a copyField to populate the spell field and get the dictionary created, then refer to the spellcheck component in the default search handler's 'last-components' section, like below: <arr name="last-components"> <str>spellcheck</str> </arr> Then you will be able to apply document filtering and spellcheck params to the search handler while querying. Detailed info: http://wiki.apache.org/solr/SpellCheckComponent [you have probably already been through it :) ] -Jeevanandam

On Apr 25, 2012, at 12:01 AM, prakash_ajp wrote: I read on a couple of other web pages that fq is not supported for suggester. I even tried the query and it doesn't help. My understanding was, when the suggest (spellcheck) index is built, only the field chosen is considered for queries and the other fields from the main index are not available for filtering purposes once the index is created. -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3936144.html Sent from the Solr - User mailing list archive at Nabble.com.
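(Putting those pieces together, the solrconfig.xml wiring might look roughly like this; the component, dictionary and field names are illustrative, not prescriptive.)

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

A request can then combine both mechanisms, e.g. q=text&fq=userName:prakash&spellcheck=true.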
Field names w/ leading digits cause strange behavior
When specifying a field name that starts with a digit (or digits) in the fl parameter, solr returns both the field name and the field value as those digits. For example, using nightly build apache-solr-4.0-2012-04-24_08-27-47 I run: java -jar start.jar and java -jar post.jar solr.xml monitor.xml If I then add a field to the field list that starts with a digit ( localhost:8983/solr/select?q=*:*&fl=24 ) the results look like: ... <doc> <long name="24">24</long> </doc> ... if I try fl=24_7 it looks like everything after the underscore is truncated: ... <doc> <long name="24">24</long> </doc> ... and if I try fl=3test it looks like everything after the last digit is truncated: ... <doc> <long name="3">3</long> </doc> ... If I have an actual value for that field (say I've indexed 24_7 to be true ) I get back that value as well as the behavior above: ... <doc> <bool name="24_7">true</bool> <long name="24">24</long> </doc> ... Is it ok to have fields that start with digits? If so, is there a different way to specify them using the fl parameter? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936354.html Sent from the Solr - User mailing list archive at Nabble.com.
embedded solr populating field of type LatLonType
Hi, I have a question concerning the spatial field type LatLonType and populating it via an embedded solr server in Java. So far I've only ever had to index simple types like boolean, float, and string. This is the first complex type. I'd like to use the following field definition in my schema:

<field name="coordinate" type="LatLonType" indexed="true" stored="false" multiValued="false"/>

And then I'd like to populate this field in Java as in the following pseudo code:

public SolrInputDocument populate(AppropriateJavaType coordinate) {
    SolrInputField inputField = new SolrInputField("coordinate");
    inputField.addValue(coordinate, 1.0f);
    SolrInputDocument inputDocument = new SolrInputDocument();
    inputDocument.put("coordinate", inputField);
    return inputDocument;
}

My question is, what is the AppropriateJavaType for populating a solr field of type LatLonType? Thank you for your time.
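(For what it's worth, LatLonType values are normally supplied as a single "latitude,longitude" string, so a plain java.lang.String is usually the type to pass; a rough, untested sketch with invented field names follows.)

import org.apache.solr.common.SolrInputDocument;

public class CoordinateDocExample {
    // LatLonType is typically fed a "lat,lon" string; Solr parses it into
    // its underlying sub-fields at index time.
    public static SolrInputDocument populate(double lat, double lon) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");              // whatever your uniqueKey is
        doc.addField("coordinate", lat + "," + lon);  // e.g. "48.8567,2.3508"
        return doc;
    }
}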
Re: correct location in chain for EdgeNGramFilterFactory ?
Well, what effect do you _want_? I'd probably put it after the PorterStemFilterFactory. As it is, it'll form a bunch of ngrams, then WordDelimiterFilterFactory will try to break them up according to _its_ rules and eventually you'll be sending absolute gibberish to the stemmer. I mean what is the stemmer going to think of (starting out with running) ru, run, runn, runni, runnin, running? I suggest you spend some time with admin/analysis with various orderings to understand better how all the parts interact. Best Erick

On Tue, Apr 24, 2012 at 11:20 AM, geeky2 gee...@hotmail.com wrote: hello all, i want to experiment with the EdgeNGramFilterFactory at index time. i believe this needs to go in post tokenization - but i am doing a pattern replace as well as other things. should the EdgeNGramFilterFactory go in right after the pattern replace?

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
    *put EdgeNGramFilterFactory here <=== ?*
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

thanks for any help, -- View this message in context: http://lucene.472066.n3.nabble.com/correct-location-in-chain-for-EdgeNGramFilterFactory-tp3935589p3935589.html Sent from the Solr - User mailing list archive at Nabble.com.
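(If Erick's suggestion is followed, the tail of the index-time chain would look something like this; the gram sizes here are arbitrary examples, not recommendations.)

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- StopFilter, PatternReplace, WordDelimiter, LowerCase and KeywordMarker filters as above -->
  <filter class="solr.PorterStemFilterFactory"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
</analyzer>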
faceted searches - design question - facet field not part of qf search fields
hello all, this is more of a design / newbie question on how others combine faceted search fields into their requestHandlers. say you have a request handler set up like below. does it make sense (from a design perspective) to add a faceted search field that is NOT part of the main search fields (itemNo, productType, brand) in the qf param? for example, augment the requestHandler below to include a faceted search on itemDesc? would this be confusing? - to be searching across three fields - but offering faceted suggestions on itemDesc? just trying to understand how others approach this. thanks

<requestHandler name="generalSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">10</int>
    <str name="qf">itemNo^1.0 productType^.8 brand^.5</str>
    <str name="q.alt">*:*</str>
  </lst>
  <lst name="appends">
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
</requestHandler>

-- View this message in context: http://lucene.472066.n3.nabble.com/faceted-searches-design-question-facet-field-not-part-of-qf-search-fields-tp3936509p3936509.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto suggest on indexed file content filtered based on user
I don't know if there is a really good solution here. The problem is that suggester (and the trunk FST version) simply traverses the terms in the index; there's not even a real concept of those terms belonging to any document. Since your security level is on a document basis, that makes things hard. How many users do you have? And do you ever expect to search across more than one user's files? If not, you could consider having one core per user. Then the suggestions would be correct, and since the searches would be against the user's core, they'd never see any documents they didn't own. But that solution has some complexity involved, and if you have a zillion users it can be difficult to get right. You could consider having separate (dynamically-defined) fields that hold the suggestion list for each individual user; that would be administratively easier. Then your suggestions would simply go against that user's suggestion field (suggestion_user1 e.g.). None of this is elegant, but this is not an elegant problem given how Solr is structured. Best Erick

On Tue, Apr 24, 2012 at 2:31 PM, prakash_ajp prakash_...@yahoo.com wrote: I read on a couple of other web pages that fq is not supported for suggester. I even tried the query and it doesn't help. My understanding was, when the suggest (spellcheck) index is built, only the field chosen is considered for queries and the other fields from the main index are not available for filtering purposes once the index is created. -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3936144.html Sent from the Solr - User mailing list archive at Nabble.com.
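(A rough sketch of the per-user field variant Erick describes; the dynamic field name and type are invented. At index time the client would copy a document's text into suggestion_<owner> for each user allowed to see it, and suggestions for the current user would then be drawn only from suggestion_<thatUser>.)

<dynamicField name="suggestion_*" type="textSpell" indexed="true" stored="false" multiValued="true"/>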
Re: faceted searches - design question - facet field not part of qf search fields
No problem here at all, it's done all the time. Consider a popular facet series: "in the last day", "in the last week", "in the last month"... There's no reason you have to facet on the fields that are searched on. The user has search terms like "my dog has fleas" and your query looks like q=my dog has fleas&fq=timestamp:[NOW/DAY TO NOW/DAY+1DAY] and the user sees all documents with those terms added since midnight last night. No confusion at all... Best Erick

On Tue, Apr 24, 2012 at 4:28 PM, geeky2 gee...@hotmail.com wrote: [...]
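(Concretely, with the handler above - and assuming the facet=false invariant is relaxed - a request could search the three qf fields while faceting on itemDesc, with a facet click coming back as an fq; the values below are invented.)

q=bosch drill&facet=true&facet.field=itemDesc&facet.limit=10
...and after the user picks one of the returned itemDesc values:
q=bosch drill&fq=itemDesc:"cordless drill kit"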
Re: Field names w/ leading digits cause strange behavior
Hmmm, this does NOT happen on 3.6, and it DOES happen on trunk. Sure sounds like a JIRA to me, would you mind raising one? I can't imagine this is desired behavior, it's just weird. Thanks for pointing this out! Erick

On Tue, Apr 24, 2012 at 3:38 PM, bleakley bleak...@factual.com wrote: [...]
Re: faceted searches - design question - facet field not part of qf search fields
: The user has search terms like "my dog has fleas" and your query
: looks like
: q=my dog has fleas&fq=timestamp:[NOW/DAY TO NOW/DAY+1DAY]
: and the user sees all documents with those terms added since midnight
: last night. No confusion at all...

right ... whether the facets are useful or confusing has nothing to do with whether the fields are in your qf ... what matters is what you *do* with those facet counts once you have them. if you offer the user the ability to filter on a constraint (which is what most people do with facet info), then as long as you generate that filter using the same field, as an fq, then everything should make sense. if instead you just try to add the constraint to your main q query string, as an additional clause, then that is likely to make no sense at all, since the terms from your facet field may not have any bearing on the fields you are querying against. -Hoss
Re: Field names w/ leading digits cause strange behavior
Thank you for verifying the issue. I've created a ticket at https://issues.apache.org/jira/browse/SOLR-3407 -- View this message in context: http://lucene.472066.n3.nabble.com/Field-names-w-leading-digits-cause-strange-behavior-tp3936354p3936599.html Sent from the Solr - User mailing list archive at Nabble.com.
Title Boosting and IDF
Hey everyone, I index documents with a title field and a body field. The title field often has far fewer terms than the body field. As a result, IDF has a much stronger effect in the title field than in the body field. I currently have the title field boosted by 4x relative to the body field. While I want matches in the title field to result in higher scores than matches in the body field, I don't believe I want the title to completely trump the body. I've seen this happen when a rare term is present in the title field, and IDF combines with the 4x boost to wreak havoc. I'd like to get your thoughts on the following: - Is it standard practice to avoid boosting the title field much, because of the (generally) high IDF of title field terms? - Are there other strategies for handling the high IDF of a title field? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Title-Boosting-and-IDF-tp3936709p3936709.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto suggest on indexed file content filtered based on user
Another option is to use faceting (via the facet.prefix param) for your auto-suggest. It's not as fast and scalable as using one of the Suggester implementations, but it does allow arbitrary fq parameters to be included in the request to limit the results. http://wiki.apache.org/solr/SimpleFacetParameters#Facet_prefix_.28term_suggest.29 Doug

On 04/24/2012 04:30 PM, Erick Erickson wrote: [...]
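(A facet.prefix-based suggest request along the lines Doug describes could look roughly like this; the content field name and user value are made up.)

q=*:*&rows=0&fq=userName:prakash&facet=true&facet.field=content&facet.prefix=text&facet.limit=10&facet.mincount=1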
QueryElevationComponent and distributed search
Hi, I am using Solr 3.6. I saw in the Solr wiki that QueryElevationComponent is not supported for distributed search. https://issues.apache.org/jira/browse/SOLR-2949 When I checked the above ticket, it looks like it's fixed in Solr 4.0. Does anyone have any idea when a stable version of Solr 4.0 will be released (approximate time frame)? If not, are these changes independent of other Solr 4.0 changes, so that I can just apply this patch to my setup for now? I would like to use Solr 3.6 because I would like to use a stable version in production. Thanks Srini -- View this message in context: http://lucene.472066.n3.nabble.com/QueryElevationComponent-and-distributed-search-tp3936998p3936998.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto suggest on indexed file content filtered based on user
The first one may not work because the number of users can be big. Besides, the users can simply register themselves and start using it. It won't work if an admin has to intervene in the registration process. The second could work I guess. But the problem would be data duplication as users might also share permissions to same files and folders. I understand my requirement is a little complicated. -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3937368.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Auto suggest on indexed file content filtered based on user
Is it true that faceting is case sensitive? That would be disastrous for our requirement :( -- View this message in context: http://lucene.472066.n3.nabble.com/Auto-suggest-on-indexed-file-content-filtered-based-on-user-tp3934565p3937370.html Sent from the Solr - User mailing list archive at Nabble.com.
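(On the case-sensitivity question: facet.prefix is matched against the indexed terms, so it is case sensitive with respect to whatever the field's analyzer produced. A common workaround is to facet on a lowercased copy of the content field and lowercase the prefix on the client side; a rough schema sketch with invented names follows.)

<fieldType name="suggestText" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="suggest" type="suggestText" indexed="true" stored="false" multiValued="true"/>
<copyField source="content" dest="suggest"/>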