Re: SolrCloud RAM requirements
On 9/24/2014 2:18 AM, Toke Eskildsen wrote:
> Norgorn [lsunnyd...@mail.ru] wrote:
>> I have CLOUD with 3 nodes and 16 GB RAM on each. My index is about 1 TB and search speed is awfully bad.
>
> We all have different standards with regard to search performance. What is awfully bad and what is good enough for you? Related to this: how many documents are in your index, how do you query (faceting, sorting, special searches), and how often is the index updated?
>
>> I've read that one needs at least 50% of index size in RAM.
>
> That is the common advice, yes. The advice is not bad for some use cases. The problem is that it has become gospel.
>
> I am guessing that you are using spinning drives? Solr needs fast random-access reads, and spinning drives are very slow at that. You can either compensate by buying enough RAM or change to a faster underlying storage technology. The obvious choice these days is Solid State Drives (we bought Samsung 840 EVOs last time and would probably buy those again). They will not give you RAM speed, but they do give a lot more bang for the buck, and depending on your performance requirements they can be enough.

I am guilty of spreading the gospel that you need 50-100% of your index to fit in the OS disk cache, as Toke mentioned. This wiki page is my creation: http://wiki.apache.org/solr/SolrPerformanceProblems

I've seen decent performance out of systems with standard hard disks that only had enough RAM to fit about 25% of the index into the disk cache, but I've also seen systems with 50% that can't complete a simple query in less than 10 seconds. With a terabyte of index on the system (assuming that's how much is on each one), 25% is still at least 256GB of RAM. With only 16GB, there's simply no way you'll ever get good performance.

I've heard quite a lot of anecdotal evidence that if you put the index on SSD, you only need 10% of the index to fit in RAM. I'm a little bit skeptical that this would be true as a general rule, but I do not doubt that it's been done successfully. For a terabyte index, that's still 100GB of RAM, so 128GB would be the absolute minimum you'll want to consider. The more RAM you can throw at this problem, the better your performance will be.

Thanks,
Shawn
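The rules of thumb in this thread reduce to simple arithmetic; a small sketch (the percentages are the heuristics quoted above, not guarantees):

```python
# Rough OS disk-cache sizing for a Solr index, using the rules of thumb
# discussed in this thread. These percentages are heuristics, not guarantees.

def cache_targets(index_bytes):
    """Return suggested disk-cache sizes (bytes) for spinning disk vs. SSD."""
    return {
        "spinning_minimum": index_bytes * 0.25,      # sometimes workable
        "spinning_comfortable": index_bytes * 0.50,  # the common advice
        "ideal": index_bytes * 1.00,                 # whole index cached
        "ssd_anecdotal": index_bytes * 0.10,         # anecdotal SSD figure
    }

TB = 1024 ** 4
GB = 1024 ** 3
for name, size in cache_targets(1 * TB).items():
    print(f"{name}: {size / GB:.0f} GB")
```

For a 1 TB index this reproduces Shawn's numbers: 256 GB at 25%, and roughly 100 GB for the anecdotal SSD case.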
Using SolrCloud on Amazon EC2
Hi all,

We currently plan to set up a project based on SolrCloud and Amazon Web Services. Our main search application is deployed using AWS OpsWorks, which works out quite well. Since we also want to provision Solr to EC2, I want to ask for experiences with the different deployment/provisioning tools. Right now I see the following three approaches:

1. Using the Lucidworks solr-scale-tk to set up and maintain the cluster. Who is using this in production, and what are your experiences?
2. Implementing our own Chef cookbooks for AWS OpsWorks to install SolrCloud as a custom OpsWorks layer. Did somebody do this already? What are your experiences? Are there any cookbooks out there we can contribute to and reuse?
3. Implementing our own Chef cookbooks for AWS OpsWorks to install SolrCloud as a Docker container. Any experiences with this?

Do you see other options? AFAIK, Elastic Beanstalk could also be an option. It would be very nice to get some experiences and recommendations.

Cheers
Timo
Help needed in Indexing and Search on xml content
Hi Team,

I am a newbie to Solr. I have search fields stored in an XML file, which is stored in MSSQL. I want to index the content of the XML file in Solr, and we need to provide search based on the fields present in the XML file.

The reason we are storing the input details as an XML file is that users will be able to add custom input fields of their own, with values. Storing these custom fields as columns in MSSQL does not seem to be an optimal solution, so we thought of putting them in an XML file and storing that file in the RDBMS. But I am not sure how we can index the content of the file to make search better. I believe this can be done with ExtractingRequestHandler. Could someone help me with how to implement this, or direct me to some pages that could be of help?

Thanks
Sangeetha
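One common alternative to ExtractingRequestHandler (an assumption about the setup, not the only way) is to parse the XML on the application side and map the user-defined fields onto Solr dynamic fields. A minimal sketch, where the element layout and the `*_s` string-suffix convention are illustrative only:

```python
# Sketch: flatten user-defined fields from a stored XML blob into a Solr
# document that relies on dynamic fields (e.g. a <dynamicField name="*_s">
# rule in schema.xml). The XML layout and "_s" suffix are assumptions.
import xml.etree.ElementTree as ET

def xml_to_solr_doc(doc_id, xml_text):
    root = ET.fromstring(xml_text)
    doc = {"id": doc_id}
    for field in root.findall("field"):
        name = field.get("name")
        doc[f"{name}_s"] = field.text  # lands on the "*_s" dynamic field
    return doc

sample = "<doc><field name='color'>red</field><field name='size'>XL</field></doc>"
print(xml_to_solr_doc("42", sample))
# -> {'id': '42', 'color_s': 'red', 'size_s': 'XL'}
```

The resulting dict can be posted to Solr as a regular JSON update; no schema change is needed when users invent new field names.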
(auto)suggestions, but only from a filtered set of documents
What I'd like to do is:

http://localhost:8983/solr/solrpedia/suggest?q=atm&qf=source:mysource

Through qf (or whatever the parameter should be called) I'd like to restrict the suggestions to documents which match the given qf query. I need this filter because (as posted in a previous thread) I intend to put different kinds of data into one core/collection, so suggestions should be restrictable to one or many source(s).
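For what it's worth, a sketch of building that request with properly separated and escaped parameters (the `qf` parameter itself is the poster's hypothetical, not an existing suggester option):

```python
# Build the suggest URL with separated, escaped parameters.
# "qf=source:mysource" is the hypothetical filter parameter from the post,
# not a real suggester option.
from urllib.parse import urlencode

params = {"q": "atm", "qf": "source:mysource"}
url = "http://localhost:8983/solr/solrpedia/suggest?" + urlencode(params)
print(url)
```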
Re: SolrCloud RAM requirements
On Thu, 2014-09-25 at 06:29 +0200, Norgorn wrote:
> I can't say for sure, cause filter caches are out of the JVM (dat HS), but top shows 5 GB cached and no free RAM.

The "cached" reported by top should be correct, no matter whether one uses off-heap or not: you have 5GB of cache for (I guess) a 300GB index, so 1.5% of the index size. I agree fully with Shawn that this will never perform for interactive use when you're using spinning drives.

> The only question for me now is how to balance disk cache and filter cache? Do I need to worry about that, or is a big disk cache enough?

Even if you skipped the filters fully (so just simple queries) and magically had 15GB of the 16GB free for disk cache, it would only be 5% of the index size. Still not enough for decent performance with spinning drives, unless your index is very special, e.g. an enormous amount of stored fields.

As for the whole "how much will it help with SSDs?" question, might I suggest simply testing? Buy a 500GB SSD and put it in one of the machines, then test searches against that shard vs. the shards on the other machines. If you do not see much difference, move the drive to your developer machine and be happy for the upgrade. Win-win.

> And does an optimized index mean the SOLR optimize command, or something else?

Optimized down to a single segment (which I think the 'optimize' command will do). But you should only consider that if you know that your shard will not be updated in the foreseeable future.

- Toke Eskildsen, State and University Library, Denmark
Re: traversing Automaton in lucene 4.10
Case solved: an example of the traversal was found in Lucene's source code (pointed to by Mike McCandless): https://github.com/apache/lucene-solr/blob/2836bd99101026860b12233a87e35101769a538f/lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java#L535

On Fri, Sep 19, 2014 at 5:27 PM, Dmitry Kan <solrexp...@gmail.com> wrote:
> Hi,
> The o.a.l.u.automaton.Automaton API has changed in Lucene 4.10 (https://issues.apache.org/jira/secure/attachment/12651171/LUCENE-5752.patch). The method getNumberedStates() was dropped and the class State does not exist anymore. How do I traverse an Automaton with the new API?
> Dmitry

-- Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
/suggest through SolrJ?
Am I right that I cannot call /suggest (i.e. the corresponding RequestHandler) through SolrJ? What is the preferred way to call Solr handlers/operations not supported by SolrJ from Java? Through new SolrJ Request classes?
Turn off suggester
Is there a way to turn off the Solr suggester? I have about 30M records, and when Tomcat starts up it takes a long time (~10 minutes) for the suggester to decompress the data, or whatever it is doing while it hangs in SolrSuggester.build(). Any ideas, please?

Thanks
-Peri

*** DISCLAIMER *** This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global Services to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.
SolrCloud Slow to boot up
Hello all,

We host a SolrCloud: 6 nodes, 36 shards x 3 replicas each, 108 cores across 6 servers, with about 250M documents moved into the cluster. When I restart the cluster, only the leader of each shard comes up live instantly (within a minute); all the replicas are shown as Recovering on the Cloud screen, and all 6 servers are doing some processing (consuming about 4 CPUs and doing a lot of network IO too). In essence it's not doing any reads or writes to the index, and I don't see any replication/catch-up activity going on either, yet RAM usage grows to consume all 96GB available on each box. All the recovering replicas then recover one by one in about an hour or so.

Why is it taking so long to boot up, and what is it doing that consumes so much CPU, RAM, and network IO? All disks are reading at 100% on all servers during this boot-up. Is there a setting I might have missed that would help? FYI, the ZooKeeper cluster is on the same 6 boxes. The Solr data dir is about 150GB per server, and each box has 96GB RAM.

Thanks, Anand

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Slow-to-boot-up-tp4161098.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Scoring with wildcards
The wildcard query is "constant score" to make it faster, so unfortunately that means there is no score differentiation between the wildcard matches. You can simply add the exact term as a separate query term and boost it:

q=text:carre* text:carre^1.5

-- Jack Krupansky

From: Pigeyre Romain
Sent: Wednesday, September 24, 2014 2:12 PM
To: solr-user@lucene.apache.org
Cc: Pigeyre Romain
Subject: Scoring with wildcards

Hi,

I have two records with a name_FRA field, one with name_FRA="un test CARREAU" and another one with name_FRA="un test CARRE":

{ "codeBarre": "1", "name_FRA": "un test CARREAU" }
{ "codeBarre": "2", "name_FRA": "un test CARRE" }

The configuration of these fields is:

<field name="name_FRA" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="codeBarre" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="name_FRA" dest="text"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

When I'm using this query:

http://localhost:8983/solr/cdv_product/select?q=text%3Acarre*&fl=score%2C+*&wt=json&indent=true&debugQuery=true

the result is:

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "debugQuery": "true",
      "fl": "score, *",
      "indent": "true",
      "q": "text:carre*",
      "wt": "json"}},
  "response": {"numFound": 2, "start": 0, "maxScore": 1.0, "docs": [
      {"codeBarre": "1",
       "name_FRA": "un test CARREAU",
       "_version_": 1480150860842401792,
       "score": 1.0},
      {"codeBarre": "2",
       "name_FRA": "un test CARRE",
       "_version_": 1480150875738472448,
       "score": 1.0}]
  },
  "debug": {
    "rawquerystring": "text:carre*",
    "querystring": "text:carre*",
    "parsedquery": "text:carre*",
    "parsedquery_toString": "text:carre*",
    "explain": {
      "1": "\n1.0 = (MATCH) ConstantScore(text:carre*), product of:\n  1.0 = boost\n  1.0 = queryNorm\n",
      "2": "\n1.0 = (MATCH) ConstantScore(text:carre*), product of:\n  1.0 = boost\n  1.0 = queryNorm\n"},
    "QParser": "LuceneQParser",
    "timing": {
      "time": 2.0,
      "prepare": {"time": 1.0, "query": {"time": 1.0}, "facet": {"time": 0.0}, "mlt": {"time": 0.0}, "highlight": {"time": 0.0}, "stats": {"time": 0.0}, "expand": {"time": 0.0}, "debug": {"time": 0.0}},
      "process": {"time": 1.0, "query": {"time": 0.0}, "facet": {"time": 0.0}, "mlt": {"time": 0.0}, "highlight": {"time": 0.0}, "stats": {"time": 0.0}, "expand": {"time": 0.0}, "debug": {"time": 1.0}}}}}

The score is the same for both records; the CARREAU record comes first and CARRE next. I want to place CARRE before CARREAU because CARRE is an exact match. Is that possible?

NB: scoring for this query only uses queryNorm and boosts.

In this test:

http://localhost:8983/solr/cdv_product/select?q=text%3Acarre&fl=score%2C*&wt=json&indent=true&debugQuery=true

I have only one record found, but the scoring is more complex. Why?

{
  "responseHeader": {"status": 0, "QTime": 2, "params": {
      "debugQuery": "true",
      "fl": "score,*",
      "indent": "true",
      "q": "text:carre",
      "wt": "json"}},
  "response": {"numFound": 1, "start": 0, "maxScore": 0.53033006, "docs": [
      {"codeBarre": "2", "name_FRA": "un test CARRE",
       "_version_": 1480150875738472448, "score": 0.53033006}]
  },
  "debug": {
    "rawquerystring": "text:carre",
    "querystring": "text:carre",
    "parsedquery": "text:carre",
    "parsedquery_toString": "text:carre",
    "explain": {
      "2": "\n0.53033006 = (MATCH) weight(text:carre in 0) [DefaultSimilarity], result of:\n  0.53033006 = fieldWeight in 0, product of:\n    1.4142135 = tf(freq=2.0), with freq of:\n      2.0 = termFreq=2.0\n    1.0 = idf(docFreq=1, maxDocs=2)\n    0.375 = fieldNorm(doc=0)\n"},
    "QParser": "LuceneQParser",
    "timing": {"time": 2.0, "prepare": {
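Jack's suggestion can be sketched as plain query-URL construction (collection and field names are taken from the thread; the 1.5 boost value is arbitrary):

```python
# Pair the constant-score wildcard with a boosted exact term so that exact
# matches ("CARRE") sort above mere prefix matches ("CARREAU").
# Collection/field names come from the thread; the boost 1.5 is arbitrary.
from urllib.parse import urlencode

params = {
    "q": "text:carre* text:carre^1.5",  # wildcard OR boosted exact term
    "fl": "score,*",
    "wt": "json",
}
url = "http://localhost:8983/solr/cdv_product/select?" + urlencode(params)
print(url)
```

The wildcard clause still contributes a constant score for every match, but the documents that also match the exact term pick up the extra boost and rank first.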
Re: Help needed in Indexing and Search on xml content
Hi Sangeetha,

If you can tell me a little bit more about your setup, I can try and help. If you are on Skype, that would be the easiest.

Thanks
-Peri

On Sep 25, 2014, at 3:50 AM, sangeetha.subraman...@gtnexus.com wrote:
> Hi Team, I am a newbie to SOLR. I have got search fields stored in a xml file which is stored in MSSQL. I want to index on the content of the xml file in SOLR. We need to provide search based on the fields present in the XML file. The reason why we are storing the input details as XML file is, the users will be able to add custom input fields on their own with values. Storing these custom fields as columns in MSSQL seems to be not an optimal solution. So we thought of putting it in XML file and store that file in RDBMS. But I am not sure on how we can index the content of the file to make search better. I believe this can be done by ExtractingRequestHandler. Could someone help me on how we can implement this / direct me to some pages which could be of help to me?
> Thanks Sangeetha

--- This message has been scanned for viruses and dangerous content by HTC E-Mail Virus Protection Service.
Re: Changed behavior in solr 4 ??
I am not aware of any such feature! That doesn't mean it doesn't exist, but I don't recall seeing it in the Solr source code.

-- Jack Krupansky

-----Original Message----- From: Jorge Luis Betancourt Gonzalez Sent: Wednesday, September 24, 2014 1:31 AM To: solr-user@lucene.apache.org Subject: Re: Changed behavior in solr 4 ??

Hi Jack:

Thanks for the response. Yes, the way you describe it, I know it works, and that is how I got it to work, but then what does the snippet of the documentation about overriding the default components shipped with Solr mean? Even in the book Solr in Action, chapter 7, listing 7.3, I saw something similar to what I wanted to do:

<searchComponent name="query" class="solr.QueryComponent">
  <lst name="invariants">
    <str name="rows">25</str>
    <str name="df">content_field</str>
  </lst>
  <lst name="defaults">
    <str name="q">*:*</str>
    <str name="indent">true</str>
    <str name="echoParams">explicit</str>
  </lst>
</searchComponent>

"Because each default search component exists by default even if it's not defined explicitly in the solrconfig.xml file, defining them explicitly as in the previous listing will replace the default configuration."

The previous snippet is from the quoted book Solr in Action. I understand that in each SearchHandler I could define these parameters, but if they are defined in the searchComponent (as the book says), wouldn't this configuration apply to all my request handlers, eliminating the need to replicate the same parameter in several parts of my solrconfig.xml (i.e. all the request handlers)?

Regards,

On Sep 23, 2014, at 11:53 PM, Jack Krupansky <j...@basetechnology.com> wrote:

You set the defaults on the search handler, not the search component. See solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">text</str>
  </lst>
  ...

-- Jack Krupansky

-----Original Message----- From: Jorge Luis Betancourt Gonzalez Sent: Tuesday, September 23, 2014 11:02 AM To: solr-user@lucene.apache.org Subject: Changed behavior in solr 4 ??

Hi:

I'm trying to change the default configuration for the query component of a SearchHandler. Basically I want to set a default value for the rows parameter and have this value shared by all my SearchHandlers. As stated in the solrconfig.xml comments, this could be accomplished by redeclaring the query search component; however, this is not working on Solr 4.9.0, which is the version I'm using. This is my configuration:

<searchComponent name="query" class="solr.QueryComponent">
  <lst name="defaults">
    <int name="rows">1</int>
  </lst>
</searchComponent>

The relevant portion of the solrconfig.xml comment is: "If you register a searchComponent to one of the standard names, that will be used instead of the default." So is this a new desired behavior? Although, just for testing, I redefined the components of the request handler to only use the query component and not all the default components; this is how it looks:

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>

Everything works OK, but the rows parameter is not used, although I'm not specifying the rows parameter on the URL.

Regards,

Concurso Mi selfie por los 5. Detalles en http://justiciaparaloscinco.wordpress.com
point buffer returned as an ellipse, how to configure?
Solr team,

I am indexing geographic points in decimal degrees lat/lon using the location_rpt type in my index. The type is set up like this:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.09" units="degrees"/>

My field definition is this:

<field name="pointGeom_rpt" type="location_rpt" indexed="true" stored="true" multiValued="false"/>

My problem is that the returned shape is a very narrow but tall ellipse, likely due to the degrees units and geo=true... but when I change those params to geo=false, the index won't start. This is the query I am using:

String query = "http://myserver:8983/solr/mycore/select?q=*:*&fq={!geofilt}&sfield=pointGeom_rpt&pt=" + lat + "," + lon + "&d=" + distance + "&wt=json&indent=true&geo=true&rows=" + rows;

I am not using SolrCloud, and I am on version 4.8.0. I also opened this Stack Overflow question; it has some more details and a picture of the return I get: http://stackoverflow.com/questions/25996820/why-is-solr-spatial-buffer-returned-as-an-elipse

BTW, I'm an OpenNLP committer and I am very geospatially focused; let me know if you want help with anything geo, and I'll try to carve out some time if needed.

thanks
G$
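A sketch of building that request with explicit separators and escaping (the coordinates, distance, and row count are placeholders). As far as I know, geo is a field-type attribute rather than a request parameter, so it is dropped here:

```python
# Rebuild the geofilt query with explicit "&" separators and URL escaping.
# Coordinates/distance/rows below are placeholders, not values from the thread.
from urllib.parse import urlencode

def geofilt_url(base, field, lat, lon, d, rows):
    params = {
        "q": "*:*",
        "fq": "{!geofilt}",   # geofilt reads sfield, pt, and d
        "sfield": field,
        "pt": f"{lat},{lon}",
        "d": d,
        "wt": "json",
        "indent": "true",
        "rows": rows,
    }
    return base + "/select?" + urlencode(params)

print(geofilt_url("http://myserver:8983/solr/mycore",
                  "pointGeom_rpt", 45.0, -93.0, 10, 20))
```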
Solr stops in between indexing
Hi,

I have Solr configured on a Google Cloud server. Whenever I try to index, it stops partway through and shows connection lost / connection timeout errors. I have 2,200 records; sometimes full indexing stops at 917, sometimes at 1385, sometimes at 2185. I have Apache2 running on Google Cloud on a Debian OS. Earlier it was working fine; it has started giving this error only recently.

Please advise and help.

-- Regards
Madhav Bahuguna
Re: Help in selecting the appropriate feature to obtain results
I call it the 'reverse search' problem (regex indexing). It's almost impossible. You can:

- do it on your own: http://blog.mikemccandless.com/2013/06/build-your-own-finite-state-transducer.html
- create a http://lucene.apache.org/core/4_1_0/memory/org/apache/lucene/index/memory/MemoryIndex.html from the incoming string, and search it with those stored queries with regexps; e.g. check https://www.youtube.com/watch?v=rmRCsrJp2A8
- more realistically, you can index the separate letters from the patterns, search for any of the incoming letters, and post-filter the results that are found.

On Wed, Sep 24, 2014 at 7:04 PM, barrybear <rotibo...@gmail.com> wrote:
> Hi guys, I'm still a beginner with Solr and I'm not sure whether to implement a custom filter query or some other available feature/plugin that I am not aware of in Solr. I am using Solr v4.4.0. I have a collection as an example below:
>
> [
>   { "description": "group1", "group": ["G?", "GE*"] },
>   { "description": "group2", "group": ["GEB"] },
>   { "description": "group3", "group": ["G"] }
> ]
>
> The group field is multiValued and contains alphabets which determine the ranking, plus two special characters: ? and *. Placing a ? at the back means any subordinate of that ranking, while * means all levels of subordinates of that particular ranking. If I were to search for group:'GEB', I expect this result:
>
> [
>   { "description": "group1", "group": ["G?", "GE*"] },
>   { "description": "group2", "group": ["GEB"] }
> ]
>
> While searching for group:'GE' should return this result:
>
> [
>   { "description": "group1", "group": ["G?", "GE*"] }
> ]
>
> And finally, searching for group:'G' should return only one result:
>
> [
>   { "description": "group3", "group": ["G"] }
> ]
>
> Hope that my explanation is clear enough; thanks for your attention and time.
-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
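Mikhail's third option (post-filtering coarse candidates) can be sketched outside Solr. Python's fnmatch happens to implement the same `?` (exactly one character) and `*` (any run of characters) semantics that barrybear describes, so it stands in for the re-check step here:

```python
# Post-filter sketch for the "reverse search" idea: a coarse Solr query
# returns candidate docs; each stored pattern is then re-checked against
# the incoming ranking string. fnmatch's "?" (one char) and "*" (any run)
# match the semantics described in the thread.
import fnmatch

docs = [
    {"description": "group1", "group": ["G?", "GE*"]},
    {"description": "group2", "group": ["GEB"]},
    {"description": "group3", "group": ["G"]},
]

def matching_docs(query):
    """Keep docs where any stored pattern matches the incoming string."""
    return [d["description"] for d in docs
            if any(fnmatch.fnmatchcase(query, pat) for pat in d["group"])]

print(matching_docs("GEB"))  # -> ['group1', 'group2']
print(matching_docs("GE"))   # -> ['group1']
print(matching_docs("G"))    # -> ['group3']
```

The three calls reproduce exactly the results barrybear expects for 'GEB', 'GE', and 'G'.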
Setting of Default Boost in Edismax Search Handler
I have a setup very similar to the /browse handler in the example (http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/example-DIH/solr/db/conf/solrconfig.xml?view=markup). I am curious whether it is possible to set a default boost function (e.g. bf=log(qty)), so that all query results would reflect it.

Thank you,
O. O.
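A default boost function can be supplied like any other edismax parameter by adding it to the handler's defaults. A sketch against a /browse-style handler (the qty field is the poster's example; the rest of the handler config is omitted):

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- applied to every request unless the client sends its own bf -->
    <str name="bf">log(qty)</str>
  </lst>
</requestHandler>
```

If the boost should not be overridable per request, an "invariants" list can be used instead of "defaults".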
Re: MRIT's morphline mapper doesn't co-locate with data
Do you have the Solr JIRA number for the new ingestion tool? Thanks

On Wed, Sep 24, 2014 at 7:57 PM, Wolfgang Hoschek <whosc...@cloudera.com> wrote:
> Based on our measurements, Lucene indexing is so CPU-intensive that it wouldn't really help much to exploit data locality on read. The overwhelming bottleneck remains the same. Having said that, we have an ingestion tool in the works that will take advantage of data locality for splittable files as well.
>
> Wolfgang.
>
> On Sep 24, 2014, at 9:38 AM, Tom Chen <tomchen1...@gmail.com> wrote:
>> Hi,
>> The MRIT (MapReduceIndexerTool) uses NLineInputFormat for the morphline mapper. The mapper doesn't co-locate with the input data that it processes. Isn't this a performance hit? Ideally, the morphline mapper should run on the hosts that contain most of the data blocks for the input files it processes.
>> Regards, Tom
Solr and hadoop
I wonder if Solr has an InputFormat and OutputFormat like the EsInputFormat and EsOutputFormat provided by Elasticsearch for Hadoop (es-hadoop). Is it possible for Solr to provide such integration with Hadoop?

Best,
Tom
Re: Changed behavior in solr 4 ??
I hadn't used it before this; basically I found out about it in the Solr in Action book and was guided by the comment about redefining the default components by defining a new searchComponent with the same name. Anyhow, thanks for your reply!

Regards,

On Sep 25, 2014, at 8:01 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> I am not aware of any such feature! That doesn't mean it doesn't exist, but I don't recall seeing it in the Solr source code.
>
> -- Jack Krupansky

Concurso Mi selfie por los 5. Detalles en http://justiciaparaloscinco.wordpress.com
Re: Solr and hadoop
Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the Morphline stuff (check out https://github.com/markrmiller/solr-map-reduce-example).

Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc. "The Science of Influence Marketing"
18 East 41st Street New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Thu, Sep 25, 2014 at 9:58 AM, Tom Chen <tomchen1...@gmail.com> wrote:
> I wonder if Solr has an InputFormat and OutputFormat like the EsInputFormat and EsOutputFormat provided by Elasticsearch for Hadoop (es-hadoop). Is it possible for Solr to provide such integration with Hadoop?
> Best, Tom
Re: Solr Cloud Default Document Routing
Well, you've picked the absolute worst case for comparison. The increase to double digits is a constant overhead. IOW, let's say your query went from 5ms to 20ms. That 15ms is pretty much the additional overhead no matter what the query; this particular query just happens to be very fast in the first place.

As far as queries going out to all the shards... well, they have to. The query processing cannot know ahead of time (except in this _very_ special case) which shards will generate hits. So the request is sent out to one replica of each shard, and each responds with its top N. The originating node then combines the sub-results to get the IDs of the final top N, then sends a request to each shard hosting one of those top N for the data associated with the document.

If you really need super-efficiency here, you could probably look at CloudSolrServer to get an idea of how to translate from ID to shard, and just do direct requests with distrib=false.

Best,
Erick

On Wed, Sep 24, 2014 at 5:44 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> Hi,
>
> I'm building out a multi-shard Solr collection, as the index size is likely to grow fast. I was testing out the setup with 2 shards on 2 nodes with test data, and indexed a few documents with id as the unique key.
>
> Collection create command:
>
> /solr/admin/collections?action=CREATE&name=multishard&numShards=2
>
> I used this command to upload:
>
> curl http://server/solr/multishard/update/json?commitWithin=2000 --data-binary @data.json -H 'Content-type:application/json'
>
> data.json:
>
> [
>   { "id": 100161200 },
>   { "id": 100161384 }
> ]
>
> When I query one of the nodes with an id constraint, I see the query executed on both shards, which looks inefficient; QTime increased to double digits. I guess Solr would know, based on the id, which shard the data went to. I have a few questions around this, as I could not find pertinent information in the user lists or documentation.
>
> - The query is hitting all shards and replicas. If I have 3 shards and 5 replicas, how would performance be impacted, given that for this very simple case it increased to double digits?
> - Could id lookup queries just go to one shard automatically?
>
> /solr/multishard/select?q=id%3A100161200&wt=json&indent=true&debugQuery=true
>
> "QTime": 13,
> "debug": {
>   "track": {
>     "rid": "-multishard_shard1_replica1-1411605234897-171",
>     "EXECUTE_QUERY": [
>       "http://server1/solr/multishard_shard1_replica1/", [
>         "QTime", "1", "ElapsedTime", "4", "RequestPurpose", "GET_TOP_IDS", "NumFound", "1", "Response", "some resp"],
>       "http://server2/solr/multishard_shard2_replica1/", [
>         "QTime", "1", "ElapsedTime", "6", "RequestPurpose", "GET_TOP_IDS", "NumFound", "0", "Response", "some"]],
>     "GET_FIELDS": [
>       "http://server1/solr/multishard_shard1_replica1/", [
>         "QTime", "0", "ElapsedTime", "4", "RequestPurpose", "GET_FIELDS,GET_DEBUG", "NumFound", "1",
>
> Thanks,
> Susmit
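Erick's distrib=false idea can be sketched as plain URL construction (the replica address below is taken from the debug output in the thread and is of course deployment-specific):

```python
# Once the shard holding an id is known, query that replica directly with
# distrib=false to skip the scatter-gather across all shards.
# The replica URL is a placeholder from the thread's debug output.
from urllib.parse import urlencode

def direct_lookup_url(replica_base, doc_id):
    params = {"q": f"id:{doc_id}", "distrib": "false", "wt": "json"}
    return replica_base + "/select?" + urlencode(params)

print(direct_lookup_url(
    "http://server1/solr/multishard_shard1_replica1", "100161200"))
```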
Re: SolrCloud Slow to boot up
1. What version of Solr are you running?
2. Have you made substantial changes to solrconfig.xml?

Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc. "The Science of Influence Marketing"
18 East 41st Street New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/

On Thu, Sep 25, 2014 at 7:19 AM, anand.mahajan <an...@zerebral.co.in> wrote:
> Hello all, Hosted a SolrCloud - 6 Nodes - 36 Shards x 3 Replica each - 108 cores across 6 servers. Moved in about 250M documents in this cluster. When I restart this cluster - only the leaders per shard comes up live instantly (within a minute) and all the replicas are shown as Recovering on the Cloud screen and all 6 servers are doing some processing (consuming about 4 CPUs at the back and doing a lot of Network IO too) In essence its not doing any reads are writes to the index and I dont see any replication/catch up activity going on too at the back, yet the RAM grows consuming all 96GB available on each box. And all the Recovering replicas recover one by one in about an hour or so. Why is it taking so long to boot up, and what is it doing that is consuming so much CPU, RAM and Network IO? All disks are reading at 100% on all servers during this boot up. Is there are setting I might have missed that will help? FYI - The Zookeeper cluster is on the same 6 boxes. Size of the Solr data dir is about 150GB per server and each box has 96GB RAM. Thanks, Anand
Re: SolrCloud Slow to boot up
1. I've hosted it with Helios v 0.07 that ships with Solr 4.10 2. Changes to solrconfig.xml: a. commits every 10 mins b. soft commits every 10 secs c. disabled all caches, as the usage is very random (no end users, only services doing the searches) and mostly single requests d. useColdSearcher = true -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Slow-to-boot-up-tp4161098p4161132.html
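For reference, the commit settings described above correspond roughly to a solrconfig.xml fragment like the following. This is a sketch with assumed values, not the poster's actual file:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit every 10 minutes; openSearcher=false so it only flushes -->
    <maxTime>600000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- soft commit every 10 seconds makes new documents visible to searches -->
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>

<query>
  <!-- useColdSearcher=true lets requests hit a searcher before warm-up finishes -->
  <useColdSearcher>true</useColdSearcher>
</query>
```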
Re: Help needed in Indexing and Search on xml content
Hi, You can retrieve the data in XML format as well as in JSON. You need to learn about schema.xml; in it you define the fields present in your XML, which fields you want to search on, etc. So it would be better to take a look at schema.xml; Solr's sample schema should clear most of your doubts. On Sep 25, 2014 5:12 PM, PeriS peri.subrahma...@htcinc.com wrote: Hi Sangeetha, If you can tell me a little bit more about your setup, I can try and help. If you are on Skype, that would be the easiest. Thanks -Peri On Sep 25, 2014, at 3:50 AM, sangeetha.subraman...@gtnexus.com wrote: Hi Team, I am a newbie to SOLR. I have got search fields stored in an XML file which is stored in MSSQL. I want to index the content of the XML file in SOLR. We need to provide search based on the fields present in the XML file. The reason why we are storing the input details as an XML file is that users will be able to add custom input fields with values on their own. Storing these custom fields as columns in MSSQL seems not to be an optimal solution, so we thought of putting them in an XML file and storing that file in the RDBMS. But I am not sure how we can index the content of the file to make search better. I believe this can be done by ExtractingRequestHandler. Could someone help me on how we can implement this / direct me to some pages which could be of help to me? Thanks Sangeetha --- This message has been scanned for viruses and dangerous content by HTC E-Mail Virus Protection Service. *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind HTC Global Services to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.
Re: /suggest through SolrJ?
You can call anything from SolrJ that you can call from a URL. SolrJ has lots of convenience stuff to set particular parameters, parse the response, etc., but in the end it's communicating with Solr via a URL. Take a look at something like SolrQuery, for instance. It has a nice method setFacetPrefix. Here's the entire method: public SolrQuery setFacetPrefix( String field, String prefix ) { this.set( FacetParams.FACET_PREFIX, prefix ); return this; } which is really this.set( "facet.prefix", prefix ); All it's really doing is setting a SolrParams key/value pair, which is equivalent to facet.prefix=blahblah on a URL. As I remember, there's a setPath method that you can use to set the destination for the request to suggest (or maybe /suggest). It's something like that. Best, Erick On Thu, Sep 25, 2014 at 3:47 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Am I right that I cannot call /suggest (i.e. the corresponding RequestHandler) through SolrJ? What is the preferred way to call Solr handlers/operations not supported by SolrJ from Java? Through new SolrJ Request classes?
Re: Turn off suggester
Well, tell us more about the suggester configuration, the number of unique terms in the field you're using, what version of Solr, etc. As Hoss says, details matter. Best, Erick On Thu, Sep 25, 2014 at 4:18 AM, PeriS peri.subrahma...@htcinc.com wrote: Is there a way to turn off the Solr suggester? I have about 30M records and when Tomcat starts up, it takes a long time (~10 minutes) for the suggester to decompress the data, or it's doing something as it hangs on SolrSuggester.build(). Any ideas please? Thanks -Peri
Re: Solr stops in between indexing
If it was working fine and suddenly stopped, I have to ask: what was the last thing that changed? Frankly, it sounds like your network has started having some problems. Best, Erick On Thu, Sep 25, 2014 at 6:29 AM, madhav bahuguna madhav.bahug...@gmail.com wrote: Hi, I have Solr configured on a Google Cloud server. Whenever I try to index, it stops in between and shows an error: connection lost / connection timeout. I have 2200 records; sometimes it stops full indexing at 917, sometimes 1385, sometimes 2185. I have Apache2 running on Google Cloud on Debian OS. Earlier it was working fine; it has started giving this error only recently. Please advise and help. -- Regards Madhav Bahuguna
Re: /suggest through SolrJ?
On 9/25/2014 8:43 AM, Erick Erickson wrote: You can call anything from SolrJ that you can call from a URL. SolrJ has lots of convenience stuff to set particular parameters, parse the response, etc... But in the end it's communicating with Solr via a URL. Take a look at something like SolrQuery for instance. It has a nice command setFacetPrefix. Here's the entire method: public SolrQuery setFacetPrefix( String field, String prefix ) { this.set( FacetParams.FACET_PREFIX, prefix ); return this; } which is really this.set( facet.prefix, prefix ); All it's really doing is setting a SolrParams key/value pair which is equivalent to facet.prefix=blahblah on a URL. As I remember, there's a setPath method that you can use to set the destination for the request to suggest (or maybe /suggest). It's something like that. Yes, like Erick says, just use SolrQuery for most accesses to Solr on arbitrary URL paths with arbitrary URL parameters. The set method is how you include those parameters. The SolrQuery method Erick was talking about at the end of his email is setRequestHandler(String), and you would set that to /suggest. Full disclosure about what this method actually does: it also sets the qt parameter, but with the modern example Solr config, the qt parameter doesn't do anything -- you must actually change the URL path on the request, which this method will do if the value starts with a forward slash. Thanks, Shawn
Re: Solr and hadoop
I'm aware of the MapReduceIndexerTool (MRIT). That might solve the indexing part -- the OutputFormat part. But what I asked about is more about making Solr index data available to Hadoop MapReduce -- making Solr a data store like HDFS. With a Solr InputFormat, we could make Solr index data available to Hadoop MapReduce. Along the same lines, we could also make Solr index data available to Hive, Spark, etc., like es-hadoop does. Best, Tom On Thu, Sep 25, 2014 at 10:26 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Yes, there's SolrInputDocumentWritable and MapReduceIndexerTool, plus the Morphline stuff (check out https://github.com/markrmiller/solr-map-reduce-example). On Thu, Sep 25, 2014 at 9:58 AM, Tom Chen tomchen1...@gmail.com wrote: I wonder if Solr has an InputFormat and OutputFormat like the EsInputFormat and EsOutputFormat that Elasticsearch provides for Hadoop (es-hadoop). Is it possible for Solr to provide such integration with Hadoop? Best, Tom
Solr mapred MTree merge stage ~6x slower in 4.10
As an update to this thread, it seems my MTree wasn't completely hanging, it was just much slower in 4.10. If I replace 4.9.0 with 4.10 in my jar the MTree merge stage is 6x (or more) slower (in my case, 20 min becomes 2 hours). I hope to bisect this in the future, but the jobs I'm running take a long time. I haven't tried to see if the issue shows on smaller jobs yet (does 1 minute become 6 minutes?). Brett On Tue, Sep 16, 2014 at 12:54 PM, Brett Hoerner br...@bretthoerner.com wrote: I have a very weird problem that I'm going to try to describe here to see if anyone has any ah-ha moments or clues. I haven't created a small reproducible project for this but I guess I will have to try in the future if I can't figure it out. (Or I'll need to bisect by running long Hadoop jobs...) So, the facts: * Have been successfully using Solr mapred to build very large Solr clusters for months * As of Solr 4.10 *some* job sizes repeatably hang in the MTree merge phase in 4.10 * Those same jobs (same input, output, and Hadoop cluster itself) succeed if I only change my Solr deps to 4.9 * The job *does succeed* in 4.10 if I use the same data to create more, but smaller shards (e.g. 12x as many shards each 1/12th the size of the job that fails) * Creating my normal size shards (the size I want, that works in 4.9) the job hangs with 2 mappers running, 0 reducers in the MTree merge phase * There are no errors or warning in the syslog/stderr of the MTree mappers, no errors ever echo'd back to the interactive run of the job (mapper says 100%, reduce says 0%, will stay forever) * No CPU being used on the boxes running the merge, no GC happening, JVM waiting on a futex, all threads blocked on various queues * No disk usage problems, nothing else obviously wrong with any box in the cluster I diff'ed around between 4.10 and 4.9 and barely see any changes in mapred contrib, mostly some test stuff. 
I didn't see any transitive dependency changes in Solr/Lucene that look like they would affect me.
Re: Solr and hadoop
Hi Tom, I am not aware of a Solr InputFormat implementation yet. The /export handler, which outputs entire sorted result sets, was designed to support these types of bulk export operations efficiently. I think a Solr InputFormat would be an excellent project to begin working on. Also, SOLR-6526 is underway to provide SolrCloud with native streaming aggregation capabilities. Joel Bernstein Search Engineer at Heliosearch On Thu, Sep 25, 2014 at 12:34 PM, Tom Chen tomchen1...@gmail.com wrote: I'm aware of the MapReduceIndexerTool (MRIT). That might solve the indexing part -- the OutputFormat part. But what I asked about is more about making Solr index data available to Hadoop MapReduce -- making Solr a data store like HDFS. With a Solr InputFormat, we could make Solr index data available to Hadoop MapReduce. Along the same lines, we could also make Solr index data available to Hive, Spark, etc., like es-hadoop does. Best, Tom
Why does the q parameter change?
Good afternoon all, I just implemented a phrase search and the parsed query gets changed from rapid prototyping to rapid prototype. I used the Solr analyzer and prototyping was unchanged, so I think I've ruled out a tokenizer. Can anyone tell me what's going on? Here's the query: q=rapid prototyping&defType=edismax&qf=text&pf2=text^40&ps=0 Here's the debug output; as you can see, prototyping gets changed to just prototype. What's causing this and how do I turn it off? Thanks,

<lst name="debug">
  <lst name="queryBoosting">
    <str name="q">rapid prototyping</str>
    <null name="match"/>
  </lst>
  <str name="rawquerystring">rapid prototyping</str>
  <str name="querystring">rapid prototyping</str>
  <str name="parsedquery">(+((DisjunctionMaxQuery((text:rapid)) DisjunctionMaxQuery((text:prototype)))~2) DisjunctionMaxQuery((text:"rapid prototype"^40.0)))/no_coord</str>
  <str name="parsedquery_toString">+(((text:rapid) (text:prototype))~2) (text:"rapid prototype"^40.0)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>

-- View this message in context: http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179.html
Re: Why does the q parameter change?
OK, I think I'm on to something. I omitted this parameter, which means it defaults to false on my text field. I need to set it to true and see what happens... autoGeneratePhraseQueries=true If I'm reading the wiki right, this parameter, if true, will preserve phrase queries... -- View this message in context: http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179p4161185.html
Re: Turn off suggester
Isn't it one of the Solr components? Can it be just removed from the default chain? Random poking in the dark here. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 25 September 2014 10:45, Erick Erickson erickerick...@gmail.com wrote: Well, tell us more about the suggester configuration, the number of unique terms in the field you're using, what version of Solr, etc. As Hoss says, details matter. Best, Erick On Thu, Sep 25, 2014 at 4:18 AM, PeriS peri.subrahma...@htcinc.com wrote: Is there a way to turn off the Solr suggester? I have about 30M records and when Tomcat starts up, it takes a long time (~10 minutes) for the suggester to decompress the data, or it's doing something as it hangs on SolrSuggester.build(). Any ideas please? Thanks -Peri
Re: Turn off suggester
The SuggestComponent is not in the default components list. There must be a request handler with this component added explicitly in the solrconfig.xml. Tomás On Thu, Sep 25, 2014 at 12:22 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Isn't it one of the Solr components? Can it be just removed from the default chain? Random poking in the dark here. On 25 September 2014 10:45, Erick Erickson erickerick...@gmail.com wrote: Well, tell us more about the suggester configuration, the number of unique terms in the field you're using, what version of Solr, etc. As Hoss says, details matter. Best, Erick
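Concretely, the thing to look for in solrconfig.xml is wiring along these lines (the component name, dictionary name and field here are illustrative, not from the poster's config). Removing or commenting out the component and the handler that references it disables the suggester entirely; alternatively, buildOnStartup=false avoids the expensive rebuild at startup:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="field">title</str>
    <!-- buildOnStartup=true is a common cause of long startup times -->
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```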
Re: Help needed in Indexing and Search on xml content
Have a look at data import handler and you'll need to use nested entities. That should get you at least a demo. Then you can decide whether that's good enough. Regards, Alex On 25/09/2014 3:51 am, sangeetha.subraman...@gtnexus.com sangeetha.subraman...@gtnexus.com wrote: Hi Team, I am a newbie to SOLR. I have got search fields stored in a xml file which is stored in MSSQL. I want to index on the content of the xml file in SOLR. We need to provide search based on the fields present in the XML file. The reason why we are storing the input details as XML file is , the users will be able to add custom input fields on their own with values. Storing these custom fields as columns in MSSQL seems to be not an optimal solution. So we thought of putting it in XML file and store that file in RDBMS. But I am not sure on how we can index the content of the file to make search better. I believe this can be done by ExtractingRequestHandler. Could someone help me on how we can implement this/ direct me to some pages which could be of help to me ? Thanks Sangeetha
Re: Why does the q parameter change?
No, apparently it's the KStemFilter. Should I turn this off at query time? I'll put this in another question... -- View this message in context: http://lucene.472066.n3.nabble.com/Why-does-the-q-parameter-change-tp4161179p4161199.html
Best practice for KStemFilter query or index or both?
Good afternoon, Here's my configuration for a text field. I have the same configuration for index and query time. Is this valid? What's the best practice for these: query, index, or both? As for synonyms, I've read conflicting reports on when to use them, but I'm currently changing it over to indexing time only. Thanks,

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>

-- View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-KStemFilter-query-or-index-or-both-tp4161201.html
RE: Best practice for KStemFilter query or index or both?
Hi - most filters should be used on both sides, especially stemmers, accent foldings and obviously lowercasing. Synonyms only on one side, depending on how you want to utilize them. Markus -----Original message----- From: eShard zim...@yahoo.com Sent: Thursday 25th September 2014 22:23 To: solr-user@lucene.apache.org Subject: Best practice for KStemFilter query or index or both? Good afternoon, Here's my configuration for a text field. I have the same configuration for index and query time. Is this valid? What's the best practice for these: query, index, or both?
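Applying that advice to the configuration above, a corrected fieldType might look like the sketch below: stemmer, stop words and lowercasing on both sides, synonyms on the index side only (as the original poster planned). This assumes the stray StandardTokenizerFactory listed as a filter and the analyzer type="select" block were unintended; Solr recognizes the index, query and multiterm analyzer types, not select:

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <!-- synonyms applied at index time only, per the original poster's plan -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```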
Re: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term
The difference comes in the fact that when you query the same form it matches 2 tokens, including the less common one. When you query a different form you only match the more common form. So really you're getting the boost from both the tiny difference in TF*IDF and the extra token that you match on. However, I agree that adding a payload might be a better solution. - Original Message - Hi - but this makes no sense; they are scored as equals, except for tiny differences in TF and IDF. What you would need is something like a stemmer that preserves the original token and gives a <1 payload to the stemmed token. The same goes for filters like decompounders and accent folders that change the meaning of words. -Original message- From: Diego Fernandez difer...@redhat.com Sent: Wednesday 17th September 2014 23:37 To: solr-user@lucene.apache.org Subject: Re: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term I'm not 100% on this, but I imagine this is what happens (using -> to mean "tokenized to"): Suppose that you index: I am running home -> am run running home. If you then query running home -> run running home, "running" matches both tokens and thus gives a higher score than if you query runs home -> run runs home. - Original Message - The Solr wiki says: A repeated question is how can I have the original term contribute more to the score than the stemmed version? In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality. https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming (Full section reproduced below.) I can see how in the example from the wiki reproduced below that both the stemmed and original term get indexed, but I don't see how the original term gets more weight than the stemmed term. Wouldn't this require a filter that gives terms with the keyword attribute more weight? What am I missing? Tom - A repeated question is how can I have the original term contribute more to the score than the stemmed version?
In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality. This filter emits two tokens for each input token, one of them marked with the Keyword attribute. Stemmers that respect keyword attributes will pass the token so marked through without change. So the effect of this filter is to index both the original word and the stemmed version. The 4 stemmers listed above all respect the keyword attribute. For terms that are not changed by stemming, this will result in duplicate, identical tokens in the document. This can be alleviated by adding the RemoveDuplicatesTokenFilterFactory.

<fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-- Diego Fernandez - 爱国 Software Engineer GSS - Diagnostics IRC: aiguofer on #gss and #customer-platform
Re: point buffer returned as an elipse, how to configure?
Hi Mark, I asked a follow-up question/observation on your Stackoverflow instantiation of your question. I also wrote the following, which doesn't yet fit into an answer because I don't know what problem you are experiencing: Some technical details: geo=true|false is an attribute on the field type; it isn't a request parameter. Should you want to change it to geo=false, you will also have to set the worldBounds and certainly re-index. Almost any change to a field type in the schema requires a re-index. If your units stay degrees then you can continue to use lat,lon format, but if you use another unit specific to some projection then it's not degrees, and I suggest switching to "x y" to avoid confusion with lat,lon format. FYI, units=degrees is required but it has no effect. When geo=true, the 'd' in geofilt is kilometers; when geo=false, 'd' is in the units of the numbers you put into the index. The docs are here: https://cwiki.apache.org/confluence/display/solr/Spatial+Search It would be awesome if you want to help further spatial in Lucene/Solr. This year is looking like a great year for spatial — I'm particularly excited about a new "FlexPrefixTree" from Varun (GSOC 2014) together with the latest advances in auto-prefixing to be released in Lucene/Solr 5.0. ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley On Thu, Sep 25, 2014 at 8:42 AM, Mark G ma...@apache.org wrote: Solr team, I am indexing geographic points in decimal degrees lat/lon using the location_rpt type in my index. The type is set up like this:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.09" units="degrees"/>

My field definition is this:

<field name="pointGeom_rpt" type="location_rpt" indexed="true" stored="true" multiValued="false"/>

My problem is that the return is a very narrow but tall ellipse, likely due to the degrees and geo=true...
but when I change those params to geo=false the index won't start. This is the query I am using:

String query = "http://myserver:8983/solr/mycore/select?q=*:*&fq={!geofilt}&sfield=pointGeom_rpt&pt=" + lat + "," + lon + "&d=" + distance + "&wt=json&indent=true&geo=true&rows=" + rows;

I am not using SolrCloud, and I am on version 4.8.0. I also opened up this stackoverflow question; it has some more details and a picture of the return I get: http://stackoverflow.com/questions/25996820/why-is-solr-spatial-buffer-returned-as-an-elipse BTW, I'm an OpenNLP committer and I am very geospatially focused; let me know if you want help with anything geo, I'll try to carve out some time if needed. thanks G$
AW: /suggest through SolrJ?
Thx to you two. Just in case anybody else is trying to do this: the following SolrJ code corresponds to the HTTP request GET http://localhost:8983/solr/solrpedia/suggest?q=atmo from Solr in Action (chapter 10):

SolrServer server = new HttpSolrServer("http://localhost:8983/solr/solrpedia");
SolrQuery query = new SolrQuery("atmo");
query.setRequestHandler("/suggest");
QueryResponse queryresponse = server.query(query);
...
queryresponse.getSpellCheckResponse().getSuggestions();

-----Original message----- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Thursday, 25 September 2014 17:37 To: solr-user@lucene.apache.org Subject: Re: /suggest through SolrJ? The SolrQuery method Erick was talking about at the end of his email is setRequestHandler(String), and you would set that to /suggest. Full disclosure about what this method actually does: it also sets the qt parameter, but with the modern example Solr config, the qt parameter doesn't do anything -- you must actually change the URL path on the request, which this method will do if the value starts with a forward slash. Thanks, Shawn